Super Confused by how split interacts with \n

The split function was just introduced in this lesson, so I assumed that \n was a phrase that told the split function to separate the list by creating a new line, but when I saw this line it threw me off. It seems that the line continues through the end of one item in the list and the beginning of another. Is it the split function that caused the exception, the \n, or something else?


Screen Link:
Lists And For Loops — From Strings To Lists | Dataquest


Hello ethanauringer

Welcome to the community!!
As explained in the Learn section of the lesson, split("\n") will split the contents of read_file using special character \n, that is wherever there is a \n character, the function will return a new element. In the csv file we have a \n at the end of each row. Hence, each row in the csv file will be an element in the list returned by split("\n").
In the output the elements are as follows:
[image: screenshot of the list that split("\n") returns]
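A minimal sketch of the same idea, using a made-up three-row string in place of the real file contents (sample_text and rows are hypothetical names, not from the lesson):

```python
# Hypothetical stand-in for read_file: three rows joined by "\n".
sample_text = "id,track_name\n284882215,Facebook\n389801252,Instagram"

# split("\n") cuts the string at every newline character.
rows = sample_text.split("\n")

print(len(rows))  # 3
for row in rows:
    # each element is one whole row, still a single string
    print(repr(row))
```

The commas stay inside each element because only "\n" was used as the separator.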

Hope it’s clear now?
Thanks.


@dash.debasmita has given you a clear answer on how split works with '\n'.

As a supplement: the default way print displays a list can be confusing, especially when each element in the list is long. In this case, a for loop can make things clearer:

line_number = 1
for line in new_line_split:
    # print line number
    print("Line {}".format(line_number))
    # print the line
    print(line)
    # add a separator to distinguish between lines
    print("-" * 50)
    # increment line number
    line_number += 1
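The same loop can also be written with the built-in enumerate, which hands back the counter and the element together. A sketch, assuming new_line_split already holds the rows (a short made-up list is used here so the snippet runs on its own):

```python
# Made-up stand-in for the list returned by read_file.split("\n").
new_line_split = ["id,track_name", "284882215,Facebook", "389801252,Instagram"]

# enumerate(..., start=1) yields (1, first_row), (2, second_row), ...
for line_number, line in enumerate(new_line_split, start=1):
    print("Line {}".format(line_number))
    print(line)
    print("-" * 50)
```

This removes the need to create and increment line_number by hand.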

Hi @ethanauringer,

I had the same questions on this lesson. Look at the start of each line. Each line is wrapped in ' '. The Clash of Clans line is just printed a little weird, but if you run the code you can see each line has 16 items wrapped in ' '.

Output
[
'id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic', 

'284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1', 

'389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1', 

'529479190,  Clash of Clans,  116476928,  USD,  0.0, 2130805, 579, 4.5, 4.5, 9.24.12, 9+, Games, 38, 5, 18, 1', 

'420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1']

If you run @wanzulfikri's for loop example, you can clearly see each line.
Reading the above comments from @dash.debasmita also helped to make it clearer:

“As explained in the Learn section of the lesson, split("\n") will split the contents of read_file using special character \n, that is wherever there is a \n character, the function will return a new element”

read_file is split at every "\n". read_file itself stays a string; the result, new_line_split, is a list, and each line or row is a string within that one list.

new_line_split = read_file.split("\n")
# split() returns a list
# each line in the read_file is separated at "\n" where a new line starts
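One detail worth checking for yourself: split() does not change the original string, it returns a new list. A small sketch with a made-up string (sample_text is a hypothetical stand-in for read_file); note that when the text ends with "\n", the last element of the list is an empty string:

```python
# Made-up text that ends with a newline, as many files do.
sample_text = "id,track_name\n284882215,Facebook\n"

new_line_split = sample_text.split("\n")

print(type(sample_text))     # <class 'str'>, the original is untouched
print(type(new_line_split))  # <class 'list'>
print(new_line_split)        # ['id,track_name', '284882215,Facebook', '']
```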

I had to back up and understand each step of the process. I shared my notes - I hope they help.

I am just taking this lesson so please disregard my notes if they are not helpful or confusing!

Each step of my thought process is below:

Step 1 Load File
opened_file = open('AppleStore.csv')
read_file = opened_file.read()

print("read_file opens as a string like one text file:","\n", "first 15 characters:","\n",read_file[0:15], "\n","first 5 characters:","\n",read_file[0:5])

read_file opens as a string like one text file:

first 15 characters:
id,track_name,s

first 5 characters:
id,tr

print(type(read_file))
<class 'str'>

Step 2 Create List of strings
  1. Create a list
new_line_split = read_file.split("\n")
# split() returns a list
# each line in the read_file is separated at "\n" where a new line starts

now new_line_split is a list built from read_file, and each line or row is a string within the list:
[
'id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic',

'284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1'
]

print(type(new_line_split))
<class 'list'>
for row in new_line_split:
    print(row)

# prints the entire row; each row is still one string, so indexing it returns a single character, not a field
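To see why the row has to be split again before its fields can be reached, here is a sketch with a shortened row (values taken from the Facebook line; row and fields are illustrative names):

```python
# Shortened version of one row, still a single string.
row = "284882215,Facebook,389879808"

# Indexing a string returns one character.
print(row[1])  # 8

# After splitting on commas, indexing returns a whole field.
fields = row.split(",")
print(fields[1])  # Facebook
```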
Step 3 Create List of lists


first_three = [new_line_split[0], new_line_split[1], new_line_split[2] ]

first_three_lists = [
    first_three[0].split(","),
    first_three[1].split(","),
    first_three[2].split(",")
]
print("now the read_file is a list and each line or row is a list, so we have a list of lists:", "\n", first_three_lists)

now the read_file is a list and each line or row is a list, so we have a list of lists:

[

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'],

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'],

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

]

print(type(first_three_lists))
<class 'list'>

for row in first_three_lists:
    print(row)
    print(row[1]) 

 
# prints the row
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# prints the element at index 1
track_name


# prints the row
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

# prints element at index 1
Facebook

# now able to subscript the row and return elements within the row or line
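Writing each split(",") by hand only works for a few rows. A sketch of the same step done in a loop, assuming new_line_split is the list from Step 2 (a short made-up list is used here so the snippet runs on its own, and list_of_lists is an illustrative name):

```python
# Made-up, shortened stand-in for the list from Step 2.
new_line_split = [
    "id,track_name,size_bytes",
    "284882215,Facebook,389879808",
    "389801252,Instagram,113954816",
]

list_of_lists = []
for row in new_line_split:
    # turn each row string into a list of its fields
    list_of_lists.append(row.split(","))

print(list_of_lists[1][1])  # Facebook
```

The first index picks the row and the second index picks the field within that row.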

Ahh, I believe I understand now. So instead of showing each element on a new line, the printed list separates them the normal way, with quotation marks and commas. But we tell the split function to look for where new lines were created, since whoever created the text file separated apps by creating a new line?


Yes, you are correct. When we save data in a database, text file, csv file, etc., we save it in the form of rows, and each row represents one entity. In this case, each row of the csv file describes the characteristics of one app. The next app is described in the next row.
