422-8 Lists and For Loops - From Strings to Lists

On the From Strings to Lists screen, the wording and the output in the examples don’t seem to match, which is confusing.

Firstly, can someone explain why the example in the assignment automatically inserts a comma at each \n line break?

new_line_split = read_file.split("\n")
print(new_line_split[:5])

Secondly, can someone explain why the output for the code below (from the example) does not split the elements at each line break (\n)? The output in the example seems to show an arbitrary split for each of the elements.

new_line_split = read_file.split("\n")

header = new_line_split[0]
data_row_1 = new_line_split[1]
data_row_2 = new_line_split[2]

first_three = [header, data_row_1, data_row_2]
print(first_three)

print(first_three[0].split(","))
print(first_three[1].split(","))
print(first_three[2].split(","))

Thirdly, I’m not sure why I keep getting an error about an undefined variable that I never created. This part is perplexing because no variable called first_three_lists is ever created or referenced.

error message:
"first_three_lists isn’t defined in your code, but we expected it to be list type"

Hi @ajtam555, welcome to the community!

I can’t seem to find the example you’ve described here for Python Lists and For Loops – could you post the link to the mission you’re working on for us?

The only thing I think I can answer for now is your third question about the undefined variable. For answer-checking purposes, Dataquest needs you to use specific variable names (which are given in the instructions). It sounds like the answer checker was looking for a variable named first_three_lists, but since you didn’t create it (you used first_three I think), it could not check your answer. You’ll always want to use any specific variable names given in the instructions so it can check your code.

The instructions don’t state a specific variable to use, and I’m not sure why a specific variable would be required when we should be able to use any name we want.

All the instruction said was to:

Create a list of lists with three elements where each element is a row split on the comma character.

But I just realized you could look at the answer and I see. However, why it would look for an arbitrary variable that isn’t explicitly stated to be used is beyond me.

But I’m still trying to wrap my head around why the text file seemingly auto-replaced the line break with a comma?

header = new_line_split[0]
data_row_1 = new_line_split[1]
data_row_2 = new_line_split[2]

And why do the new lists within the list not split each element at the ,?

first_three = [header, data_row_1, data_row_2]
print(first_three)

print(first_three[0].split(","))
print(first_three[1].split(","))
print(first_three[2].split(","))

Either I’m missing something or I just didn’t read it right, but I didn’t see any explanation of the reasoning behind the output.

Thanks.

I can understand how that would be confusing! To be honest, I’m having trouble understanding the question because I can’t find the same mission to put it in context. I’ve been working on the Analyst with Python path, and screen 8 of the Lists and For Loops mission there looks completely different: https://app.dataquest.io/m/312/lists-and-for-loops/8/repetitive-processes. Sorry I wasn’t more help!

@ajtam555 Including a link is imperative when asking questions about the content. The lack of one led to some miscommunication, which resulted in your questions not being answered earlier.

Apologies! This is my mistake actually, sorry about that. I should have asked to assign the result to first_three_lists.
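For anyone landing here later, this is one possible way to build that variable. It is only a sketch and not necessarily the official solution. The string sample_file and its rows are made up to stand in for read_file, which isn't reproduced in this thread.

```python
# Hypothetical stand-in for read_file; the real file has many more rows and columns.
sample_file = "id,track_name,price\n1,App One,0.0\n2,App Two,1.99"

# Split the text into rows, one string per line.
new_line_split = sample_file.split("\n")

# Build the list of lists under the name the answer checker looks for.
first_three_lists = []
for row in new_line_split[:3]:
    first_three_lists.append(row.split(","))

print(first_three_lists)
```

Each inner list holds the fields of one row, so first_three_lists[0] is the split header.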

I’ll try to address your other questions as a reply to the original post.

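It doesn’t actually replace \n with a comma; rather, it returns a list where each element is a piece of the original string, obtained by splitting it at each \n. The commas you see between the elements in the printed output are simply how Python displays a list.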

In the previous screen we could see what the string read_file looks like:

[Image: the read_file string, with each line highlighted in a different color]

When we run read_file.split("\n"), a list is returned in which the first element is the orange portion of the string, the second element is the blue portion, and so on.
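To make this concrete, here is a minimal sketch with a short made-up string in place of read_file (the rows below are hypothetical, chosen only to keep the output readable):

```python
# Hypothetical stand-in for read_file: three lines separated by \n.
sample_file = "id,track_name,price\n1,App One,0.0\n2,App Two,1.99"

new_line_split = sample_file.split("\n")
print(new_line_split)
# ['id,track_name,price', '1,App One,0.0', '2,App Two,1.99']

# Three elements, one per line of the original string.
print(len(new_line_split))
```

The commas inside each pair of quotes were already in the text. The commas between the quoted strings are only the separators Python prints between list elements.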

There are four print calls in this snippet. I’m guessing you’re asking about the last three.

It won’t split on \n because we passed "," as the argument to the str.split method, so it splits on the comma instead. For self-containment purposes, I will include the relevant diagram below.

Already replied in another post. Basically this is a content bug on our end.

Thanks for the detailed response. I do have a follow-up question. I realize we are splitting on the comma, but the output in the diagram shows the first list/element including the orange, blue and yellow segments (my assumption was that it would split after vpp_lic).

I’m just not getting the logic on which commas it’s splitting on because there are so many commas (i.e. id, track_name, etc etc).

Oh, I think the Output shown in the exercise example was incorrect. After reviewing the Output with the correct answer, I can see that it does split on the comma (after previously splitting on \n), but I’m not sure why the Output in the example shows something different.

i.e. the first list does show the brackets/list closing after ‘vpp_lic’, and so on.

The first output is a consequence of the line print(first_three). The list first_three has three elements:

  • new_line_split[0];
  • new_line_split[1];
  • new_line_split[2];

This is what you see in the first part of the output. These elements have not yet been split on the comma; each of them is a string corresponding, in order, to the orange, blue and yellow parts of the first part of the output.
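A small sketch of the difference, using hypothetical rows in place of the real new_line_split[0], new_line_split[1] and new_line_split[2]:

```python
# Hypothetical rows standing in for the first three elements of new_line_split.
first_three = ["id,track_name,price", "1,App One,0.0", "2,App Two,1.99"]

# Printing the list shows three whole strings; nothing has been split on "," yet.
print(first_three)
print(type(first_three[0]))  # <class 'str'>

# Only the explicit split(",") call breaks a row into its fields.
header_fields = first_three[0].split(",")
print(header_fields)  # ['id', 'track_name', 'price']
```

So the first print shows one list holding three strings, while each later print shows a separate list of fields for a single row.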

Nope, the output in the example is correct. Here’s a screenshot that shows the result of running that example in the interface:

In case you wish to experiment with this code, here it is for your convenience:

new_line_split = read_file.split("\n")

header = new_line_split[0]
data_row_1 = new_line_split[1]
data_row_2 = new_line_split[2]

first_three = [header, data_row_1, data_row_2]

print(first_three)

print(first_three[0].split(","))
print(first_three[1].split(","))
print(first_three[2].split(","))

The output in the exercise and in the example are different because they are doing different things.

Let me know specifically which lines of code (and their corresponding output) are troubling you so that I can try to help.