Trying to understand the header in a tuple (Functions: Intermediate:6/12)

Dear Team:

Course link: Learn data science with Python and R projects

I am very confused about the solution of this assignment.

1. I do not understand this part of the solution:

    if header:
        return data[1:], data[0]

“If header is True, return data[1:]” indicates that the data excludes the header row.
What does the following part, “, data[0]”, mean?
Given this code, does all_data = open() include the header row?

2. I assumed all_data = open() includes the header row, which would be element [0].
The exercise asks: Assign the header to a variable named header.
Since it asks for the header, which is the first row of all_data (AppleStore.csv), I think the answer should be header = all_data[0], not header = all_data[1] as the solution shows.

3. The exercise also asks: Assign the rest of the data set to a variable named apps_data.
I consider “the rest of the data set” to mean all_data without the header row, so I think the answer should be apps_data = all_data[1:], not apps_data = all_data[0] as the solution shows.

I cannot figure out which part of my logic is wrong. Please advise.
Thank you!


This code essentially says: if the header argument is True, return two values (i.e. data[1:] and data[0]), which Python packs into a single tuple. If it is False, return just one object, the entire data. This means that when you call your function, it might return a tuple of two objects or a single object, depending on the value of header in your function call. So, for example, if you run your function (open()) with the defaults (i.e. file_name='AppleStore.csv' and header=True), it will return a tuple of two objects. If you assign the result to a variable like this:

    all_data = open()

then all_data will be a tuple where the first element (i.e. all_data[0]) is your actual data (i.e. data[1:]) and the second element (i.e. all_data[1]) is your header (i.e. data[0]).
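To make the tuple concrete, here is a minimal sketch. It assumes a hypothetical `open_dataset()` function (named so it does not shadow Python's built-in `open`) that takes an in-memory list of rows instead of reading AppleStore.csv; the sample rows are made up. The `return data[1:], data[0]` line mirrors the course solution quoted above.

```python
def open_dataset(data, header=True):
    """Hypothetical stand-in for the course function.

    `data` is a list of rows (lists); the first row is assumed to be the header.
    """
    if header:
        # Two comma-separated values are packed into ONE tuple:
        # (rows without the header, header row)
        return data[1:], data[0]
    return data  # header=False: the rows come back unchanged


# Made-up sample rows in the shape of a CSV with a header (not the real file)
rows = [
    ["id", "track_name", "price"],
    ["1", "App A", "0.0"],
    ["2", "App B", "1.99"],
]

all_data = open_dataset(rows)
print(type(all_data))   # <class 'tuple'>
print(len(all_data))    # 2

apps_data = all_data[0]  # first element: the data rows
header = all_data[1]     # second element: the header row
print(header)            # ['id', 'track_name', 'price']
print(apps_data)         # [['1', 'App A', '0.0'], ['2', 'App B', '1.99']]
```

This is why the solution uses header = all_data[1] and apps_data = all_data[0]: the indexes refer to positions in the returned tuple, not rows in the file.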

I agree with your logic here and that’s why when I did this exercise, I wrote my return statement like this:

    if header:
        return data[0], data[1:]
    return data

so that the first element of the returned tuple would be the header and the second element would be the data.
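With the reordered return statement, the tuple's order flips, so index 0 is the header. A minimal sketch, again using a hypothetical `open_dataset()` that takes a made-up list of rows rather than the real CSV file; tuple unpacking assigns both values in one line and makes the order explicit.

```python
def open_dataset(data, header=True):
    """Hypothetical variant that returns the header first."""
    if header:
        return data[0], data[1:]  # (header row, remaining rows)
    return data                   # no header: return the rows unchanged


rows = [
    ["id", "track_name", "price"],
    ["1", "App A", "0.0"],
]

# Unpacking the 2-tuple: names on the left match the order in the return
header, apps_data = open_dataset(rows)
print(header)      # ['id', 'track_name', 'price']
print(apps_data)   # [['1', 'App A', '0.0']]

# With header=False, a single list comes back, not a tuple
no_header = open_dataset(rows, header=False)
print(type(no_header))  # <class 'list'>
```

Because the function returns a tuple in one case and a plain list in the other, the caller has to know which header value was used before indexing or unpacking the result.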

The key here is understanding how all_data was defined. In the answer provided, all_data is a tuple whose first element is the data and whose second element is the header. I (like you) find this ordering counterintuitive, so I changed the return statement to make it more logical.


Thank you very much for your detailed and clear explanation!
Once I figured out that all_data is a tuple containing two elements, everything else made sense.

Again, Thanks a lot!

I don’t understand why this line is correct:

    if header:
        return data[1:], data[0]

But this line is incorrect:

    if header:
        return data[0], data[1:]


After revisiting the material and the post above, I now understand the logic of the tuple: the first element is the data and the second element is the header.
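The original confusion came from indexing a different object: [0] picks the header only when indexing the raw list of rows, while on the tuple returned with header=True it picks the data. A small sketch with made-up rows and the same hypothetical `open_dataset()` (course ordering: data first, header second):

```python
def open_dataset(data, header=True):
    """Hypothetical function using the course's (data, header) ordering."""
    if header:
        return data[1:], data[0]
    return data


rows = [
    ["id", "track_name", "price"],
    ["1", "App A", "0.0"],
    ["2", "App B", "1.99"],
]

# Indexing the raw list of rows: element 0 IS the header row
print(rows[0])                   # ['id', 'track_name', 'price']

# Indexing the returned tuple: element 0 is the data, element 1 is the header
all_data = open_dataset(rows)
print(all_data[0] == rows[1:])   # True
print(all_data[1] == rows[0])    # True

# Unpacking in the same order as the return statement avoids index confusion
apps_data, header = open_dataset(rows)
print(header)                    # ['id', 'track_name', 'price']
```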