Functions: Intermediate, Exercise 6/12

Hello there

You can see the instructions and the solution attached at the end.

There are three things I don’t understand.
When it says:

“Assign the header to a variable named header”, I tend to write

    header = all_data[0]

instead of

    header = all_data[1]

and for “assign the rest of the data set to a variable named apps_data”, I understand that to mean excluding the header and taking the rest of the data set, so I would write

    apps_data = all_data[1:]

instead of

    apps_data = all_data[0]

Where’s my error in reasoning?

In the if part it says:

```
if header:
    return data[1:], data[0]
```

Does it make any difference if I swap them like this?

```
if header:
    return data[0], data[1:]
```

Thank you very much in advance for your help.


Hi @Horigome,

This is the solution code:

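    from csv import reader

    def open_dataset(file_name='AppleStore.csv', header=True):
        # Read every row of the CSV file into a list of lists
        with open(file_name) as opened_file:
            read_file = reader(opened_file)
            data = list(read_file)

        if header:
            # Returns a tuple: (rows without the header, header row)
            return data[1:], data[0]
        else:
            # Returns the whole table, header included, as one list
            return data

    all_data = open_dataset()
    header = all_data[1]     # second element of the tuple, i.e. data[0]
    apps_data = all_data[0]  # first element of the tuple, i.e. data[1:]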

If you look at the return statement in the if header branch, you will notice that data[1:] is returned first and data[0] is returned second. When we return multiple values at once, Python packs them into a tuple like this: (data[1:], data[0]). As you can see, the header is at index 1 and the rest of the data is at index 0. This is why we do this:

    header = all_data[1]
    apps_data = all_data[0]

So if we change the order in the return statement to:

    return data[0], data[1:]

we need to do:

    header = all_data[0]
    apps_data = all_data[1]
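To see the tuple being built without needing AppleStore.csv, here is a minimal sketch that runs the same if header logic on a small in-memory table. The function name split_rows and the sample rows are hypothetical stand-ins for open_dataset() and the real file.

```python
def split_rows(data, header=True):
    # Hypothetical stand-in for open_dataset(): same branching, no file reading
    if header:
        # Two values after `return` are packed into one tuple: (rows, header_row)
        return data[1:], data[0]
    else:
        return data

# Hypothetical sample table; the first row plays the role of the header
data = [
    ["name", "price"],
    ["App A", "0.0"],
    ["App B", "1.99"],
]

all_data = split_rows(data)
print(type(all_data))  # <class 'tuple'>
print(all_data[0])     # [['App A', '0.0'], ['App B', '1.99']]  (the rows)
print(all_data[1])     # ['name', 'price']  (the header)
```

Reading the real file with csv.reader gives the same kind of list of lists (every value is a string), so the indexing works the same way.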

Best,
Sahil


@Sahil,

Aaah of course :wink:

Thank you very much for the fast and clean answer.

Best,
Horigome


@Sahil OK, this explains why I got the right answer with them swapped around. My return statement is:

    if header:
        return data[0], data[1:]
    else:
        return data

    all_data = open_dataset()
    header = all_data[0]
    apps_data = all_data[1]


Wow! Thank you for breaking it down. I was having trouble understanding the same situation.


WOW, thanks!
I had the same problem, but now I understand!


I enjoyed reading your explanation. Thank you so much.

Here are my findings:

    open_dataset(file_name='AppleStore.csv', header=True)

Here we can see that file_name is at index 0 and header is at index 1; since it's a tuple, this is not going to change. That's why we have to look for the header at index 1 and not at index 0.

To test this, let's reverse the positions in the function definition:

    open_dataset(header=True, file_name='AppleStore.csv')

Now you can see that the header will be at index 0 and all the data will be at index 1:

    header = all_data[0]
    apps_data = all_data[1]

Hi Asinghami,

This makes a lot of sense; however, I am still a bit confused by the if statement.
First of all, doesn’t data[1:] translate to the rest of the rows in the table without the header row? Going by the if statement, if the opened table has a header, I expected data[1:] (the table without the header row) to be returned first and data[0] (only the header row) second, as a tuple; otherwise, the complete table is returned. Please correct any misconception you see in this logic. I have been stuck on this for days because the output is not what I expect, nor have I been able to properly understand the logic behind the answer provided. print(all_data) gives the expected output, but the same cannot be said for print(header) and print(apps_data).

I would appreciate any response.


Yes, data[1:] refers to all the rows after the first row which is usually the header row.

Here is where all the confusion comes from: does the data come first or does the header come first? The answer is, it depends on how you defined your open_dataset function and what exactly it returns at the end. For example, if you used:

    if header:
        return data[1:], data[0]
    else:
        return data

then when you open a dataset with header=True, the function will return a tuple with data as the first element and the header as the second element. However, if you used:

    if header:
        return data[0], data[1:]
    else:
        return data

then when you open a dataset with header=True, the function will return a tuple with the header as the first element and data as the second element.
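One way to avoid guessing the index is tuple unpacking: the names on the left are matched to the returned values by position, so they only need to follow the same order as the return statement. A minimal sketch, assuming a hypothetical split_rows() helper that behaves like the first version above (data first, header second) and a made-up sample table:

```python
def split_rows(data, header=True):
    # Hypothetical stand-in for open_dataset(): returns (rows, header_row)
    if header:
        return data[1:], data[0]
    else:
        return data

data = [["name", "price"], ["App A", "0.0"], ["App B", "1.99"]]

# Unpack in the same order as `return data[1:], data[0]`
apps_data, header = split_rows(data)
print(header)     # ['name', 'price']
print(apps_data)  # [['App A', '0.0'], ['App B', '1.99']]
```

If the function used return data[0], data[1:] instead, the left-hand side would become header, apps_data = ... to match.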

EDIT_1:
What Asinghami is saying is not correct: changing the order of your named parameters when calling open_dataset does not change the order of what is returned. In other words:

    open_dataset(file_name='AppleStore.csv', header=True)

and

    open_dataset(header=True, file_name='AppleStore.csv')

will produce the same output. The only way to change the order of the objects returned is to change the code within the function definition as mentioned above. As an exercise, try both of these lines of code to see exactly what is returned and in what order. You should see they are identical.
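The claim can also be checked without the CSV file. In this sketch describe() is a hypothetical function; Python binds keyword arguments by name, so their order in the call has no effect, and the order of the returned tuple is fixed by the return statement alone.

```python
def describe(file_name='AppleStore.csv', header=True):
    # Hypothetical function: returns its arguments in a fixed order
    return file_name, header

first = describe(file_name='AppleStore.csv', header=True)
second = describe(header=True, file_name='AppleStore.csv')
print(first == second)  # True
print(first)            # ('AppleStore.csv', True)
```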

Thanks for the clarity. You are right that the order of the parameters is irrelevant, as either way produces the same result. Could you shed more light on the statements after the return statements? Why does print(header) output the header row, while print(apps_data), which is the same as print(all_data[1]), outputs the rest of the rows instead of only the row after the header? If you could, please create a scenario where the else statement is executed, so I can understand it better and see it from a different perspective.

You’re welcome, and I’m glad you tested that code to see that the order of named parameters does not make a difference in the output. In fact, this is the beauty of using named parameters: you don’t have to remember the order!

Assuming that your function is defined as:

    if header:
        return data[1:], data[0]
    else:
        return data

then print(apps_data) is not the same as print(all_data[1]), because all_data[1] refers to the second element of the returned tuple, data[0], which is the header.

That said, I think I understand where your confusion is coming from: you’re wondering why all_data[0] returns an object with more than one row, because that syntax looks very different from data[1:]. The trick is to remember what each object is and what data it refers to: all_data is a tuple whose first element is a list of lists (all the rows after the header/first row) and whose second element is just a list (the header). Compare this with data inside our function: it is a list of lists where the first element is the header (also a list).

To wrap it all up: all_data is a tuple, because that is what calling open_dataset() returns. Its first element, all_data[0], is the first item returned by open_dataset(), which is data[1:] (all rows after the header). Its second element, all_data[1], is the second item returned by open_dataset(), which is data[0] (the header).
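A quick way to see this nesting is to inspect types and lengths. This sketch uses a hypothetical split_rows() stand-in and a made-up four-row table instead of the real file:

```python
def split_rows(data, header=True):
    # Hypothetical stand-in for open_dataset()
    if header:
        return data[1:], data[0]
    else:
        return data

data = [["name", "price"], ["App A", "0.0"], ["App B", "1.99"], ["App C", "2.99"]]
all_data = split_rows(data)

print(type(all_data), len(all_data))        # <class 'tuple'> 2
print(type(all_data[0]), len(all_data[0]))  # <class 'list'> 3  (one inner list per data row)
print(all_data[0][0])                       # ['App A', '0.0']  (first row after the header)
print(all_data[1][0])                       # name  (first column name)
```

So all_data[0] holds several rows because it is itself the whole list data[1:]; a second index is needed to reach a single row.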

The only way for the else clause to be executed is when header is False. Since we defined open_dataset() with a default of True for header, we need to call the function and set this parameter to False explicitly:

    all_data = open_dataset(file_name='AppleStore.csv', header=False)

Now things are different! Now, all_data is NOT a tuple…it’s a list of lists where the first row is the header. In other words, all_data[0] would be the header and all_data[1:] would be all rows after the header.
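Here is the else branch in action, as a minimal sketch with a hypothetical split_rows() stand-in and made-up rows (no file is read):

```python
def split_rows(data, header=True):
    # Hypothetical stand-in for open_dataset()
    if header:
        return data[1:], data[0]
    else:
        # else branch: the whole table comes back unchanged, as a list
        return data

data = [["name", "price"], ["App A", "0.0"], ["App B", "1.99"]]
all_data = split_rows(data, header=False)

print(type(all_data))  # <class 'list'>
print(all_data[0])     # ['name', 'price']  (header row)
print(all_data[1:])    # [['App A', '0.0'], ['App B', '1.99']]
```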

Thank you so much for taking time to break it down! It is all really clear right now.

Thanks for this detailed answer, I understand it better now.

Thanks for the detailed explanation. Do you have a link to a YouTube video that explains and demonstrates this at the same time? I’m still quite confused that all_data[1] outputs only the header, while all_data[0] gives a result similar to data[1:], which has been used a lot in previous examples.

Or maybe I’m still confused about lists vs. tuples?
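On the list-vs-tuple question: lists and tuples are indexed the same way, so the difference comes from what is stored at each position. data holds one element per row, while the tuple returned with header=True holds exactly two elements. This sketch compares them directly, assuming a hypothetical split_rows() stand-in and made-up sample rows:

```python
def split_rows(data, header=True):
    # Hypothetical stand-in for open_dataset()
    if header:
        return data[1:], data[0]
    else:
        return data

data = [["name", "price"], ["App A", "0.0"], ["App B", "1.99"]]
all_data = split_rows(data)

# A list with one element per row vs. a tuple with exactly two elements
print(len(data), len(all_data))  # 3 2

# Position 0 of the tuple holds the slice data[1:], position 1 holds data[0]
print(all_data[0] == data[1:])   # True
print(all_data[1] == data[0])    # True
```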