In the intermediate functions course, how does 'header' work?

In the examples we have code like this

def open_dataset(file_name='AppleStore.csv', header=True):

I understand the rationale for the code (the file name should be AppleStore and it should have a header) - but I don’t understand how the interpreter knows ‘header’ means ‘this data set has a header row’. Is ‘header’ something built into Python that returns a boolean if the first row is a row of strings? Or have I misunderstood?

Any help/explanations welcome, thank you!

Hi @Python_newbie:

A CSV dataset usually includes a header row so that you can see what each column contains. During data preprocessing the header row is usually removed, because its values are all strings and are often not needed for your data projects. It is only there to tell you what the data in each column holds.

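header is an ordinary optional argument of open_dataset, not something built into Python. The interpreter does not inspect the file; the function only checks whether the value you passed (or the default, True) is truthy. You specify it when your processing requires it, but more often than not it is not needed.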
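To make that concrete, here is a minimal sketch showing that the function never looks at the data to decide whether a header exists. The helper name read_rows and the sample text are made up for illustration, and the file is replaced by an in-memory string so the snippet runs on its own. The sample deliberately has no header line, yet the first row is still split off when header is left at True.

```python
from csv import reader
from io import StringIO

# Made-up rows with NO header line, standing in for a real file.
NO_HEADER_CSV = "1,Facebook,0.0\n2,Instagram,0.0\n"

def read_rows(file_obj, header=True):
    """Hypothetical variant of open_dataset that reads from a file-like object."""
    data = list(reader(file_obj))
    if header:  # checks only the argument's value, never the file contents
        return data[0], data[1:]
    return data

# header defaults to True, so the first row is split off regardless of content.
first_row, rest = read_rows(StringIO(NO_HEADER_CSV))
# Passing header=False keeps every row together.
all_rows = read_rows(StringIO(NO_HEADER_CSV), header=False)

print(first_row)      # ['1', 'Facebook', '0.0'] (a data row, split off anyway)
print(len(rest))      # 1
print(len(all_rows))  # 2
```

Renaming the parameter to anything else (and the `if` test with it) would behave identically, which is why the name itself carries no meaning for the interpreter.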

Thank you for a very quick response - I still don’t entirely understand I’m afraid. If you look at the code/output below, print(header) creates the list of header titles - but I haven’t created it as a list, or written any commands to populate the list…can you explain what is happening?

from csv import reader

def open_dataset(file_name='AppleStore.csv', header=True):
    opened_file = open(file_name)
    read_file = reader(opened_file)
    data = list(read_file)

    if header:
        return data[0], data[1:]
    else:
        return data

print(header)

Output

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

Hi @Python_newbie

data = list(read_file)

This line converts the reader object into a list of lists: one inner list per row, with one string per column. Since data is a list of lists, data[0] and data[1] are elements of that list, and each of them is itself a list. You can check this by printing data before the if statement.
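A quick way to see this for yourself, sketched with a made-up three-line sample in place of the real AppleStore.csv (the sample text is an assumption, not the actual file contents):

```python
from csv import reader
from io import StringIO

# Made-up sample standing in for the real CSV file.
SAMPLE_CSV = "id,track_name,price\n1,Facebook,0.0\n2,Instagram,0.0\n"

# Same conversion as in open_dataset: reader object -> list of rows.
data = list(reader(StringIO(SAMPLE_CSV)))

print(type(data))     # <class 'list'>
print(type(data[0]))  # <class 'list'>
print(data[0])        # ['id', 'track_name', 'price']
print(data[1:])       # [['1', 'Facebook', '0.0'], ['2', 'Instagram', '0.0']]
```

Note that every value comes back as a string, including the numbers, because csv.reader does no type conversion.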

Thus, the header titles are returned as a list after condition checking (as below):

if header:
    return data[0], data[1:]
else:
    return data

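When header is True, the function returns two values: data[0], the header row as a list of strings, and data[1:], the remaining rows as a list of lists. Otherwise it returns all the rows together, in the same order as they appear in the csv file.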
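The post does not show how the function was called, so the following is an assumption: the list printed by print(header) most likely comes from unpacking the two returned values into variables, one of which happens to share the name header with the function's parameter. The two names are unrelated. One lives inside the function and holds True or False, the other lives outside and holds the first row. The sketch below uses made-up sample text and a file-like object instead of a file name so that it runs on its own; the variable name apps is also hypothetical.

```python
from csv import reader
from io import StringIO

# Made-up sample standing in for the real CSV file.
SAMPLE_CSV = "id,track_name,price\n1,Facebook,0.0\n"

def open_dataset(file_obj, header=True):
    # Inside the function, `header` is just the boolean flag.
    data = list(reader(file_obj))
    if header:
        return data[0], data[1:]
    return data

# Assumed call site: tuple unpacking creates a separate, outer `header`
# variable and fills it with data[0].
header, apps = open_dataset(StringIO(SAMPLE_CSV))

print(header)  # ['id', 'track_name', 'price']
print(apps)    # [['1', 'Facebook', '0.0']]
```

Without an assignment like this somewhere before it, a bare print(header) outside the function would raise a NameError.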


Hi @Python_newbie

If my answer helped you, do you mind marking my reply as the solution?

Thanks