Trying to understand the Header row concept - 316 / 5

Screen Link:

My Code:

def open_dataset(file_name='AppleStore.csv', row0='str'):
    opened_file = open(file_name)
    from csv import reader
    read_file = reader(opened_file)
    data = list(read_file)
    if row0:
        return data[1:]
    return data[0:]

apps_data = open_dataset()

The code I used here works, but I want to fully understand why it works and how it differs from the correct answer provided in the lesson.

How would this function determine if there is a header row using only the argument header=True? Isn’t the header row by definition the first row in the dataset? If so, don’t we need to know what kind of data type is actually in the first row to determine whether it is indeed a header row (i.e. ‘strings’) or the first row of data (i.e. ‘int’ or ‘float’)? How does the argument “header=True” determine whether the first row contains strings/column labels or just numbers? Every dataset has to have something in the first row, so how can “header=True” ever identify a dataset without a header row?

I understand that a dataset could ultimately contain any kind of data type, so what makes the Header row unique from all the other rows in the dataset such that I can use a function to identify this uniqueness?

Hi and welcome to the Community!
The objective of this function is to open a .csv file and save it as a list of lists in a variable, and depending on the arguments you pass to the function, it’ll drop the first row, or not, before saving it to the variable.

It’s not supposed to determine whether the dataset has a header or not. It’s supposed to return the dataset without the header when you tell the function that the dataset has one. You do that by passing the header argument. So header=True doesn’t identify a dataset without a header row; by setting the header parameter to True, you tell the function that a header row is present in the dataset. If the dataset has no header row, you call the function with header=False and every row is kept.

So, let’s dissect the answer code of the function to demonstrate the above.

  1. Function definition:
def open_dataset(file_name='AppleStore.csv', header=True):

We define a function called open_dataset which takes 2 parameters, file_name and header, and set default values for them. Setting default values means that if you call the function without putting any arguments inside the brackets, it will use the defaults you set when defining it.

  2. Opening the file and reading it into a list of lists:
    opened_file = open(file_name)
    from csv import reader
    read_file = reader(opened_file)
    data = list(read_file)

  3. Check the header parameter (you either pass its value when calling the function, or the default value is used). If it’s True, return the dataset without the first row; if it’s False, return the dataset as it is:
    if header:
        return data[1:]
    return data
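
Putting the three steps together, here is a minimal runnable sketch. It is not the lesson's exact code: it uses a with-block and newline='' when opening the file, and it writes a small made-up sample file (sample.csv, with hypothetical id and price columns) to a temporary folder so it can run without AppleStore.csv.

```python
import csv
import os
import tempfile


def open_dataset(file_name='AppleStore.csv', header=True):
    # Read the whole CSV file into a list of lists.
    with open(file_name, newline='') as opened_file:
        data = list(csv.reader(opened_file))
    # The caller states whether a header row is present;
    # the function never inspects the contents of the first row.
    if header:
        return data[1:]
    return data


# Hypothetical sample file, created only for this illustration.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(sample_path, 'w', newline='') as f:
    csv.writer(f).writerows([['id', 'price'], ['1', '0.99'], ['2', '2.99']])

with_header_dropped = open_dataset(sample_path)            # header=True by default
everything_kept = open_dataset(sample_path, header=False)  # caller says: no header

print(with_header_dropped)  # [['1', '0.99'], ['2', '2.99']]
print(everything_kept)      # [['id', 'price'], ['1', '0.99'], ['2', '2.99']]
```

The same file gives two different results purely because of what the caller passed, which is the whole point: header is an instruction to the function, not something it works out by itself.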

Because header is a Boolean variable, it is not necessary to write the if-statement as:

if header == True:

The if-statement takes the value of header directly and, if it’s True, follows the instructions inside the if-clause.
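
As a quick check of that point, the sketch below compares the short form with the explicit comparison. The helper names short_form and long_form are made up for this illustration. It also shows that the comparison needs a double equals sign: a single = is assignment, and Python rejects it inside an if-statement.

```python
def short_form(header):
    # Relies on the value of header directly.
    if header:
        return 'drop first row'
    return 'keep all rows'


def long_form(header):
    # Explicit comparison; same outcome for True and False.
    if header == True:
        return 'drop first row'
    return 'keep all rows'


same_for_booleans = all(short_form(h) == long_form(h) for h in (True, False))
print(same_for_booleans)  # True

# A single "=" in the condition is a syntax error, so the code never runs.
assignment_is_rejected = False
try:
    compile('if header = True:\n    pass', '<example>', 'exec')
except SyntaxError:
    assignment_is_rejected = True
print(assignment_is_rejected)  # True
```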

Coming back to your code: it does the same as the solution’s code, but not for the reason you intended.
I guess your intention was to check whether the first row of the dataset is of the string type and, if so, drop it before returning the dataset. What the if-statement really does is check whether row0 is True or False. By your definition row0='str', row0 is a string variable, not a Boolean. But Python converts it automatically to a Boolean in if row0:. If you call the bool() function on row0 with row0 set to 'str', it returns True, because any non-empty string counts as true (to learn why, see the Python documentation on truth value testing). Having found that row0 is True, the function follows the instructions inside the if-statement. So, basically, you could have set the default value of row0 to 'njdjdkg' and would get the same result.
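
To see this conversion for yourself, here is a small sketch that calls bool() on a few values. The list of candidates is just an illustration; the point is that only the empty string behaves differently from 'str'.

```python
# Values chosen for illustration: two non-empty strings, an empty string,
# and a few other common cases.
candidates = ['str', 'njdjdkg', '', 0, 1, None, [], ['a']]

# Map a readable label for each value to the Boolean Python converts it to.
truthiness = {repr(value): bool(value) for value in candidates}

for label, result in truthiness.items():
    print(label, '->', result)
```

With row0='str' or row0='njdjdkg' the if-branch always runs and the first row is dropped, while row0='' would keep every row, regardless of what the first row of the file actually contains.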


Wow…thanks so much for the detailed reply! I clearly was way over-thinking this one. Your explanation is very helpful!