Is "header" defined as the top row of a data file in Python?

In the Functions: Intermediate mission within the Python Fundamentals course, this code graded as correct:

# INITIAL CODE
def open_dataset(file_name='AppleStore.csv', header='True'):
    opened_file = open(file_name)
    from csv import reader
    read_file = reader(opened_file)
    data = list(read_file)

    if header:
        return data[1:]

    return data

apps_data = open_dataset()
print(apps_data[:5])

My question is why did using the header parameter in the open_dataset function work? Does Python understand “header” to mean the top row of any file? Is this something I can assume going forward?

# INITIAL CODE
def open_dataset(file_name='AppleStore.csv', header = True):
    opened_file = open(file_name)
    from csv import reader
    read_file = reader(opened_file)
    data = list(read_file)

    if header:
        return data[1:]

    return data

apps_data = open_dataset()
print(apps_data[:5])

Reformatted your code above slightly to fix the indents, for the sake of readability. Keep in mind also that you shouldn’t assign True or False boolean values as strings (i.e. it should be header = True, not header = 'True').

To answer your question, this function in the mission is a custom function we’re writing. Both the file_name and header parameters are therefore also by our own design! Python isn’t assuming anything by itself in the way you meant it.

When we define a function, we can assign our parameters default values to be used when their values aren’t explicitly specified.

The function is defined like so:

def open_dataset(file_name='AppleStore.csv', header=True):

This tells us that the open_dataset function has 2 parameters, each with its own default value. So simply calling the function by itself, i.e. open_dataset(), causes it to use the default values you assigned to the file_name and header parameters. Here, by assigning a default value of True to the header parameter, you are telling the function to treat the first row as the header row and leave it out, because that is how you wrote your function!

As an example, calling your function the following way would cause it not to treat the first row as the header row:

open_dataset(header = False)

Here, since you’re explicitly specifying an argument for the header parameter, the default value of True isn’t used.
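Here is a minimal sketch of the two calls side by side. The temporary file and its three rows are made up for illustration (they are not the mission's AppleStore.csv), and the function is a lightly tidied version of the one above.

```python
import csv
import os
import tempfile


def open_dataset(file_name='AppleStore.csv', header=True):
    # Same logic as the function above; the file is closed after reading.
    with open(file_name, newline='') as opened_file:
        data = list(csv.reader(opened_file))
    if header:
        return data[1:]
    return data


# Hypothetical three-row file standing in for a real dataset.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    f.write('id,name\n1,Facebook\n2,Instagram\n')
    sample_path = f.name

without_header = open_dataset(sample_path)             # header defaults to True
with_header = open_dataset(sample_path, header=False)  # default overridden
os.remove(sample_path)

print(without_header)  # [['1', 'Facebook'], ['2', 'Instagram']]
print(with_header)     # [['id', 'name'], ['1', 'Facebook'], ['2', 'Instagram']]
```

The only difference between the two results is whether row 0 was sliced off. Nothing in the function inspects what that row actually contains.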

Thanks for replying @blueberrypudding85. I understand what you are saying, but we still are depending on Python to interpret that the header parameter we are defining refers to just the first row of the data set, right?

Also, thanks for pointing out that header = 'True' (a string) is incorrect and that it should be header = True. The exercise should have given me an error but graded my work as correct. This is just an FYI in case Dataquest wants to fine-tune the grading.

Once again, thank you for answering.

@swati

The following part of the code takes care of the header row.

if header:
    return data[1:]

When header is a boolean, if header behaves like if header == True; more precisely, Python tests whether header is truthy. If that test passes, a version of the dataset is returned that excludes the first row: since the 2nd row has an index of 1, data[1:] returns all rows from the 2nd row onwards. This is how the header row, with its index of 0, is excluded.
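To see the slicing on its own, here is a small sketch with made-up rows, where row 0 plays the role of the header:

```python
# Made-up rows for illustration; index 0 stands in for the header row.
data = [
    ['id', 'name'],
    ['1', 'Facebook'],
    ['2', 'Instagram'],
]

first_row = data[0]
remaining_rows = data[1:]

print(first_row)       # ['id', 'name']
print(remaining_rows)  # [['1', 'Facebook'], ['2', 'Instagram']]
```

The slice drops whatever sits at index 0, whether or not it really is a header.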

Also, I just remembered that there are some strings which evaluate to True or False boolean values, so I stand corrected on that bit about the boolean values! Specifically, any non-empty string will evaluate to a Boolean value of True, so for the purposes of this function, header = 'True' also serves the same purpose. Check out this link for more on that: https://docs.python.org/2/library/stdtypes.html#truth-value-testing
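A quick sketch of that truth-value rule, using a handful of values picked for illustration:

```python
# Values chosen for illustration: three strings and the two booleans.
candidates = ['True', 'False', '', True, False]
truthiness = [bool(value) for value in candidates]

print(truthiness)  # [True, True, False, True, False]

# A string is never equal to the boolean True, even if it spells "True".
string_equals_bool = ('True' == True)
print(string_equals_bool)  # False
```

Note that the string 'False' is also truthy, so calling the function with header = 'False' would still drop the first row. That is a good reason to stick with real booleans.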

Please forgive me if I am being obtuse. I understand what you are saying in the paragraph below.

if header should be thought of as if header == True. If this evaluates to True, a version of the dataset is returned that excludes the first row (since the 2nd row has an index of 1, data[1:] returns all rows from the 2nd row onwards. This is how the header row, with its index of 0, is excluded.)

However, for Python to evaluate if header == True, it has to have some notion of what a header even is, right? It will only return data[1:] if it has determined there is a header. And for it to do that, it must know what a header is.

Or, am I thinking about this all wrong?

Nah, it doesn’t have that notion of its own accord! It’s specified by the person using the function! If header isn’t specified to be False, the function is set up to automatically assume it to be True, since header = True is the default value.

Ah. I think I understand it now. Sorry about being so confused by something very small.

Glad I could help!

Again, to clarify, Python isn’t automatically detecting if a row is the header row. The function simply assumes that to be the case by default because that’s how its header parameter is written! You could override that by passing in header = False when calling the function, but otherwise it will always treat the first row as the header row.
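One way to convince yourself that the name carries no special meaning is to rename the parameter. In this sketch, drop_first_row and banana are hypothetical names invented for the example, and the rows are made up:

```python
def drop_first_row(rows, banana=True):
    # `banana` is a deliberately silly, hypothetical name.
    # Python attaches no meaning to it; only the code below does.
    if banana:
        return rows[1:]
    return rows


rows = [['id', 'name'], ['1', 'Facebook']]

default_result = drop_first_row(rows)
explicit_result = drop_first_row(rows, banana=False)

print(default_result)   # [['1', 'Facebook']]
print(explicit_result)  # [['id', 'name'], ['1', 'Facebook']]
```

The behavior is identical to the header version, because it comes entirely from the if statement and the slice, not from the word chosen for the parameter.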
