Indexing Header and Rest of Data

Screen Link:

https://app.dataquest.io/m/316/functions%3A-intermediate/6/returning-multiple-variables
My Code:

# INITIAL CODE
def open_dataset(file_name='AppleStore.csv', header=True):        
    opened_file = open(file_name)
    from csv import reader
    read_file = reader(opened_file)
    data = list(read_file)
    
    if header:
        return data[0], data[1:]
    else:
        return data
all_data = open_dataset()
print(all_data[:3])
header = all_data[0]
apps_data = all_data[1:]
print(header[:3])
print(apps_data[:3])

What I expected to happen:
header and first two rows would print for all_data[:3]
(['id', 'track_name', 'size_bytes']) would print for header[:3]
First three data rows after the header would print for apps_data[:3]

What actually happened:

(['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805',

This is a partial list of the output. The errors were: apps_data is a tuple, but we expected it to be a list. apps_data is shorter than we expected.

Hi @vroomvroom. Your function returns a tuple, with two lists inside:

  • Header
  • Data

So when you call print(all_data[:3]) what do you expect as a result if all_data is a two-element tuple? :blush:

The same reasoning applies to apps_data = all_data[1:]. Try looking at the type of apps_data :blush:
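Here is a minimal sketch of that difference. The small tuple below is a hypothetical stand-in for what open_dataset() returns when header=True (trimmed to three columns and two rows), and the names first_three, sliced and indexed are only for illustration:

```python
# Hypothetical stand-in for the value returned by open_dataset(header=True).
all_data = (
    ['id', 'track_name', 'size_bytes'],           # data[0]: the header row
    [['284882215', 'Facebook', '389879808'],      # data[1:]: the remaining rows
     ['389801252', 'Instagram', '113954816']],
)

print(type(all_data).__name__, len(all_data))  # tuple 2

# Slicing a tuple always produces another tuple.
first_three = all_data[:3]   # only two elements exist, so this is the whole tuple
sliced = all_data[1:]        # a one-element tuple that wraps the list of rows
indexed = all_data[1]        # the list of rows itself

print(type(sliced).__name__, len(sliced))
print(type(indexed).__name__, len(indexed))
```

So all_data[1:] is a tuple of length 1, while all_data[1] is the list of rows you were after.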

Hope I can help!

Have a nice day:)

Thanks for replying. Looking again, I’m confused as to why header=all_data[1] and apps_data=all_data[0]. Why isn’t it header=all_data[0] and apps_data=all_data[1:]?

Why isn’t it header=all_data[0] and apps_data=all_data[1:]?

You have to look at the tuple provided in the solution and you’ll answer your question:)

Both alternatives are correct but it depends on the returned tuple.

Can you please elaborate on how you got to this solution?

if header:
    return data[1:], data[0]

tuple = (data[1:], data[0]), so tuple[1] is header and tuple[0] is all data without header

Is that right?

Exactly. In the solution's tuple the first element is the data and the second is the header, whereas in your own function the first element is the header and the second is the data.

@vroomvroom I also got confused here. The instructions clearly prompt us to:

  1. Edit the open_dataset() function (already written in the code editor) such that:
    If the data set has a header, the function returns separately both the header and the rest of the data set.

I read this as meaning the header has to precede the rest of the data set in the return under the if statement.

In the solution, however, the rest of the data set (data[1:]) precedes the header (data[0]):

if header:
    return data[1:], data[0]

This means that when we assign the header and the rest of the data to header and apps_data, respectively, the solution indexes them as follows:

header = all_data[1]
apps_data = all_data[0]

However, my code works fine when I stick to the instruction as I read it, with the header coming first in the if statement:

if header:
    return data[0], data[1:]
else:
    return data

all_data = open_dataset()
header = all_data[0]
apps_data = all_data[1]

Note the difference in how header and apps_data are assigned at the end. Hope this helps!
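One way to avoid remembering the positions at all is tuple unpacking. The sketch below is an assumption-laden variant, not the course's solution: it keeps the header-first order from my code, closes the file with a context manager, and writes a hypothetical three-line sample to a temporary file so it runs without AppleStore.csv.

```python
import csv
import os
import tempfile

def open_dataset(file_name, header=True):
    # Same logic as in the thread, but the file is closed automatically.
    with open(file_name, newline='') as opened_file:
        data = list(csv.reader(opened_file))
    if header:
        return data[0], data[1:]   # header first, then the remaining rows
    return data

# Hypothetical sample standing in for AppleStore.csv.
sample = (
    "id,track_name,size_bytes\n"
    "284882215,Facebook,389879808\n"
    "389801252,Instagram,113954816\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Unpacking assigns both parts of the returned tuple in one step.
header, apps_data = open_dataset(sample_path)
os.remove(sample_path)

print(header)
print(apps_data[:2])
```

If the function returned the data first, as the solution does, the unpacking line would simply become apps_data, header = open_dataset(sample_path).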

Thank you, I think I understand it now!

I did the same thing. When I test it, data[0] gives the header and data[1:] gives the rest of the file. I think there may be an error in the answer; otherwise it does not make sense.

There is a mistake in the answer and/or in the description of the task.

You can follow this representation of the code. Then you can retrieve header at index 0 and rest of the data at index 1.

[image: this_representation]

Also, this explanation by @hanqi, Community Moderator, might help clear up the confusion.

There are two potentially confusing variables in these exercises:

  1. data
  2. all_data

data is where the header and the rest of the data are separated, using the index data[0] for the header and the slice data[1:] for everything after it.
After this separation, they are packed into a tuple before being returned, with the rest of the data in the 1st position and the header in the 2nd position.
The output of this function (the aforementioned tuple) is then assigned to all_data, so all_data is now a tuple with 2 elements, the 2nd of which contains the header. The header is then extracted from the tuple with all_data[1], which references the 2nd position.

Students can easily confuse data[1:] with all_data[1], and data[0] with all_data[0], because both use 1 and 0 as indices and the variable names are similar.
It helps to think about what each variable physically contains and which type it is stored in. These 2 anchors will clear up most confusion and guide the next steps.
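As a rough illustration of those two anchors, the sketch below prints the type and length of each variable. The three-row data list is a hypothetical miniature of the parsed file, and the tuple is packed in the solution's order (rest of the data first, header second).

```python
# Hypothetical miniature of the parsed file: one inner list per CSV line.
data = [
    ['id', 'track_name'],         # data[0]: the header row
    ['284882215', 'Facebook'],    # data[1:] starts here
    ['389801252', 'Instagram'],
]

# The solution's packing order: rest of the data first, header second.
all_data = (data[1:], data[0])

# Indices on data pick rows of the file;
# indices on all_data pick positions in the returned tuple.
header = all_data[1]
apps_data = all_data[0]

for name, value in [('data', data), ('all_data', all_data),
                    ('header', header), ('apps_data', apps_data)]:
    print(name, type(value).__name__, len(value))
```

Here data is a list of 3 rows, all_data is a tuple of 2 elements, and both header and apps_data are lists, which is exactly what the type check is meant to confirm.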
