Unclear solution to code

Hello, Good day.

In exercise [6/12] of Functions - Intermediate (Data Scientist path), the first instruction says: “If the data set has a header, the function returns separately both the header and the rest of the data set. Else (if there’s no header), the function returns the entire data set.”

Your solution code then reads:

if header:
    return data[1:], data[0]
else:
    return data

This order seems to assume that the header is data[1:] and the rest of the data is data[0]. That may look fine, but printing both data[1:] and data[0] shows the exact opposite of what was assigned to each variable. The order then affects the tuple in the next exercise (exercise 7), where the variables assigned to apps_data and header are switched around.

In the next exercise you ask:

  • Do the variable assignment step in a single line of code.
  • Assign the header to a variable named header.
  • Assign the rest of the data set to a variable named apps_data.

and you give this code:
if header:
    return data[1:], data[0]
else:
    return data

apps_data, header = open_dataset()

I have attached screenshots where I corrected the code and it worked. Also, the third screenshot should shed light on what I am trying to say. I hope this makes sense to whoever is reading.

TL;DR: The order of the returned values doesn’t match the order of the variables in the return statement?


By design of the language, tuple unpacking assigns the 1st element of the returned tuple to the first name on the left-hand side of the assignment, and the 2nd element of the returned tuple to the second name on the left-hand side.
I don’t see what is wrong with the solution. The author decided to return the header as the 2nd element of the return statement, stored the result in all_data, and then correctly extracted the header by indexing the 2nd element with header = all_data[1].
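To make the positional rule concrete, here is a minimal sketch. The function name split_rows and the sample rows are made up for illustration; they are not the exercise's actual open_dataset() or data set, only the return statement mirrors the solution quoted above.

```python
def split_rows(data, header=True):
    """Hypothetical stand-in for the exercise's function."""
    if header:
        # Rest of the data first, header second (same order as the solution).
        return data[1:], data[0]
    return data


# Made-up sample rows; the first row plays the role of the header.
rows = [["name", "price"], ["Facebook", "0.0"], ["Instagram", "0.0"]]

# Names on the left are matched to the returned tuple purely by position.
apps_data, header = split_rows(rows)
print(header)     # the single header row
print(apps_data)  # every row after the header

# Swapping the names does not swap the values: each name still
# receives whatever sits at its position in the tuple.
header_wrong, apps_wrong = split_rows(rows)
print(header_wrong == rows[1:])  # the "header" name now holds the data rows
```

The variable names carry no meaning to Python here, so the left-hand side has to follow the order used in the return statement, whatever order the instructions mention them in.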

I’m guessing your real issue is with the unexpected position of the header in the return statement. Anyone reading the instruction that the function “returns separately both the header and the rest of the data set” would order the return as (header, rest of data) to match the order in which they appear in the instruction.
Why did the author switch it? It may be poor solution design, or it may be a way of getting students to really understand the material and think twice about what’s happening under the hood.

Yes, agreed. But in the next exercise, the author uses the variable assigned to header in the previous exercise for apps_data, and vice versa. In the next exercise, the author says data[1:] is apps_data and header is data[0]. This seems to contradict what was assigned in the previous exercise. If the author had kept the initial order, it would have been less confusing and frustrating. The assignment of variables in one exercise affects the next one, so in my humble opinion they should stick to one rule. The third screenshot shows what I am talking about: the order of assignment has changed and is now confusing.

I looked through all 3 screenshots and still can’t see any inconsistency in the author’s code. The author has consistently placed the header in the 2nd position of the returned tuple.

There are two variables in these exercises that are potentially confusing:

  1. data
  2. all_data

data is where the header and the rest of the data are separated. The separation uses a slice, data[1:], for the rest of the data and an index, data[0], for the header.
After this separation, the two pieces are packed into a tuple before being returned, with the rest of the data in the 1st position and the header in the 2nd position.
The output of the function (the aforementioned tuple) is then assigned to all_data, so all_data is now a tuple with 2 elements, the 2nd of which is the header. The header is then extracted from the tuple with all_data[1], which references the 2nd position.
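The same steps can be traced by hand. This is only a sketch: the rows below are invented, and the tuple is built directly instead of calling the exercise's function, so that each index is visible.

```python
# Made-up rows standing in for the data set; row 0 is the header.
data = [["id", "track_name"], ["1", "Facebook"], ["2", "Instagram"]]

# What the function returns when there is a header:
# position 0 = rest of the data, position 1 = header.
all_data = (data[1:], data[0])

# Indexing the tuple is a different operation from indexing data.
header = all_data[1]     # 2nd element of the tuple, i.e. data[0]
apps_data = all_data[0]  # 1st element of the tuple, i.e. data[1:]

print(type(data).__name__, type(all_data).__name__)  # a list versus a tuple
print(header)
print(apps_data)
```

So all_data[1] and data[1:] share the digit 1 but point at opposite things: the first is the header row, the second is everything except the header row.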

Students may be confusing data[1:] with all_data[1], and data[0] with all_data[0], because both use 1 and 0 as indices and the variable names are similar.
It helps to think about what each variable physically contains and which type it is stored as in the programming language. These 2 anchors will clear up most confusion and guide your next steps.
