Error when assigning a variable: "Python for Data Science: Fundamentals", "Functions: Intermediate", page 6/12

Hello,

This is my first post. I would appreciate feedback if I am writing in an ineffective way.

Right now I am in "Python for Data Science: Fundamentals", doing "Functions: Intermediate", on page 6/12.

In the instructions, I am required to put the header in one variable and the rest of the data in another.

I am getting an error, as you can see in the image, and I don’t understand why. The header should be at index 0 since it’s the first row, and the rest of the data should be the slice [1:].

Can anyone tell me why it is wrong?

Kind regards,
Leonel


Hi Leonel,

If you read the error message and recognize that tuple and list are Python types, you will get a sense that somewhere the type of a variable is not what you expected.

I can see some points of confusion that led you to write all_data[1:] to get the data.
To answer this, think about the difference between data inside the function and all_data outside the function. What is the function returning? Verify not only the values but also the types, which are even more important. Be extremely careful, both when reading other people’s code and when writing your own, at the interface between the calling code and the function definition: from the arguments in the call to the parameters in the definition, and from the return statement inside the function to the variable(s) assigned in the calling code. That interface is a major source of mistakes. Even if the code works now, it is still important to keep in mind what is going on, because future changes to the code can break things again.

Different types allow different ways of indexing, different getting/setting patterns, and different predefined methods for data manipulation. Each type works differently with plotting libraries like matplotlib or manipulation libraries like pandas. You may think you are getting one type when you are actually getting another. The code may look the same and the variable name may make perfect sense, but what happens under the hood could be completely different from what you expect. For a start, print(type(any_variable)) is a good debugging tool.

(Spoiler alert! The following paragraphs contain a direct answer to your question, so try to think through the paragraphs above first and debug it yourself.)
To improve your code, you can use multiple assignment (I don’t mean a=b=c), more precisely called tuple unpacking, with header, apps_data = all_data. This side-by-side style is consistent with the multiple values returned inside the if header block in the function.
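As a rough sketch of the unpacking idea: the open_dataset below is a hypothetical stand-in that takes rows already in memory instead of reading a CSV file, and it assumes the header is returned first. The sample rows are made up for illustration.

```python
def open_dataset(rows, header=True):
    """Hypothetical stand-in for the course function (no file access)."""
    data = list(rows)
    if header:
        # Assumption: header is returned first, then the remaining rows
        return data[0], data[1:]
    return data


# Made-up sample rows; the first row plays the role of the header
rows = [
    ["id", "name"],
    ["1", "Facebook"],
    ["2", "Instagram"],
]

# Without unpacking, both returned values arrive packed in one tuple
all_data = open_dataset(rows)
print(type(all_data))  # <class 'tuple'>

# With tuple unpacking, each returned value gets its own name
header, apps_data = open_dataset(rows)
print(header)     # ['id', 'name']
print(apps_data)  # [['1', 'Facebook'], ['2', 'Instagram']]
```

The order of the names on the left must match the order in the return statement.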

Did you think about which value to return in the first position and why? Also, this question assumes you are using the default header=True parameter. What happens when someone passes header=False when calling your function? How does that affect how you assign the return values from the call?
If you don’t want to depend on position, you can explore https://docs.python.org/3/library/collections.html#collections.namedtuple, which lets you use names to access values in a collection (such as a tuple).
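A minimal sketch of the namedtuple idea follows. The names Dataset, header, rows and split_header are invented for illustration and are not part of the course code.

```python
from collections import namedtuple

# Hypothetical container and field names, chosen only for this example
Dataset = namedtuple("Dataset", ["header", "rows"])


def split_header(data):
    """Return the first row and the remaining rows under explicit names."""
    return Dataset(header=data[0], rows=data[1:])


data = [["id", "name"], ["1", "Facebook"], ["2", "Instagram"]]
result = split_header(data)

# Access by name, so the caller does not need to remember positions
print(result.header)  # ['id', 'name']
print(result.rows)    # [['1', 'Facebook'], ['2', 'Instagram']]

# A namedtuple is still a tuple, so positional unpacking keeps working
header, rows = result
```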

As for from csv import reader, it is usually better to collect import statements at the top of the .py file so that anyone reading the code immediately knows which tools are used and where they come from.
For intermediate variables like read_file that are used only once, directly assigning a = func2(func1(c)) could be more convenient than

b = func1(c) 
a = func2(b)

Nevertheless, you may want to name the intermediate result for self-documenting code (the variable name tells you what it contains), or when you start using pandas, where it’s convenient to store intermediate processed dataframes in new variables so they provide a shorthand reference that multiple functions can use in downstream analysis.
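To make the two styles concrete, here is a small sketch with csv.reader. It reads from an in-memory string (io.StringIO) with made-up content rather than the course’s CSV file, so it can be run anywhere.

```python
from csv import reader
from io import StringIO

# Made-up CSV text standing in for a real file
csv_text = "id,name\n1,Facebook\n2,Instagram\n"

# Step by step: the intermediate reader object gets its own name
read_file = reader(StringIO(csv_text))
data_stepwise = list(read_file)

# Nested: the reader is created and consumed in one expression
data_nested = list(reader(StringIO(csv_text)))

print(data_stepwise == data_nested)  # True
print(data_nested[0])                # ['id', 'name']
```

Both versions produce the same list of rows; the choice is about readability.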

As you get better (coding in your own .ipynb or .py files), you can add type hints to your code. They can be seen as a faster form of documentation (https://realpython.com/python-type-checking/) because they make you state which types every function takes in and gives out. They also allow IDEs to help with autocompletion when writing classes, and allow you to run static type checkers like mypy.
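As an illustration only, a function that splits off the header could be annotated like this. The name split_header and the Row alias are assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical alias: one CSV row is a list of strings
Row = List[str]


def split_header(data: List[Row]) -> Tuple[Row, List[Row]]:
    """Return (header, remaining rows); the signature documents the order."""
    return data[0], data[1:]


header, rows = split_header([["id", "name"], ["1", "Facebook"]])
print(header)  # ['id', 'name']
print(rows)    # [['1', 'Facebook']]
```

Python does not enforce these hints at runtime; a static checker such as mypy reads them and reports mismatches.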


Thank you for your answer. I do understand that inside the function data is a list and the returned value is a tuple. However, both of them support indexing, which should let me access their elements.

EDIT:

This was the revelation for me. Thank you so much!!! I thought I was returning the whole thing, but I was returning the data rows at index 0 and then the header at index 1.

What happens when you unpack m values into n variables, where m > n?
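A quick experiment answers this question. The sketch below uses made-up values: plain unpacking with too few names raises ValueError, while a starred name collects the leftovers.

```python
values = (1, 2, 3)  # m = 3 values

# n = 2 names: Python refuses to guess which value to drop
try:
    a, b = values
    unpack_failed = False
except ValueError as err:
    unpack_failed = True
    print("ValueError:", err)

# A starred name collects whatever is left over into a list
first, *rest = values
print(first)  # 1
print(rest)   # [2, 3]
```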

I’m also running into a mental block on this problem. I don’t fully understand how the header ended up at index 1 and the rest of the data at index 0.

if header:
        return data[1:],data[0]

I think I understand that the data is being divided into two separate indices here. I’m assuming that order is important, too? Is it that the header is placed into the first dataset, and the remaining data goes into data[0]?

Outside the function, in what order is the data returned? This is where I’m confused. If I base my understanding on order being what matters, then my mind tells me that:

if header:
        return data[1:],data[0]

#in reality
data=[header_line, everything_else]  #header in index0 & everything else in index1

Clearly this is not the case, so it’s safe to say this is NOT happening. My question is: where does the order switch, or what point do I need to focus on? I can’t help but think there is something in the background that I’m not understanding, and this could negatively impact my growth.

At one point you mentioned:

Different types allow different ways of indexing, getting/setting patterns, and predefined methods for data manipulation. Each type works differently with plotting libraries like matplotlib or manipulation libraries like pandas

I tried the print(type(any_variable)) debugging tool and came up with two lists. I’m not sure how this would impact the results. Could you please elaborate on how different types allowing different ways of indexing affects this problem, or provide some extra reading on this?

In the function, as you can see, we are returning the data in position 0 and the header in position 1. That’s why they appear swapped when we later assign them to the variables.

To fix this problem, we have to return the header first and then the data inside the function, so that the order matches when we assign the result to the variables. Please let me know if I’m making myself clear.

I’m not sure I understand your thought process, but I will try to clarify how I think about this screen.
Firstly, if the first row in data is the header, then data[0] definitely gets the header and data[1:] gets the rest of the data. As for the order of the two items in the return statement, it doesn’t matter how you order them as long as the output of the function is assigned to variables in the same meaningful order (for example, rest_of_data, header = open_dataset()). If you switch the order of the values in return, then you should also switch the order of the variables you are assigning to. Think of it as the sender and receiver having to be synchronized in order. Something else to consider is that in other applications you may be returning objects such as a dict or a set, which are accessed by key or membership rather than by position, so positional order doesn’t matter there. Sidetracking to some advanced nuggets:

>>> d = {'a': 1, 'b': 2}
>>> locals().update(d)

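This is how you can make the keys of a dictionary available as variable names, with each name pointing to the corresponding value. Note that this works at the interactive prompt or at module level, where locals() is the same dictionary as globals(); inside a function, changes to the dictionary returned by locals() are not guaranteed to create or modify local variables.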

This leads to the concept of tuple unpacking (multiple assignment): https://treyhunner.com/2018/03/tuple-unpacking-improves-python-code-readability/
He didn’t use functions in that article, but you can apply what you learn there here; just imagine he called a function on the right-hand side of each assignment. While going through that article, you may need to learn what * and ** are doing. They each serve both packing (technically not the right word, but it really helps me think about it) and unpacking purposes: https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/

When I mentioned awareness of types in my previous answer, it was to hint that when you do return a, b and don’t unpack the result but assign both to all_data as in the screenshot, it’s like assigning a tuple, all_data = (a, b) (Python builds this tuple automatically).
So Leonel was actually indexing into a two-element tuple when he thought he was indexing into a list of lists by doing header = all_data[0] and apps_data = all_data[1:]. He would have realized something was wrong by calling type(all_data) and seeing that it’s a tuple and not a list of lists.
If you are lucky, Python will throw IndexError: tuple index out of range. In this situation, however, apps_data = all_data[1:] raised no error, because slicing never raises IndexError even when it goes out of range, and because lists and tuples are both sequences that share the same indexing and slicing syntax. The first article shows how to enforce, through code, the number of items you expect to receive, so that Python complains when what you expect is not what is happening.
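To see this concretely, here is a sketch that reproduces the situation with made-up rows. The open_dataset below is a hypothetical in-memory version that returns the rows first and the header second, as in the return statement quoted earlier in the thread.

```python
data = [["id", "name"], ["1", "Facebook"], ["2", "Instagram"]]


def open_dataset(data, header=True):
    """Hypothetical in-memory version; returns rows first, header second."""
    if header:
        return data[1:], data[0]
    return data


all_data = open_dataset(data)
print(type(all_data))  # <class 'tuple'>
print(len(all_data))   # 2

# Indexing the tuple, not the list of rows
header = all_data[0]      # actually the data rows
apps_data = all_data[1:]  # a one-element tuple that wraps the header
print(header)     # [['1', 'Facebook'], ['2', 'Instagram']]
print(apps_data)  # (['id', 'name'],)

# Slicing past the end quietly gives an empty tuple
print(all_data[5:])  # ()

# Indexing past the end is what raises IndexError
try:
    all_data[5]
    index_failed = False
except IndexError:
    index_failed = True
```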

Actually, you could also have guessed what was wrong by printing the variable, but viewing type(var) makes it explicit. This assumes a certain level of familiarity with each type, and the better you are at it, the easier life gets. For example, if you know that bool is a subclass of int and that booleans can be treated as integers, you know that integer operations like sum and mean can be applied to booleans too, so you can easily make visualizations of True/False statistics or do numerical operations on them.
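A tiny sketch of the boolean point, using a made-up list of flags:

```python
# Made-up flags, e.g. "is this app free?"
flags = [True, False, True, True]

print(issubclass(bool, int))  # True
print(True + True)            # 2

# Counting and averaging work because True == 1 and False == 0
count_true = sum(flags)
share_true = sum(flags) / len(flags)
print(count_true)  # 3
print(share_true)  # 0.75
```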

Generally, to learn what you can do with an unknown object, use dir(object) to see all the attributes and methods on that object. Then you can press Shift+Tab (up to four times) in Jupyter to read more about it, or google the documentation to read it with a nicer UI.
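For instance, comparing dir() on a tuple and on a list quickly shows that they offer different methods. The all_data value here is made up for illustration.

```python
all_data = (["1", "Facebook"], ["id", "name"])

# Keep only the public names to make the output readable
tuple_methods = [name for name in dir(all_data) if not name.startswith("_")]
list_methods = [name for name in dir([]) if not name.startswith("_")]

print(tuple_methods)  # ['count', 'index']
print("append" in list_methods)   # True
print("append" in tuple_methods)  # False
```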


I THINK you did explain this, and I think I got it. The return of the function is equivalent to:

    if header:
        return a, b
    # where a = data[1:] (the complete data rows) and b = data[0] (the original header)

In this case a, the complete data, becomes packaged at index 0, and b, the Applestore.csv header line, becomes packaged at index 1.


Extra tidbit: variable unpacking can be used for printing too

lstData = [10,20,30,40]

print('The {} are {}, {}, {}, and {}'.format('numbers', *lstData))
print('The {0} are {1}, {4}, {2}, and {3}'.format('numbers', *lstData))

This is one advantage of format strings over f-strings. The second print adds a slight twist to my earlier point about keeping the order synchronized, because here you can order the outputs in whatever way you want while leaving lstData as it is. You can think of this as indexing into lstData.
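For comparison, here is a sketch of the same reordering done both ways. The f-string cannot unpack lstData with *, so each element has to be indexed explicitly.

```python
lstData = [10, 20, 30, 40]

# str.format: unpack once, then reorder through the positional indices
with_format = 'The {0} are {1}, {4}, {2}, and {3}'.format('numbers', *lstData)

# f-string: no unpacking, so each element is indexed by hand
with_fstring = f"The numbers are {lstData[0]}, {lstData[3]}, {lstData[1]}, and {lstData[2]}"

print(with_format)   # The numbers are 10, 40, 20, and 30
print(with_fstring)  # The numbers are 10, 40, 20, and 30
```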