Understanding when to use the dataframe name twice?

Howdy friends. Not sure if I’m wording my question appropriately but essentially what I’m trying to understand is the difference between:

f500.loc[f500['previous_rank'] == 0, 'previous_rank'] = np.nan
null_previous_rank = f500[f500["previous_rank"].isnull()]

compared to:

f500_selection = f500.loc[:, ['rank', 'revenues', 'revenue_change']].head(5)
top_3_countries = f500["country"].value_counts().head(3)

I can’t seem to understand how to know when to slice into the dataframe name twice vs. once. In the top two examples, f500 is sliced into “two levels” (in bold), but not in the bottom examples. Why is this? Can’t the top two lines of code be written like the bottom ones to produce the same result? What am I missing?

Hi aaron,

Is there a reason you are comparing these two blocks of code, given that they serve totally different purposes?
Try to build up the long nested method chain yourself, one piece at a time. That will strengthen your understanding of the value, datatype, and id (for more advanced purposes) of every variable you input and output, and it will improve your testing skills. (I know generating dummy data is tedious, but it gets you really familiar with the DataFrame, Series, and np.random methods.) This requires understanding first what f500['previous_rank'] is, next what f500['previous_rank'] == 0 is, then .loc[f500['previous_rank'] == 0, 'previous_rank'], and finally the full line. Don’t be afraid or too lazy to break it down and print things to debug, such as var_name, type(var_name), id(var_name), and dir(var_name).

The top block contains two steps that each reference the dataframe: first, generating a boolean array for row indexing (the inner f500[...]), and second, the NaN assignment through the outer f500.loc[...]. The bottom block does not need boolean indexing for row filtering; its first line uses : in loc to select all rows.
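The breakdown described above might look like the sketch below. The tiny f500 here is made-up dummy data standing in for the real dataset (the column values are assumptions for illustration only), and previous_rank is stored as floats so that assigning NaN does not change the column's dtype.

```python
import numpy as np
import pandas as pd

# Dummy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "previous_rank": [3.0, 0.0, 1.0, 0.0],
    "country": ["USA", "China", "China", "USA"],
})

# Step 1: selecting a single column gives a Series.
col = f500["previous_rank"]
print(type(col))

# Step 2: comparing that Series to 0 gives a boolean Series (the row mask).
mask = f500["previous_rank"] == 0
print(mask.tolist())

# Step 3: .loc[row_mask, column_label] picks only the matching cells.
print(f500.loc[mask, "previous_rank"])

# Step 4: the full line assigns NaN to those cells.
f500.loc[mask, "previous_rank"] = np.nan

# The second "top" line follows the same pattern: the inner f500[...] builds
# a mask, and the outer f500[...] keeps the rows where the mask is True.
null_previous_rank = f500[f500["previous_rank"].isnull()]
print(null_previous_rank)

# A "bottom"-style line needs no mask, so f500 appears only once.
all_rows = f500.loc[:, ["rank", "previous_rank"]]
```

So f500 shows up twice only when the row filter itself has to be computed from one of f500's own columns.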

Understanding the input and output types is a vital skill for learning new libraries and classes. dir() shows you all the attributes of an object. In sklearn, dir(model) can tell you what you are allowed to do with the model and what information the object stores that you can access. In Jupyter, typing model. and pressing Tab brings up a dropdown list, which is another quick way to autocomplete when you forget the spelling. You can start with dir(f500).
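As a rough sketch of that kind of exploration, again using a small made-up f500 rather than the real data, you could filter the output of dir() down to the public names and check whether a method you have in mind exists on the object:

```python
import pandas as pd

# Dummy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({"country": ["USA", "China", "China"]})

# dir() returns a list of attribute names as strings.
# Dropping underscore-prefixed names leaves the public ones.
public = [name for name in dir(f500) if not name.startswith("_")]
print(len(public))
print(public[:10])

# Check which object a method actually lives on.
print("loc" in dir(f500))                       # DataFrame attribute
print("value_counts" in dir(f500["country"]))   # Series method
```

Running dir() on both f500 and f500["country"] is a quick way to see that a DataFrame and a Series are different objects with different methods.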


Thank you for your thorough explanation; this is very helpful. I think that while working through the pandas/numpy section I reached a point mentally where the lines began to blur, so to speak. Your advice definitely helps me break it down so it makes sense. Thank you!
