`.iloc` Method for sorting series

I am working through Learn data science with Python and R projects

My solution and the DQ answer

selected_rows = f500[f500["country"] == "Japan"]
sorted_rows = selected_rows.sort_values("employees", ascending=False)
print(sorted_rows[["company", "country", "employees"]].head())

# My Solution
first_row = sorted_rows.iloc[0]
top_japanese_employer = first_row["company"]

# Answer
top_japanese_employer = sorted_rows.iloc[0]["company"]

I am confused because I don’t recall learning the method for double brackets for .iloc. I checked the documentation (pandas.DataFrame.iloc — pandas 1.3.3 documentation) and it doesn’t appear to be a thing. Did I miss something somewhere?

It’s less that "double brackets for .iloc" is a thing and more that the result depends on the situation and which objects you’re working with. In fact, the above two code chunks are equivalent! To be clear: the “first bracket” belongs to .iloc, but the “second bracket” indexes the Series that .iloc returns. So I think you’re right: "double brackets for .iloc" isn’t a thing.

If you look closely, you’ll see that the only difference between your code and that of DQ is that yours is split over two lines. If you were to skip the step of first defining first_row and simply went straight to defining top_japanese_employer in one line, it would be exactly the same as the DQ answer.

So why the “double brackets” here? Well, let’s take it step by step to see if we can logic our way through it:

  • sorted_rows is a DataFrame
  • when we use .iloc[0] on this DataFrame, we are returned a Series
  • the returned Series has an index that comes from the column names of the original DataFrame
  • therefore, when we use first_row["company"] , we are returned the value under company for the first_row object (Series)
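The steps above can be checked on a tiny stand-in DataFrame. This is only a sketch: the company names and numbers are made up, and the real sorted_rows comes from the f500 data in the course.

```python
import pandas as pd

# Hypothetical stand-in for sorted_rows; the values are invented for illustration.
sorted_rows = pd.DataFrame(
    {
        "company": ["Alpha Corp", "Beta Ltd"],
        "country": ["Japan", "Japan"],
        "employees": [300, 200],
    },
    index=[7, 3],  # row labels left over from the original frame, not positions
)

# First bracket: .iloc[0] takes the first row by position and returns a Series.
first_row = sorted_rows.iloc[0]
print(type(first_row).__name__)   # Series
print(list(first_row.index))      # ['company', 'country', 'employees']

# Second bracket: ordinary label lookup on that Series.
two_step = first_row["company"]
one_line = sorted_rows.iloc[0]["company"]
print(two_step == one_line)       # True
```

The Series index is the list of column names, which is why first_row["company"] works at all.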

In fact, since "company" is the first element of first_row (or equivalently, the first element of sorted_rows.iloc[0]) we could actually do this:

top_japanese_employer = sorted_rows.iloc[0][0]
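One caveat: using plain [0] on a Series to mean "first position" is deprecated in recent pandas releases (it emits a FutureWarning when the index holds labels such as column names), so this shortcut depends on the pandas version. A sketch of position-based spellings that do not rely on that fallback, again with made-up data and assuming "company" is the first column:

```python
import pandas as pd

# Hypothetical stand-in for sorted_rows; "company" is assumed to be the first column.
sorted_rows = pd.DataFrame(
    {
        "company": ["Alpha Corp", "Beta Ltd"],
        "country": ["Japan", "Japan"],
        "employees": [300, 200],
    }
)

by_label = sorted_rows.iloc[0]["company"]   # row by position, column by label
by_position = sorted_rows.iloc[0, 0]        # row 0 and column 0, both by position
chained = sorted_rows.iloc[0].iloc[0]       # same idea, one .iloc per step

print(by_label == by_position == chained)   # True
```

Looking the column up by label is usually the safer choice, since it keeps working if the column order changes.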

Please tell me I didn’t just blow your mind?! :sunglasses:

In the end, the “first bracket” returns a Series from a DataFrame and the “second bracket” returns a value from that Series based on its index. This is very similar to this:

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
print(list_of_lists[1][2])

So my question to you is: what value will the code above print out for us and why?

Nice. That does make sense. I was trying to index, but I think I got confused about what was being returned (DataFrame or Series).

Your list_of_lists will print 6: the second list at index position 2. I’m pretty sure. I didn’t peek. If we put a , between [1],[2] it should print both lists. I think.

Whoa. Well, I got yours right, but I’m apparently still confused about why adding a , gave me the correct list at index 1 but then just returned a new list with a 2 in it. I guess I’d have to write:
print(list_of_lists[1], list_of_lists[2])
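
For what it’s worth, here is a small sketch of why the comma behaves that way. The comma does not index anything: it builds a tuple whose second item is a brand-new list, [2]. The variable names below are just for illustration.

```python
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Two brackets in a row: take the list at index 1, then its element at index 2.
chained = list_of_lists[1][2]
print(chained)        # 6

# A comma builds a tuple: (list_of_lists[1], [2]).
# The [2] here is a new one-element list literal, not an index.
with_comma = list_of_lists[1], [2]
print(with_comma)     # ([4, 5, 6], [2])

# To get both inner lists, index the outer list twice.
both_rows = (list_of_lists[1], list_of_lists[2])
print(both_rows)      # ([4, 5, 6], [7, 8, 9])
```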