unique() array in a for loop

Screen Link:
https://app.dataquest.io/m/292/exploring-data-with-pandas%3A-intermediate/11/using-loops-with-pandas

My Code:
top_employer_by_country = {}

countries = f500['country'].unique()

for c in countries:
    selected_rows = f500[f500['country'] == c]
    sort_values = selected_rows.sort_values("employees", ascending=False).iloc[0]
    top_employer = sort_values["company"]
    top_employer_by_country[c] = top_employer

Hi:

My question is: what is really the point of assigning the unique values of the country column to an array for the for loop? I think using f500['country'] would be the same as using selected_rows, but since we are using a loop, c needs to be used.
Thank you very much.

Hmmm…not quite. By creating an array with the unique country names, we get an array with no duplicate country names, whereas f500['country'] would give us a column (a Series) with many duplicate country names.

The countries array is just a way of setting up an iterable that we can loop over in order to create a mask (f500['country'] == c) that allows us to filter our dataframe down to the rows that contain data for each unique country.
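To make that concrete, here is a small sketch. The toy f500 below is a made-up stand-in with invented values (not the real dataset), just to show what unique() and the mask return:

```python
import pandas as pd

# Hypothetical stand-in for f500; company names and numbers are invented.
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "Japan", "China"],
    "employees": [100, 300, 250, 80, 120],
})

# One entry per country, in order of first appearance.
countries = f500["country"].unique()

# Boolean Series: True for the rows whose country is "USA".
mask = f500["country"] == "USA"

# Only the rows where the mask is True.
selected_rows = f500[mask]

print(len(f500["country"]), "values in the column")
print(len(countries), "unique countries")
print(selected_rows)
```

The column has one value per row, while countries has one value per country, which is what we want to loop over.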

If we were to loop over f500['country'] instead of countries, we would end up filtering our data for each country multiple times (once per row for that country), which is wasteful. We only need to look at the data for each country once in order to determine top_employer.
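Here is a rough sketch of the difference, again on a made-up toy dataframe. The helper top_employers and the n_filters counter are my own names for illustration, not part of the exercise:

```python
import pandas as pd

# Hypothetical stand-in for f500; company names and numbers are invented.
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "Japan", "China"],
    "employees": [100, 300, 250, 80, 120],
})

def top_employers(labels):
    """Run the loop over `labels`; return the result dict and how many times we filtered."""
    result = {}
    n_filters = 0
    for c in labels:
        selected_rows = f500[f500["country"] == c]
        n_filters += 1
        top_row = selected_rows.sort_values("employees", ascending=False).iloc[0]
        result[c] = top_row["company"]
    return result, n_filters

# Looping over the unique array: one filter per country.
by_unique, n_unique = top_employers(f500["country"].unique())

# Looping over the whole column: one filter per row, repeating work for duplicates.
by_column, n_column = top_employers(f500["country"])

print(by_unique == by_column)  # same answer either way
print(n_unique, n_column)      # but different amounts of work
```

Both loops build the same dictionary, because repeated countries just overwrite the same key with the same value. The second loop simply does the filtering and sorting once per row instead of once per country, which adds up on a larger dataframe.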

EDIT_1:

A point of note: we aren’t creating the array of unique country names during the for loop…we must create it before the loop because the loop is being done over that unique array.


thanks for the solution, got it now.
