Doing an if statement for fillna

Screen Link:
https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/11/handling-missing-values-with-imputation

My Code:

africa = combined[combined['REGION'] == 'Sub-Saharan Africa']
african_mean = africa['HAPPINESS SCORE'].mean()
combined['HAPPINESS SCORE UPDATED'] = combined['HAPPINESS SCORE'].fillna(value = african_mean, inplace = combined['REGION'] == 'Sub-Saharan Africa')

What I expected to happen:
I wanted to modify the problem so that null values are replaced with the mean value for that region. I understand how to do the problem normally, but I was trying to add an if-style condition so that fillna would only fill values where the region was 'Sub-Saharan Africa'. Would I need to build out a full for loop? Or would I need to do a pd.merge instead?

What actually happened:

ValueError: For argument "inplace" expected type bool, received type Series.

I think you misunderstood what the inplace parameter does.

There are two ways to update a DataFrame or Series. Say you have an example DataFrame, df, and you want to replace its null values with 0. The first approach is to assign the result of fillna. You can assign it back to df -

df = df.fillna(0)

The above replaces df with the new, filled DataFrame that df.fillna(0) returns.

Or you can assign the filled values to a new column, leaving the original column unchanged -

df["new_column"] = df["old_column"].fillna(0)

The second approach is -

df.fillna(0, inplace = True)

Notice how, in the above, we are not doing an assignment operation like we did previously. We don’t do df = something here. That’s what inplace is for.
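A small sketch of the approaches above on a made-up df with one missing value (old_column and new_column are the placeholder names from the examples). It also shows that fillna(..., inplace=True) returns None, which is why nothing is assigned in that case.

```python
import numpy as np
import pandas as pd


def make_df():
    # Small made-up DataFrame with one missing value.
    return pd.DataFrame({"old_column": [1.0, np.nan, 3.0]})


# Approach 1a: fillna returns a new DataFrame; reassign df to it.
df = make_df()
df = df.fillna(0)
print(df["old_column"].tolist())  # [1.0, 0.0, 3.0]

# Approach 1b: store the filled values in a new column; old_column keeps its NaN.
df = make_df()
df["new_column"] = df["old_column"].fillna(0)
print(df["old_column"].isnull().sum(), df["new_column"].isnull().sum())  # 1 0

# Approach 2: inplace=True updates df directly and returns None, so there is nothing to assign.
df = make_df()
result = df.fillna(0, inplace=True)
print(result)  # None
print(df["old_column"].tolist())  # [1.0, 0.0, 3.0]
```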

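inplace takes only a boolean, True or False. It has to be the literal value True, not a condition that is true for some rows. The condition you passed, combined['REGION'] == 'Sub-Saharan Africa', evaluates to a boolean Series (one True/False per row), which is why the error says it received type Series. If inplace is True, the object is updated in place and fillna returns None; if it is False (the default), fillna leaves the original alone and returns a new, filled object.

So inplace cannot be used to choose which rows get filled.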
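To see where the error in the question comes from, here is a minimal reproduction on a tiny hypothetical stand-in for combined (the rows and scores are made up). The comparison produces a Series, and passing that to inplace raises the ValueError before any filling happens.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `combined` (made-up regions and scores).
combined = pd.DataFrame({
    "REGION": ["Sub-Saharan Africa", "Western Europe", "Sub-Saharan Africa"],
    "HAPPINESS SCORE": [4.0, np.nan, np.nan],
})

# The comparison yields one True/False per row, i.e. a boolean Series, not a bool.
condition = combined["REGION"] == "Sub-Saharan Africa"
print(type(condition).__name__)  # Series

error_message = None
try:
    # Same mistake as in the question: a Series passed where a bool is expected.
    combined["HAPPINESS SCORE"].fillna(value=4.0, inplace=condition)
except ValueError as err:
    error_message = str(err)
print(error_message)

# A plain bool is accepted; inplace=False returns a new, filled Series
# (note that it fills every row, since inplace does not select rows).
filled = combined["HAPPINESS SCORE"].fillna(value=4.0, inplace=False)
print(filled.tolist())  # [4.0, 4.0, 4.0]
```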

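If you want fillna to apply only to the Sub-Saharan Africa region, you have to make sure you are only working with those rows of the combined DataFrame. Something like -

combined['HAPPINESS SCORE UPDATED'] = combined['HAPPINESS SCORE']
combined.loc[combined['REGION'] == 'Sub-Saharan Africa', 'HAPPINESS SCORE UPDATED'] = combined.loc[combined['REGION'] == 'Sub-Saharan Africa', 'HAPPINESS SCORE'].fillna(value = african_mean)

The first line copies the original scores into the new column, so rows from other regions keep their values. In the second line, combined.loc[combined['REGION'] == 'Sub-Saharan Africa', 'HAPPINESS SCORE'] selects the HAPPINESS SCORE values of the rows that satisfy the condition, fillna fills the missing ones with african_mean, and the result is written back to those same rows of the new column. Assigning the filtered result directly, as in combined['HAPPINESS SCORE UPDATED'] = combined[combined['REGION'] == 'Sub-Saharan Africa']['HAPPINESS SCORE'].fillna(value = african_mean), would instead leave every non-African row as NaN in the new column, because pandas aligns the assignment on the index.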
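A self-contained way to check the row-restricted fill, using a made-up stand-in for combined with two regions. It fills only the Sub-Saharan Africa rows via .loc and, for comparison, also assigns the filtered Series straight to a column, which leaves the other region's rows as NaN because pandas aligns on the index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `combined` (made-up regions and scores).
combined = pd.DataFrame({
    "REGION": ["Sub-Saharan Africa", "Sub-Saharan Africa",
               "Western Europe", "Western Europe", "Sub-Saharan Africa"],
    "HAPPINESS SCORE": [4.0, np.nan, 7.0, np.nan, 5.0],
})

africa_rows = combined["REGION"] == "Sub-Saharan Africa"
# mean() skips NaN, so this is (4.0 + 5.0) / 2 = 4.5.
african_mean = combined.loc[africa_rows, "HAPPINESS SCORE"].mean()

# Copy the original scores so rows from other regions keep their values.
combined["HAPPINESS SCORE UPDATED"] = combined["HAPPINESS SCORE"]
# Fill only the African rows and write them back to those same rows.
combined.loc[africa_rows, "HAPPINESS SCORE UPDATED"] = (
    combined.loc[africa_rows, "HAPPINESS SCORE"].fillna(value=african_mean)
)

# For comparison: assigning the filtered Series directly aligns on the index,
# so the Western Europe rows (index 2 and 3) come out as NaN, losing the 7.0.
combined["NAIVE UPDATE"] = combined[africa_rows]["HAPPINESS SCORE"].fillna(value=african_mean)

print(combined)
```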

Also, notice how I haven’t included inplace here, because you are assigning the results on the right side of the assignment operator (=) to a particular column of the DataFrame. You are not updating the HAPPINESS SCORE column in its place.

I haven’t tested the above out myself. So, experiment with it, print out some values, and cross-check to be sure.
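On the original question of whether a for loop or a pd.merge is needed: if the goal is to fill each region's missing scores with that region's own mean, neither is strictly required. Here is a minimal sketch that swaps in groupby(...).transform('mean') (a different technique from the one above), again on made-up data standing in for combined.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `combined` (made-up regions and scores).
combined = pd.DataFrame({
    "REGION": ["Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa",
               "Western Europe", "Western Europe"],
    "HAPPINESS SCORE": [4.0, 5.0, np.nan, 7.0, np.nan],
})

# transform("mean") returns a Series with the same index as `combined`;
# each row holds the mean score of its own region (NaN values are skipped).
region_means = combined.groupby("REGION")["HAPPINESS SCORE"].transform("mean")

# fillna with a Series fills each missing value from the matching index label,
# so every region's gaps get that region's mean; existing scores are untouched.
combined["HAPPINESS SCORE UPDATED"] = combined["HAPPINESS SCORE"].fillna(region_means)

print(combined["HAPPINESS SCORE UPDATED"].tolist())  # [4.0, 5.0, 4.5, 7.0, 7.0]
```

One caveat of this design: a region whose scores are all missing has a NaN mean, so its rows stay NaN after the fill.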

Thank you! I thought I had tried this type of code before and it didn't work, but I may have typed it incorrectly. Regardless, thanks for the assistance! Much appreciated.

A post was split to a new topic: Should we fill missing values using mean? Will it have any negative consequences?