Fixing Numerical Columns: Alternative to Solution (Guided Project 240)

Mission link - Learn data science with Python and R projects

Hello there,

I have a question concerning the numerical columns and how they are filled with the column’s mode. I tried both of these options:

My Code:

num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()
fixable_numeric_cols = num_missing[num_missing > 0].sort_values()

df[fixable_numeric_cols.index] = df[fixable_numeric_cols.index].fillna(df[fixable_numeric_cols.index].mode())

and

num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()
fixable_numeric_cols = num_missing[num_missing > 0].sort_values()

fixable_cols = fixable_numeric_cols.index

for col in fixable_cols:
    df[col] = df[col].fillna(df[col].mode())

What I expected to happen:

df.isnull().sum().value_counts()

0    64
dtype: int64


What actually happened:

df.isnull().sum().value_counts()

0     55
1      6
2      2
23     1
dtype: int64

Any help with this would be greatly appreciated!

Thanks

PS: I cannot add the 240 tag for some reason.

This is a bit of a quirk of Pandas.

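When you calculate the mode of a column, pandas returns a Series rather than a single number, because a column can have more than one mode. That Series is indexed from 0 and usually holds just one value.

However, fillna() only fills every NaN when you give it a scalar value (just a single number). If you pass a dict/Series/DataFrame, it aligns on the index, so only the missing values whose row label also appears in the mode Series (here, label 0) get filled, and the rest stay NaN. That is why both of your versions left missing values behind.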
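To make this concrete, here is a small sketch on a made-up Series (the values are hypothetical and not from the project dataset). It shows what mode() returns and what fillna() does when it is handed that Series directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing values (labels 0 and 3)
s = pd.Series([np.nan, 2.0, 2.0, np.nan, 5.0])

# mode() returns a Series, here a single entry: label 0 -> 2.0
mode = s.mode()

# fillna() matches the mode Series to s by index label,
# so only the NaN at label 0 can be filled; label 3 stays NaN
filled = s.fillna(mode)

print(mode)
print(filled)
print("still missing:", filled.isna().sum())
```

The NaN at label 3 survives because the mode Series has no entry with that label.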

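So, you can’t use the output of mode() directly. You can extract the actual value from it and use that instead:

df[col] = df[col].fillna(df[col].mode().values[0])

That values[0] pulls out the first mode as a plain scalar (the smallest one if several values tie), and a scalar fills all the NaNs in the column. Since fillna() returns a new Series, remember to assign the result back to df[col].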
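If you would rather avoid the loop, as in your first attempt, the same idea seems to work on several columns at once. This is only a sketch on a made-up DataFrame: the column names a, b, c and their values are hypothetical, and include="number" is my stand-in for your ['int', 'float'] selection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the project data
df = pd.DataFrame({
    "a": [1.0, 1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 4.0, np.nan],
    "c": ["x", None, "x", "y"],  # non-numeric, left untouched
})

# Numeric columns that have at least one missing value
num_missing = df.select_dtypes(include="number").isnull().sum()
fixable_cols = num_missing[num_missing > 0].index

# DataFrame.mode() returns a DataFrame; .iloc[0] takes its first row,
# a Series keyed by column name, which fillna() applies column by column
modes = df[fixable_cols].mode().iloc[0]
df[fixable_cols] = df[fixable_cols].fillna(modes)

print(df.isnull().sum())
```

One caveat: a column that is entirely NaN has no mode at all, so values[0] on its (empty) mode Series raises an IndexError. Such columns need separate handling.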

Aaaah that makes sense, thank you!