Screen Link:
https://app.dataquest.io/m/240/guided-project%3A-predicting-house-sale-prices/2/feature-engineering
My Code:
# mode must be called, and mode() returns a DataFrame, so take its first row
DF = DF.fillna(houses.mode().iloc[0])
DF.isnull().sum()
SOLUTION CODE:
num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()
fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()
replacement_values_dict = df[fixable_numeric_cols.index].mode().to_dict(orient='records')[0]
df = df.fillna(replacement_values_dict)
I feel as though I achieved everything in the solution code with my simpler line of code, since I have already (prior to this) removed all text columns with missing values from the read-in DF and removed all numeric columns with more than 5% of their values missing. All that should be left, then, are the numeric columns with 5% or less of their values missing, so I should be able to fill the missing values with their respective column modes, as in my code above.
There is, however, a good chance that I have done something wrong or am missing something in my approach. I would appreciate any advice or suggestions, and some clarity on whether my code is sufficient.
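To make the comparison concrete, here is a minimal sketch on a made-up DataFrame. The name toy and its columns are hypothetical and not taken from the project data. It assumes the intent of the one-line approach is to fill each column with its own mode, which in pandas means calling mode() and taking its first row, and it assumes only numeric columns with a few missing values remain.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the cleaned housing DataFrame
toy = pd.DataFrame({
    "area": [1000.0, 1200.0, np.nan, 1000.0, 1500.0],
    "baths": [1.0, 2.0, 2.0, np.nan, 2.0],
    "price": [100, 150, 120, 110, 200],
})

# One-line approach: mode() returns a DataFrame, so take its first row
# to get a single fill value per column
one_liner = toy.fillna(toy.mode().iloc[0])

# Solution-style approach: build a dict of modes only for the
# numeric columns that actually have missing values
num_missing = toy.select_dtypes(include=["int", "float"]).isnull().sum()
fixable = num_missing[num_missing > 0]
replacement_values = toy[fixable.index].mode().to_dict(orient="records")[0]
solution_style = toy.fillna(replacement_values)

# Compare the two results and count any remaining missing values
print(one_liner.equals(solution_style))
print(one_liner.isnull().sum().sum())
```

If that holds on the real data too, the one difference I can see is that the one-line version fills every column it is given, so it only matches the solution as long as the text columns and the heavily missing columns really have been dropped beforehand.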
As a side note, I was hoping that someone might be able to explain why we sort the values for “fixable_numeric_cols”. I see that this is done a lot and am at a loss as to why we would need to sort these columns.
All help, comments, suggestions and honest criticism would be forever appreciated.
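A small check on hypothetical data (the toy frame and its columns a, b and c are made up for illustration) suggests that the sort does not change the filled result, so it may be there only to make the missing-value counts easier to read. This is a sketch of that check, not a statement of why the solution author did it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns a and c have 1 missing value, b has 2
toy = pd.DataFrame({
    "a": [1.0, np.nan, 1.0, 3.0],
    "b": [np.nan, np.nan, 5.0, 5.0],
    "c": [7.0, 7.0, 8.0, np.nan],
})

num_missing = toy.isnull().sum()
unsorted_cols = num_missing[num_missing > 0]
sorted_cols = unsorted_cols.sort_values()  # fewest missing values first

# Build the replacement dict from each ordering and fill with it
filled_unsorted = toy.fillna(
    toy[unsorted_cols.index].mode().to_dict(orient="records")[0]
)
filled_sorted = toy.fillna(
    toy[sorted_cols.index].mode().to_dict(orient="records")[0]
)

print(sorted_cols)
print(filled_unsorted.equals(filled_sorted))
```

The dict passed to fillna is keyed by column name, so the order in which the columns were listed should not matter for the fill itself.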
John