Is all this code necessary?

Screen Link:

My Code:

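    # fill each column's NaNs with its mode (df.mode() returns a DataFrame; row 0 holds one mode per column)
    df = df.fillna(df.mode().iloc[0])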



    num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()
    fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()
    replacement_values_dict = 
    df = df.fillna(replacement_values_dict)
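A note on the one-liner under My Code: `DataFrame.mode` is a method, so it has to be called. Passing `df.mode` without parentheses hands `fillna` the method object rather than the modes. Calling `df.mode()` returns a DataFrame (with more than one row if some column has ties), and `.iloc[0]` takes its first row as a Series indexed by column name, which `fillna` then matches column by column. A minimal sketch on a made-up frame (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame.
df = pd.DataFrame({
    "Lot Frontage": [60.0, 60.0, np.nan, 80.0],
    "Garage Cars": [2.0, np.nan, 2.0, 1.0],
})

# Calling df.mode() returns a DataFrame of modes (one row per modal value).
modes = df.mode().iloc[0]  # first row: one mode per column, indexed by column name

# Given a Series, fillna fills each column's NaNs with the value whose
# label matches that column's name.
filled = df.fillna(modes)
print(filled)
```

If a column has several equally common values, `.iloc[0]` simply takes the first one listed.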

I feel as though I achieved everything in the solution code with my simpler line of code. Before this step, I had already removed all text columns with missing values from the read-in DF, as well as all numeric columns with more than 5% of their values missing. All that should be left, then, are numeric columns with 5% or fewer of their values missing, so I should be able to fill those missing values with their respective column modes, as in my code above.
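This reasoning can be checked on a small made-up frame. The sketch assumes the solution's `replacement_values_dict` is built from the modes of the fixable columns (that line is incomplete above, so its right-hand side here is a guess); `solution_style`, `one_liner`, and all column names are hypothetical. On a frame that has already been cleaned as described, both approaches give the same result; on an uncleaned frame, the one-liner also fills columns the solution would leave alone.

```python
import numpy as np
import pandas as pd


def solution_style(df):
    """Roughly the solution's approach: fill only numeric columns with <5% missing."""
    num_missing = df.select_dtypes(include=["int", "float"]).isnull().sum()
    fixable_numeric_cols = num_missing[
        (num_missing < len(df) / 20) & (num_missing > 0)
    ].sort_values()
    # ASSUMPTION: mode-based replacement values (this line is incomplete in the post).
    replacement_values_dict = df[fixable_numeric_cols.index].mode().iloc[0].to_dict()
    return df.fillna(replacement_values_dict)


def one_liner(df):
    """The one-liner: fill every column's NaNs with that column's mode."""
    return df.fillna(df.mode().iloc[0])


n = 40  # one missing value in 40 rows is 2.5%, under the 5% cutoff
cleaned = pd.DataFrame({
    "Gr Liv Area": np.arange(n, dtype=float),   # no missing values
    "Garage Cars": [2.0] * (n - 1) + [np.nan],  # 1 missing
    "Bsmt Full Bath": [0.0, 1.0] * (n // 2),    # 1 missing after the line below
    "Neighborhood": ["NAmes"] * n,              # text column with no missing values
})
cleaned.loc[0, "Bsmt Full Bath"] = np.nan

# After the earlier cleaning, both approaches agree.
print(solution_style(cleaned).equals(one_liner(cleaned)))

# Add a column with 50% missing, i.e. one the earlier cleaning would have dropped.
raw = cleaned.copy()
raw["Mostly Missing"] = [np.nan, 1.0] * (n // 2)
print(solution_style(raw)["Mostly Missing"].isnull().sum())  # left alone by the solution
print(one_liner(raw)["Mostly Missing"].isnull().sum())       # filled by the one-liner
```

So the one-liner is only equivalent because of the filtering done beforehand; the solution's extra lines perform that filtering itself.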

There is, however, a good chance that I have done something wrong or am missing something in my approach. I would appreciate any advice or suggestions, and some clarity on whether my code is sufficient.

As a side note, could someone explain why we sort the values of “fixable_numeric_cols”? I see this done a lot and am at a loss as to why we would need to sort these columns.

All help, comments, suggestions, and honest criticism would be forever appreciated 🙂



Hello @johnedwardferreira5!

Well, if you did all of that beforehand, then you didn’t really achieve everything with only one line of code, right? 😅 Dataquest’s solution is longer than yours because it does all of that filtering at once. If they had done all the processing you described beforehand, the solution would also be a one-liner:

    df = df.fillna(replacement_values_dict)

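As for your side question, they are only sorting the Series by the number of missing values in each column. Skipping it won’t raise an error or change what gets filled, since the replacement values are matched to columns by name, but sorting makes fixable_numeric_cols easier to read.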
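A small sketch with a made-up frame illustrates both points: `sort_values()` orders the missing-value counts from smallest to largest, and because the fill values end up in a dict keyed by column name, the sorted and unsorted versions fill the frame identically. Mode-based fill values are assumed here, as in the question; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a different number of NaNs in each column.
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan, np.nan, 1.0, 2.0],  # 3 missing
    "B": [4.0, 4.0, np.nan, 5.0, 4.0, 6.0],        # 1 missing
    "C": [7.0] * 6,                                # 0 missing
    "D": [np.nan, 8.0, 8.0, np.nan, 9.0, 8.0],     # 2 missing
})

missing = df.isnull().sum()
missing = missing[missing > 0]

# sort_values() orders the columns by missing count, smallest first: B, D, A.
print(missing.sort_values())

# The fill values are a dict keyed by column name, so the order of the
# index used to build it does not affect the result.
fill_unsorted = df.fillna(df[missing.index].mode().iloc[0].to_dict())
fill_sorted = df.fillna(df[missing.sort_values().index].mode().iloc[0].to_dict())
print(fill_unsorted.equals(fill_sorted))
```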

Hope this helps you.