Function to drop columns based on fraction of NaN rows

Mission link : https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/8/handle-missing-values-by-dropping-columns

In case it’s useful, here is a small function for it:

def drop_nulls(df, frac_of_rows):
    """ df, float --> df """
    # drop the columns that have more null rows than frac_of_rows * shape[0]
    null_counts = df.isnull().sum()  # null count per column, computed once
    cols_to_drop = null_counts[null_counts > frac_of_rows * df.shape[0]].index
    return df.drop(cols_to_drop, axis=1)

In "Handle Missing Values by Dropping Columns", they write out the list of columns to drop by hand, which I think goes against the spirit of automation: anything that can be automated should be. If you can express the criteria in words…

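Later, I saw that dropna takes a thresh parameter. Note that thresh is the minimum number of non-null values a column needs in order to be kept, so to get the same result you call df.dropna(axis=1, thresh=...) with roughly (1 - frac_of_rows)*df.shape[0], not frac_of_rows*df.shape[0]. Use the standard function, not this one :)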
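
For anyone who wants to see the dropna version spelled out, here is a minimal sketch. The function name drop_sparse_columns and the sample DataFrame are made up for illustration; the only pandas call it relies on is DataFrame.dropna with its axis and thresh parameters.

```python
import math

import numpy as np
import pandas as pd


def drop_sparse_columns(df, frac_of_rows):
    """Drop columns that have more than frac_of_rows * n_rows null rows.

    Hypothetical helper name; it only wraps DataFrame.dropna.
    """
    n_rows = df.shape[0]
    # Largest number of nulls a column may have and still be kept.
    max_nulls = math.floor(frac_of_rows * n_rows)
    # thresh counts NON-null values, so pass the complement of max_nulls.
    return df.dropna(axis=1, thresh=n_rows - max_nulls)


# Made-up sample data: "b" is 75% null, "c" is 50% null.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1, np.nan, np.nan, np.nan],
    "c": [1, 2, np.nan, np.nan],
})

result = drop_sparse_columns(df, 0.5)
print(list(result.columns))  # ['a', 'c']
```

With frac_of_rows=0.5 and 4 rows, a column may have at most 2 nulls, so "b" is dropped and "c" stays, which matches what drop_nulls does with the same arguments.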

I think it's good to provide the Mission link first, followed by the details. This helps others solve the problem swiftly.

Thanks for understanding

Best
K!

Good point. I agree…