Guided Project - Markets to advertise in (notnull() vs dropna())

Hey guys, the guided answer uses notnull() as a mask to drop all null values from the CountryLive column. I wanted to use dropna() instead, but the two return different results. Can anyone shed some light on the difference between them?

My Code:

fcc_good['CountryLive'].dropna(inplace = True)
fcc_good.CountryLive.value_counts().head()

What the guided project answer proposes:

# Remove the rows with null values in 'CountryLive'
fcc_good = fcc_good[fcc_good['CountryLive'].notnull()]

# Frequency table to check if we still have enough data
fcc_good['CountryLive'].value_counts().head()

United States of America    2933
India                        463
United Kingdom               279
Canada                       240
Poland                       122
Name: CountryLive, dtype: int64

The answer I got with my code:

United States of America    3125
India                        528
United Kingdom               315
Canada                       260
Poland                       131
Name: CountryLive, dtype: int64

Which is weird, because when I check for current nulls in my code, the result is 0:

fcc_good.CountryLive.isnull().sum()
Out[23]:

0

Hi @cordeiropfo,

Welcome to the Community! :star2:

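It seems that you forgot to also drop the null values from the money_per_month column, as suggested in the guided project instructions. Keep in mind that calling dropna() on a single column only affects that Series and does not remove rows from fcc_good, and that value_counts() ignores nulls by default, so the extra counts come from rows where money_per_month is null. You can drop those rows at the DataFrame level like this:

fcc_good = fcc_good.dropna(subset = ['CountryLive'])
fcc_good = fcc_good.dropna(subset = ['money_per_month'])

or, better, in a single call:

fcc_good.dropna(inplace = True, subset = ['CountryLive', 'money_per_month'])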

Then you will get the same values as in the solution.

Otherwise, there is no difference here between dropna() applied to the DataFrame and the solution's approach (a boolean mask built with notnull()): both keep exactly the rows where the column is not null, so the results will be the same.
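
To make the comparison concrete, here is a small sketch on a made-up DataFrame. The name toy and its values are hypothetical stand-ins for fcc_good, not data from the project. It illustrates three points: value_counts() skips nulls by default, dropping nulls from a single column leaves the DataFrame's rows untouched, and DataFrame.dropna(subset=...) selects the same rows as a notnull() mask.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for fcc_good; the values are invented for illustration.
toy = pd.DataFrame({
    "CountryLive": ["India", None, "Canada", "India", "Canada"],
    "money_per_month": [10.0, 20.0, np.nan, 30.0, 5.0],
})

# value_counts() ignores nulls by default, so null countries never appear here.
counts_before = toy["CountryLive"].value_counts()

# Dropping nulls from the column alone gives a shorter Series,
# but the DataFrame itself still has every row.
country_only = toy["CountryLive"].dropna()

# Row-level filtering: both forms keep only rows where both columns are non-null.
cols = ["CountryLive", "money_per_month"]
via_dropna = toy.dropna(subset=cols)
via_mask = toy[toy["CountryLive"].notnull() & toy["money_per_month"].notnull()]

print(len(toy), len(country_only))
print(via_dropna.equals(via_mask))
print(via_dropna["CountryLive"].value_counts())
```

In this toy data the Canada count drops only after the row with a null money_per_month is removed, which is the same effect that separates the two frequency tables in the question.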