Question about 'Working with Missing Data'

Screen Link: Learn data science with Python and R projects

My Code:

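import pandas as pd

# mvc is assumed to be the dataframe loaded earlier in the lesson
sup_data = pd.read_csv('supplemental_data.csv')

location_cols = ['location', 'on_street', 'off_street', 'borough']
null_before = mvc[location_cols].isnull().sum()

for c in location_cols:
    # replace null values in mvc[c] with the value at the same index in sup_data[c]
    mvc[c] = mvc[c].mask(mvc[c].isnull(), sup_data[c])

null_after = mvc[location_cols].isnull().sum()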

The instructions say the mask should represent whether each value is null or not.
Why do we need to write mvc[c] = mvc[c].mask(mvc[c].isnull(), sup_data[c])?
I thought it should be mvc[c] = mvc[c].mask(mvc[c], sup_data[c])

  1. Loop over the column names in location_cols. In each iteration of the loop, use Series.mask() to replace values in the column in the mvc dataframe:
  • The mask should represent whether each value in the column in mvc is null or not.
  • Where the mask is true, the value should be replaced with the equivalent value in sup_data.
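
The steps above can be sketched on toy data. The two small dataframes below (mvc_toy and sup_toy) are made up for illustration and only stand in for mvc and sup_data; the sketch assumes both share the same index and column names, as in the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for mvc and sup_data (same index, same columns).
mvc_toy = pd.DataFrame({
    "borough": ["QUEENS", np.nan, "BRONX"],
    "on_street": [np.nan, "5 AVENUE", np.nan],
})
sup_toy = pd.DataFrame({
    "borough": [np.nan, "MANHATTAN", np.nan],
    "on_street": ["MAIN STREET", np.nan, np.nan],
})

cols = ["borough", "on_street"]
null_before = mvc_toy[cols].isnull().sum()

for c in cols:
    # Boolean Series: True where the value in mvc_toy[c] is null.
    is_null = mvc_toy[c].isnull()
    # Where is_null is True, take the value at the same index from sup_toy[c].
    mvc_toy[c] = mvc_toy[c].mask(is_null, sup_toy[c])

null_after = mvc_toy[cols].isnull().sum()
print(null_before)
print(null_after)
```

A null survives only where the supplemental data is also null at that index, which is why null_after can still be above zero for some columns.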

Hi @ipngasi

With the version you suggest, mvc[c].mask(mvc[c], sup_data[c]), how exactly would you identify whether each value in the column is null or not?
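
To make the point concrete, here is a small sketch with a made-up Series (col and replacement are illustrative names, not the lesson's data). Series.mask() takes a condition of True/False values as its first argument. The column itself holds strings and NaN, so it cannot say which rows to replace, whereas isnull() produces exactly that set of flags.

```python
import numpy as np
import pandas as pd

# Made-up column and replacement values, for illustration only.
col = pd.Series(["QUEENS", np.nan, "BRONX"])
replacement = pd.Series(["X", "MANHATTAN", "Y"])

# isnull() turns the column into True/False flags, one per row.
cond = col.isnull()
print(cond.tolist())    # [False, True, False]

# mask() replaces values only where the condition is True.
filled = col.mask(cond, replacement)
print(filled.tolist())  # ['QUEENS', 'MANHATTAN', 'BRONX']
```

Only the second row is replaced, because that is the only row where the condition is True; the other replacement values are ignored.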