Screen Link: Learn data science with Python and R projects
My Code:
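import pandas as pd

# mvc is the crashes dataframe loaded earlier in the mission
sup_data = pd.read_csv('supplemental_data.csv')
location_cols = ['location', 'on_street', 'off_street', 'borough']
null_before = mvc[location_cols].isnull().sum()
for c in location_cols:
    # where mvc[c] is null, take the value from sup_data[c] instead
    mvc[c] = mvc[c].mask(mvc[c].isnull(), sup_data[c])
null_after = mvc[location_cols].isnull().sum()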
The instructions say the mask should represent whether each value is null or not.
Why do we write mvc[c] = mvc[c].mask(mvc[c].isnull(), sup_data[c])?
I think it should be mvc[c] = mvc[c].mask(mvc[c], sup_data[c]) instead.
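Series.mask(cond, other) expects cond to be a boolean Series: rows where cond is True are replaced with the matching value from other, and rows where it is False are kept. mvc[c].isnull() produces exactly that True/False Series, while mvc[c] on its own holds the column's strings (and NaNs), which pandas cannot use as a mask. A minimal sketch on a made-up 'borough'-style column (the values are invented for illustration, not taken from the real mvc data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for one column of mvc and of sup_data
col = pd.Series(["BROOKLYN", np.nan, "QUEENS", np.nan])
sup = pd.Series(["BROOKLYN", "BRONX", "QUEENS", "MANHATTAN"])

# isnull() turns the column into a boolean mask: True where the value is missing
mask = col.isnull()
print(mask.tolist())  # [False, True, False, True]

# mask() replaces values where the mask is True with the value at the same index in sup
filled = col.mask(mask, sup)
print(filled.tolist())  # ['BROOKLYN', 'BRONX', 'QUEENS', 'MANHATTAN']

# Passing the column itself does not give a boolean mask, so pandas raises an error
try:
    col.mask(col, sup)
    raw_worked = True
except (TypeError, ValueError) as err:
    raw_worked = False
    print("mask(col, ...) failed with", type(err).__name__)
```

In other words, the condition has to answer "is this value null?" for every row, and isnull() is the step that turns the column into those answers.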
- Loop over the column names in location_cols. In each iteration of the loop, use Series.mask() to replace values in the column in the mvc dataframe:
  - The mask should represent whether each value in the column in mvc is null or not.
  - Where the mask is true, the value should be replaced with the equivalent value in sup_data.
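The steps quoted above can be sketched on a small made-up dataframe. The column names follow location_cols, but the values in this mvc and sup_data are invented for illustration; in the mission both come from CSV files. When other is a Series, pandas matches it to the column by index label, so this assumes mvc and sup_data share the same row index.

```python
import numpy as np
import pandas as pd

location_cols = ["location", "on_street", "off_street", "borough"]

# Hypothetical stand-ins for the real mvc and sup_data dataframes
mvc = pd.DataFrame({
    "location": ["(40.7, -73.9)", np.nan, np.nan],
    "on_street": [np.nan, "BROADWAY", np.nan],
    "off_street": ["1 MAIN ST", np.nan, "5 PARK AVE"],
    "borough": [np.nan, "MANHATTAN", np.nan],
})
sup_data = pd.DataFrame({
    "location": ["(40.7, -73.9)", "(40.8, -73.95)", np.nan],
    "on_street": ["ATLANTIC AVE", "BROADWAY", np.nan],
    "off_street": ["1 MAIN ST", "2 ELM ST", "5 PARK AVE"],
    "borough": ["BROOKLYN", "MANHATTAN", "QUEENS"],
})

null_before = mvc[location_cols].isnull().sum()

for c in location_cols:
    # Boolean mask: True where mvc[c] is missing
    is_missing = mvc[c].isnull()
    # Keep existing values; take sup_data[c] only where the mask is True
    mvc[c] = mvc[c].mask(is_missing, sup_data[c])

null_after = mvc[location_cols].isnull().sum()
print(null_before.to_dict())  # {'location': 2, 'on_street': 2, 'off_street': 1, 'borough': 2}
print(null_after.to_dict())   # {'location': 1, 'on_street': 1, 'off_street': 0, 'borough': 0}
```

Some nulls remain afterwards because sup_data is itself missing those values, so comparing null_before with null_after shows how many gaps the supplemental data actually filled. For this particular task, mvc[c].fillna(sup_data[c]) would give the same result; the mission asks for mask() to practise boolean masks.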