Difference between the Series.mask() method and regular value assignment with a Boolean mask, as learned in the first pandas missions

The screen link: https://app.dataquest.io/m/370/working-with-missing-data/3/filling-and-verifying-the-killed-and-injured-data

I'd like to check whether I correctly understand the difference between the newly introduced Series.mask() method and regular value assignment using a Boolean mask (i.e. df.loc[df['col1'].isnull(), 'col1'] = 'Other'), learned in an introductory pandas mission.

As far as I understand, with the latter we can only update the values with a single value, whereas with the Series.mask() method we can update them with either a single value or the matching value from a series that has identical index labels.

Assuming the above is true, in this mission we can use either method to replace any numbers in total_killed that aren’t equal to their equivalents in killed_manual_sum with np.nan:
killed['total_killed'] = killed['total_killed'].mask(killed['total_killed'] != killed_manual_sum, np.nan)
killed.loc[killed['total_killed'] != killed_manual_sum, 'total_killed'] = np.nan
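
As a quick check, the two lines above can be compared on a small toy frame. This is only a minimal sketch: the data and the two component column names are made up for illustration and are not the mission's dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the mission's data (values are invented for illustration).
killed = pd.DataFrame({
    "pedestrians_killed": [0, 1, 0, 2],
    "cyclist_killed": [0, 0, 1, 0],
    "total_killed": [0.0, 1.0, 3.0, np.nan],
})
killed_manual_sum = killed[["pedestrians_killed", "cyclist_killed"]].sum(axis=1)

# Approach 1: Series.mask() returns a new series with np.nan where the condition is True.
via_mask = killed["total_killed"].mask(
    killed["total_killed"] != killed_manual_sum, np.nan
)

# Approach 2: Boolean-mask assignment with .loc modifies a frame in place
# (done on a copy here so the two results can be compared).
via_loc = killed.copy()
via_loc.loc[via_loc["total_killed"] != killed_manual_sum, "total_killed"] = np.nan

# Both approaches blank out the same rows (the mismatch and the existing null).
print(via_mask.equals(via_loc["total_killed"]))
```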

And only the Series.mask() method can be used to replace any null values in the total_killed column with their equivalents from the killed_manual_sum series:
killed['total_killed'] = killed['total_killed'].mask(killed['total_killed'].isnull(), killed_manual_sum)
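
To illustrate the line above, here is a minimal sketch on two toy series that share the same index labels. The values are made up and only stand in for the mission's columns.

```python
import numpy as np
import pandas as pd

# Toy series with identical index labels (0..3); values are invented.
total_killed = pd.Series([0.0, np.nan, 2.0, np.nan], name="total_killed")
killed_manual_sum = pd.Series([0, 1, 2, 3])

# Where total_killed is null, take the value at the same label from killed_manual_sum;
# everywhere else keep the original value.
filled = total_killed.mask(total_killed.isnull(), killed_manual_sum)

print(filled.tolist())
```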

Could anyone please confirm whether I understand this correctly?

Hi @ksenia.kustanovich

Yup. If we have a series (source) with the same index labels as the series (target) we are trying to mask, the masked values in the target get replaced by the values at the same labels in the source.
The match is made by index label rather than by position, so when the source series is shorter than the target and lacks some of its labels, np.nan is used as the substitution at those labels.
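
A minimal sketch of the shorter-source case, using made-up series (the names target, source and cond are just for illustration):

```python
import numpy as np
import pandas as pd

target = pd.Series([1.0, 5.0, 7.0, 9.0], index=[0, 1, 2, 3])
source = pd.Series([10.0, 20.0], index=[0, 1])  # shorter: no labels 2 and 3
cond = pd.Series([True, False, True, False], index=[0, 1, 2, 3])

# mask() aligns source to target's index before substituting.
result = target.mask(cond, source)

# Label 0: masked, source has a value, so it becomes 10.0
# Label 1: not masked, stays 5.0
# Label 2: masked, but source has no label 2, so it becomes NaN
# Label 3: not masked, stays 9.0
print(result)
```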

In the former code (the np.nan example), a single value replaces every value where the condition is True.