Difference between using Series.mask() and plain bracket notation (or .loc)

I’m curious whether anybody could explain the difference between, and the advantages of, using Series.mask() instead of plain boolean masks with the bracket or loc accessors. Is it more efficient or more readable? Or are there things Series.mask() can do that the bracket/loc accessors cannot?

I’m confused because they seem to achieve pretty much the same thing. Take the example below:

import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])
nums = pd.Series(["one", "two", "three"])
bananamask = (fruits == "banana")

If I want to use the bananamask boolean Series to replace values in the fruits Series, I can do it with either of these two methods:

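# Method 1: Series.mask() returns a new Series, which is reassigned to fruits
fruits = fruits.mask(bananamask, nums)
# Method 2 (alternative to Method 1): boolean indexing assigns in place
fruits[bananamask] = nums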
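A self-contained check of the two methods, run side by side on separate objects so that Method 2 does not operate on the output of Method 1. The names method1 and method2 are introduced only for this comparison; both rely on pandas aligning nums to fruits by index label.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])
nums = pd.Series(["one", "two", "three"])
bananamask = fruits == "banana"

# Method 1: mask() builds a new Series; fruits itself is not modified
method1 = fruits.mask(bananamask, nums)

# Method 2: boolean-indexed assignment mutates its target, so use a copy
method2 = fruits.copy()
method2[bananamask] = nums

print(method1.tolist())         # ['apple', 'two', 'three']
print(method1.equals(method2))  # True
print(fruits.tolist())          # ['apple', 'banana', 'banana']
```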

The two approaches above produce the same result. So my question is: can someone give me an example where the Series.mask() method can be used but the second method (bracket or loc notation) can’t?
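One difference that is often pointed out, shown here as a minimal sketch rather than a complete answer: mask() is an expression that returns a new Series, so it can sit in the middle of a method chain, and its cond (and other) arguments may be callables evaluated on the Series at that point in the chain. Bracket/loc assignment is a statement that mutates an existing object, so the same pipeline would need an intermediate variable.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])
nums = pd.Series(["one", "two", "three"])

result = (
    fruits
    .str.upper()
    # cond is a callable: it receives the upper-cased Series from the previous step
    .mask(lambda s: s == "BANANA", nums)
    .str.len()
)

print(result.tolist())  # [5, 3, 5]
print(fruits.tolist())  # ['apple', 'banana', 'banana'] (original unchanged)
```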

Note: I noticed that a similar question was posted a while ago; the link can be found here. I still found the answer confusing. The explanation was:

If we have a series (source) of the same length as the series (target) we are trying to mask, the values in the target will get replaced by the values from the source.
Unequal lengths will lead to np.nan as the substitution when the source series is shorter than the target.

However, a target Series and a replacement Series of unequal lengths will produce NaN values regardless of whether we use Series.mask() or the bracket/loc accessor method (unless we just want to replace values with a single scalar). In other words, if we want to replace values in a target Series with values from another Series based on a boolean condition, (1) the target Series, (2) the replacement Series, and (3) the boolean mask all have to be the same length (strictly speaking, pandas aligns them on their index labels, so their indexes must match) to avoid unintended NaN values, whether we use Method 1 or Method 2.
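A small sketch of that point, assuming a float target so that inserting NaN does not require a dtype change: the replacement Series only has index labels 0 and 1, so every selected position without a matching label becomes NaN, and both methods agree.

```python
import pandas as pd

target = pd.Series([10.0, 20.0, 30.0, 40.0])
source = pd.Series([1, 2])  # shorter: only index labels 0 and 1
cond = target > 15          # True at labels 1, 2 and 3

# Method 1: mask() aligns source on the index; labels 2 and 3 have no match
m1 = target.mask(cond, source)

# Method 2: boolean-indexed assignment aligns the same way
m2 = target.copy()
m2[cond] = source

print(m1.tolist())    # [10.0, 2.0, nan, nan]
print(m1.equals(m2))  # True (equals() treats NaNs in the same positions as equal)
```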

To expand a little on my question, I tried both approaches on the DataFrame used in this screen. Basically, we have the following:

Method 1 (used in the screen)

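import numpy as np

# Fill missing totals with the manual sum
killed['total_killed'] = killed['total_killed'].mask(killed['total_killed'].isnull(), killed_manual_sum)
# Set totals that disagree with the manual sum to NaN
killed['total_killed'] = killed['total_killed'].mask(killed['total_killed'] != killed_manual_sum, np.nan)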

Method 2 (alternative approach using the loc accessor)

killed.loc[killed['total_killed'].isnull(), 'total_killed'] = killed_manual_sum
killed.loc[killed['total_killed'] != killed_manual_sum, 'total_killed'] = np.nan

Will the two methods have different results? Why or why not?
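A toy comparison of the two methods. The DataFrame below is a hypothetical stand-in for the screen's killed data (the column names and values are assumptions for illustration), with killed_manual_sum computed as the row sum of the first three columns. Each method runs on its own copy so the results can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the screen's `killed` DataFrame
killed = pd.DataFrame({
    "pedestrians_killed": [0, 1, 0, 2],
    "cyclist_killed":     [0, 0, 1, 0],
    "motorist_killed":    [1, 0, 0, 0],
    "total_killed":       [1.0, np.nan, 3.0, 2.0],
})
killed_manual_sum = killed.iloc[:, :3].sum(axis=1)  # [1, 1, 1, 2]

# Method 1: mask() returns new Series that are assigned back to the column
m1 = killed.copy()
m1["total_killed"] = m1["total_killed"].mask(m1["total_killed"].isnull(), killed_manual_sum)
m1["total_killed"] = m1["total_killed"].mask(m1["total_killed"] != killed_manual_sum, np.nan)

# Method 2: .loc assigns in place; the Series on the right is aligned by index
m2 = killed.copy()
m2.loc[m2["total_killed"].isnull(), "total_killed"] = killed_manual_sum
m2.loc[m2["total_killed"] != killed_manual_sum, "total_killed"] = np.nan

print(m1["total_killed"].tolist())                 # [1.0, 1.0, nan, 2.0]
print(m1["total_killed"].equals(m2["total_killed"]))  # True
```

In this sketch the row with a missing total is filled from the manual sum, and the row whose total (3.0) disagrees with the manual sum (1) is set to NaN, identically under both methods.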

In fact, I used Method 2 to answer the screen, and DQ told me my answer was correct.