I’m curious whether anybody could explain the difference between Series.mask() and plain boolean masks used with the bracket or loc accessors, and the advantages of one over the other. Is one more efficient or more readable? Or are there things Series.mask() can do that the bracket/loc accessors cannot?
I’m confused because they seem to achieve pretty much the same thing. Take the example below:
```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])
nums = pd.Series(["one", "two", "three"])
bananamask = (fruits == "banana")
```
Suppose I want to use the boolean bananamask to replace values in the fruits series. I could do that with either of these two methods:
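```python
# Method 1: Series.mask() returns a new Series, which is assigned back to fruits
fruits = fruits.mask(bananamask, nums)
# Method 2 (alternative): boolean-mask assignment modifies fruits in place
fruits[bananamask] = nums
```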
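A minimal side-by-side check of the two methods, assuming a recent pandas version. The names masked and assigned are illustrative only; Method 2 is applied to a copy so both results can be compared against the untouched original.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])
nums = pd.Series(["one", "two", "three"])
bananamask = fruits == "banana"

# Method 1: mask() builds and returns a new Series; fruits itself is not modified
masked = fruits.mask(bananamask, nums)

# Method 2: boolean-mask assignment writes into the Series in place (here, a copy)
assigned = fruits.copy()
assigned[bananamask] = nums

print(masked.tolist())    # ['apple', 'two', 'three']
print(assigned.tolist())  # ['apple', 'two', 'three']
print(fruits.tolist())    # ['apple', 'banana', 'banana']
```

The results match; the visible difference is that mask() leaves the original Series alone unless you reassign its return value.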
Both approaches produce the same result. So my question is: can someone give me an example where the Series.mask() method works but the second method (bracket or loc notation) can’t?
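One place where the two differ in practice, sketched under the assumption of a recent pandas version: bracket indexing without assignment filters rows, while mask() always keeps the original shape. Because mask() returns a new object and accepts a callable as its condition, it can also sit inside a method chain where no named intermediate variable exists for bracket assignment. The replacement string "YELLOW" is purely illustrative.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "banana"])

# Boolean indexing without assignment filters: only matching rows survive
filtered = fruits[fruits == "banana"]

# mask() keeps every row and replaces matches (with NaN when no `other` is given)
blanked = fruits.mask(fruits == "banana")

# mask() returns a new object, so it can be chained; `cond` may be a callable
# that receives the intermediate Series at that point in the chain
result = (
    fruits
    .str.upper()
    .mask(lambda s: s == "BANANA", "YELLOW")
    .str.lower()
)

print(filtered.tolist())  # ['banana', 'banana']
print(blanked.tolist())   # ['apple', nan, nan]
print(result.tolist())    # ['apple', 'yellow', 'yellow']
```

Bracket assignment could reach the same final values, but only as a separate statement that mutates a named Series, which is mostly a readability and style difference rather than a capability gap.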
Note: I noticed that a similar question was posted a while ago. The link can be found here. I still found the answer confusing. The explanation was:
If we have a series (source) of the same length as the series (target) we are trying to mask, the values in the target will get replaced by the values from the source. Unequal lengths will lead to np.nan as the substitution when the source series is shorter than the target.
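The quoted behavior can be reproduced with a short sketch, assuming a recent pandas version and the default RangeIndex. Strictly speaking, pandas aligns the source to the target by index label rather than by position, so the NaN appears at the label the shorter source does not have. The names target, cond, and short_source are illustrative.

```python
import pandas as pd

target = pd.Series(["apple", "banana", "banana"])  # index labels 0, 1, 2
cond = target == "banana"                           # True at labels 1 and 2

# The source only has labels 0 and 1
short_source = pd.Series(["one", "two"])

# mask() aligns `other` to the target's index; label 2 has no match in the
# source, so the replaced value at label 2 becomes NaN
result = target.mask(cond, short_source)

print(result.tolist())  # ['apple', 'two', nan]
```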
However, unequal lengths of the target series and the replacement series result in NaN values regardless of whether we use Series.mask() or the bracket/loc accessor method (unless we just want to replace with a single value). By this, I mean that if we want to replace values in a target series with values from another series based on a boolean criterion, the lengths of (1) the target series, (2) the replacement series, and (3) the boolean mask all have to be equal to avoid unintentional NaN values, whichever of method 1 or method 2 we use.
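A minimal sketch supporting the observation that both methods behave the same way, assuming a recent pandas version. Because both align the replacement to the target by index label, a shorter replacement whose labels cover exactly the masked positions does not introduce NaN; a scalar needs no alignment at all. The replacement Series and the string "fruit" are hypothetical examples.

```python
import pandas as pd

target = pd.Series(["apple", "banana", "banana"])  # index labels 0, 1, 2
cond = target == "banana"                           # True at labels 1 and 2

# Shorter replacement whose labels are exactly the masked positions
replacement = pd.Series(["two", "three"], index=[1, 2])

# Method 1: mask() aligns `replacement` on index labels
aligned = target.mask(cond, replacement)

# Method 2: boolean-mask assignment aligns the same way
assigned = target.copy()
assigned[cond] = replacement

# A scalar replacement is broadcast, so length never matters
scalar = target.mask(cond, "fruit")

print(aligned.tolist())   # ['apple', 'two', 'three']
print(assigned.tolist())  # ['apple', 'two', 'three']
print(scalar.tolist())    # ['apple', 'fruit', 'fruit']
```

This suggests that what avoids the unintended NaN values is matching index labels, with equal lengths being the common special case when all three objects share the same default index.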