DataFrame.mask works differently depending on how the boolean mask is passed

I am having a problem with two bits of code which I’m pretty sure are functionally identical yet yield different results.

Screen Link:
https://app.dataquest.io/c/64/m/370/working-with-missing-data/3/filling-and-verifying-the-killed-and-injured-data

My Code:

injured_mask = injured['total_injured'] != injured_manual_sum
null_injured = injured['total_injured'].isnull()

injured['total_injured'] = injured['total_injured'].mask(null_injured, injured_manual_sum)
injured['total_injured'] = injured['total_injured'].mask(injured_mask, np.nan)

I saved the boolean masks in two variables for readability, and I expected this to yield the exact same result as the DQ solution, which looks like this:

injured['total_injured'] = injured['total_injured'].mask(injured['total_injured'].isnull(), injured_manual_sum)
injured['total_injured'] = injured['total_injured'].mask(injured['total_injured'] != injured_manual_sum, np.nan)

That is, modulo the assignment to variables, the two versions are identical (unless I’m misreading something). Except they aren’t, because the values in the total_injured column differ depending on which approach I use. More specifically, they differ on a single row, row 55699. Using my approach, total_injured reads ‘NaN’ on that row, whereas with the DQ approach it reads 1, which is indeed the correct answer. All other rows are identical, regardless of the approach used. (I checked.)

I see no reason why the two approaches should yield different results, let alone on one single, specific case! Any ideas?

In your case, you create injured_mask before making any changes to the total_injured column.

  • You create injured_mask from the original total_injured.
  • You create null_injured from the original total_injured.
  • You apply null_injured, modifying total_injured.
  • You apply injured_mask, which is now stale, onto the modified total_injured.

In the DQ solution, they create the equivalent of injured_mask after making changes to the total_injured column.

  • They apply the equivalent of the null_injured mask, modifying total_injured.
  • They apply the equivalent of the injured_mask mask onto the modified total_injured.
    • The equivalent of the injured_mask is created from the modified total_injured

I haven’t tested that out, but I would assume that is the reason for the discrepancy.
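
For what it's worth, a quick sketch on made-up data (not the real dataset; the names total, manual_sum, early and late are just for illustration) seems to bear this out. The key detail is that NaN != x evaluates to True for any x, so a mismatch mask built before the fill flags every row that was originally null, and the second mask() call then overwrites the freshly filled value with NaN again.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for injured['total_injured'] and injured_manual_sum.
# Row 2 has a missing total but a manual sum of 1; row 3 is a genuine mismatch.
total = pd.Series([2.0, 0.0, np.nan, 5.0])
manual_sum = pd.Series([2.0, 0.0, 1.0, 4.0])

# Order from the question: both masks are built from the ORIGINAL column.
stale_mismatch = total != manual_sum   # NaN != 1.0 is True, so row 2 is flagged
null_mask = total.isnull()
early = total.mask(null_mask, manual_sum)    # row 2 is filled with 1.0
early = early.mask(stale_mismatch, np.nan)   # row 2 is wiped back to NaN

# Order from the DQ solution: the mismatch mask is built AFTER the fill.
late = total.mask(total.isnull(), manual_sum)
fresh_mismatch = late != manual_sum    # row 2 now compares 1.0 != 1.0, which is False
late = late.mask(fresh_mismatch, np.nan)

print(early.tolist())
print(late.tolist())
```

So you can still keep named masks for readability, as long as the mismatch mask is created after the null fill has been applied, as fresh_mismatch is here.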