Why should you replace a 0 with a NaN value?

Screen Link:

My Code:

import numpy as np
prev_rank_before = f500["previous_rank"].value_counts(dropna=False).head()
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head()

I have two questions about this assignment:

  1. Why should you replace a 0 value with a NaN?
    When I compare prev_rank_before with prev_rank_after, both without the .head() method, all the values are the same and each occurs once (except the 0 value). So I don’t really see the added value of replacing 0 with a NaN value.
  2. The explanation of this task states the following: ‘Just like in NumPy, np.nan is used in pandas to represent values that can’t be represented numerically, most commonly missing values.’
    Why do we import numpy in the code when pandas also has a NaN value? Or do I misunderstand something?

Looking forward to your response,
Jeroen

Replacing 0 with NaN lets pandas statistics functions such as mean, max and min skip those entries automatically during calculations. If 0 here stands for missing data rather than a value that is genuinely 0, leaving the zeros in will distort the summary statistics.
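To make that concrete, here is a small sketch on made-up data (the values below are hypothetical and not taken from the f500 dataset). It assumes 0 means "no previous rank" and compares the statistics before and after the replacement:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ranks: 0 stands for "no previous rank", not a real rank.
ranks = pd.Series([1, 2, 0, 4, 0], dtype="float64")

# With the zeros left in, they are counted as real values.
mean_with_zeros = ranks.mean()   # (1 + 2 + 0 + 4 + 0) / 5
min_with_zeros = ranks.min()     # 0.0, which is not a real rank

# Same idea as the f500.loc[...] line in the question, applied to a Series.
cleaned = ranks.copy()
cleaned.loc[cleaned == 0] = np.nan

# pandas skips NaN by default (skipna=True).
mean_ignoring_missing = cleaned.mean()   # (1 + 2 + 4) / 3
min_ignoring_missing = cleaned.min()     # 1.0

print(mean_with_zeros, mean_ignoring_missing)
print(min_with_zeros, min_ignoring_missing)
```

So value_counts() looks almost unchanged, but anything that aggregates the column behaves differently once the zeros are marked as missing.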

np.nan also has the unusual property that np.nan == np.nan returns False and np.nan != np.nan returns True, which is exactly the opposite of what normal non-NaN values return. This matters for data manipulation: to find missing values, use a method like isna() rather than comparing with ==.
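A quick sketch of that comparison behaviour, using a throwaway Series (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan
nan_differs_from_itself = np.nan != np.nan

s = pd.Series([1.0, np.nan, 3.0])

# Comparing with == np.nan finds nothing, because every comparison is False.
eq_mask = s == np.nan

# isna() is the reliable way to locate missing values.
isna_mask = s.isna()

# The "not equal to itself" trick gives the same mask as isna() here.
self_neq_mask = s != s

print(nan_equals_itself, nan_differs_from_itself)
print(eq_mask.tolist(), isna_mask.tolist(), self_neq_mask.tolist())
```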

Yes, pandas uses np.nan, but look at where np.nan appears in the code: on the right-hand side of the assignment. That expression goes through the name np, so you must import numpy as np to access it. If you never needed to assign np.nan yourself, you wouldn’t need to import numpy. Certain pandas operations, like joins, can produce NaN on their own. Pandas being able to produce, store and display NumPy’s NaN is what the explanation is referring to. That’s different from the user explicitly setting cells to NaN.
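As a rough illustration of pandas producing NaN by itself, here is a left join on two small made-up tables (the column names and values are hypothetical). Note that numpy is never imported in this snippet:

```python
import pandas as pd

# Hypothetical tables: company "B" has no entry in the second one.
left = pd.DataFrame({"company": ["A", "B", "C"], "rank": [1, 2, 3]})
right = pd.DataFrame({"company": ["A", "C"], "previous_rank": [5, 7]})

# A left join keeps every row of `left`; unmatched rows get NaN.
merged = left.merge(right, on="company", how="left")

missing_count = merged["previous_rank"].isna().sum()

print(merged)
print(missing_count)
```

The previous_rank column comes out as float64 with a NaN for company B, even though the code never wrote np.nan anywhere.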
