Just a little fun with `NaN`!

At the end of my studying time today (i.e., after finishing the new missions), I started playing around with a Jupyter notebook to get set up for a guided project that I will focus on tomorrow morning.

I started with a markdown cell, wrote an intro, downloaded the datasets, wrote my import statements, and read the data into variables.

Then I started to poke around inside the data and stumbled across a rabbit hole that made me feel a little like Alice. While exploring `NaN` values in the dataset, I tried to select the rows of a `'Gender'` column that weren’t `'Male'` or `'Female'`, but try as I might, this type of syntax was not getting it done: `df['Gender'] == np.nan`. Then I remembered that I should be using built-in methods such as `df.isnull()` or `df.isna()` to do this type of selection. But I wanted to see if I could select these rows without these methods.
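For anyone following along, here is a minimal sketch of that dead end next to the built-in route. The tiny `df` is a hypothetical stand-in for the real dataset (which isn't shown here), and the column is forced to `object` dtype as an assumption so the behaviour doesn't depend on how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; only the column name comes from the post.
df = pd.DataFrame(
    {"Gender": pd.Series(["Male", "Female", np.nan, "Male", np.nan], dtype=object)}
)

# Element-wise comparison against NaN is False for every row, the NaN rows included.
eq_mask = df["Gender"] == np.nan
print(eq_mask.any())  # False

# isna() (isnull() is an alias for it) flags the missing entries instead.
na_mask = df["Gender"].isna()
missing = df[na_mask]
print(missing.index.tolist())  # [2, 4]
```

So the `== np.nan` mask isn't broken syntax; it simply never matches anything.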

I finally managed to select said rows by doing:

male = df['Gender'] == 'Male'
female = df['Gender'] == 'Female'
nan = df['Gender'][~(male | female)]

I took it a bit further and used `.iloc` in conjunction with `type()` in order to confirm that these elements were in fact `NaN`. So why can’t I do the above using `df['Gender'] == np.nan`?

That’s when I tried this:

np.nan == np.nan
Output: False

This just broke my mathematical brain. Then I remembered an article I read the other day talking about the difference between the `==` and `is` operators. So I tried:

np.nan is np.nan
Output: True

Oh now it’s on! Now I need to know what’s going on here…I clearly wasn’t going to figure it out on my own so I found this article and I just had to share it:

I hope you enjoy reading it as much as I did! Fascinating things those NaN.
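For anyone who wants to reproduce the two comparisons in a fresh session, here is a small sketch. As far as I understand it, `==` follows the IEEE 754 floating-point rule that NaN is unequal to everything, itself included, while `is` only asks whether both names point at the same Python object. The variable names and the little `object`-dtype Series are my own illustration, not taken from the dataset.

```python
import math

import numpy as np
import pandas as pd

# Value comparison: NaN is never equal to anything, not even itself.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# Identity comparison: np.nan is a single float object living in the numpy module.
print(np.nan is np.nan)  # True

# A NaN created separately is a different object, so `is` cannot be relied on.
other_nan = float("nan")
print(other_nan is np.nan)  # False

# Dependable scalar checks.
print(math.isnan(other_nan), np.isnan(np.nan))  # True True

# The self-inequality quirk gives a way to pick out NaN rows without isna().
s = pd.Series(["Male", np.nan, "Female", np.nan], dtype=object)
nan_rows = s[s != s]
print(nan_rows.index.tolist())  # [1, 3]
```

The last few lines are one possible answer to the "without these methods" challenge, though `isna()` stays the readable choice.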


Nice Informative Adventure :slight_smile: :blush: :nerd_face:

Does this mean this would have worked :grey_question: :question: :grey_question: :question: :

nan = df['Gender'][df['Gender'] is np.nan]

I do not believe so, since the `is` operator checks whether the two operands are in fact the same object, not to be confused with `==`, which checks whether the two operands have equal values. Hence, `df['Gender'] is np.nan` would evaluate to `False`, because `df['Gender']` is a Series object and `np.nan` is an entirely different creature (I have discovered!).
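A quick sketch to back that up, using a made-up three-row frame (an assumption, since the real data isn't shown): the `is` expression is evaluated once on the whole Series and yields a single plain `bool`, not a row-by-row mask.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the 'Gender' column.
df = pd.DataFrame({"Gender": pd.Series(["Male", np.nan, "Female"], dtype=object)})

# `is` compares the Series object itself with np.nan: one answer, not one per row.
result = df["Gender"] is np.nan
print(type(result), result)  # <class 'bool'> False

# A per-row mask needs an element-wise method such as isna().
mask = df["Gender"].isna()
print(mask.tolist())  # [False, True, False]
```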


I read the article, but it went over my head. I guess I will have to actually encounter this problem to understand what is happening.