At the end of my study time today (i.e., after finishing the new missions), I started playing around with a Jupyter notebook to get set up for a guided project that I'll focus on tomorrow morning.
I started with a markdown cell, wrote an intro, downloaded the datasets, wrote my import statements, and read the data into variables.
Then I started to poke around inside the data and stumbled down a rabbit hole that made me feel a little like Alice. While exploring NaN values in the dataset, I tried to select the rows of a 'Gender' column that weren't 'Male' or 'Female', but try as I might, this kind of syntax wasn't getting it done: df['Gender'] == np.nan. Then I remembered that I should be using built-in methods such as df.isnull() or df.isna() for this type of selection. But I wanted to see if I could select these rows without those methods.
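To make the puzzle concrete, here is a minimal sketch on a small made-up DataFrame (the data is hypothetical, not the project's dataset). It compares the == np.nan mask, which matches nothing, with the built-in isna()/isnull() methods, which find the missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataset
df = pd.DataFrame({"Gender": ["Male", "Female", np.nan, "Female", np.nan]})

# Element-wise == against NaN is always False, so this mask selects nothing
eq_mask = df["Gender"] == np.nan
print(eq_mask.sum())  # 0

# The built-in methods detect missing values correctly
na_mask = df["Gender"].isna()
print(na_mask.sum())  # 2

# isnull() is an alias of isna(), so both masks are identical
print(df["Gender"].isnull().equals(na_mask))  # True
```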
I finally managed to select said rows by doing:
male = df['Gender'] == 'Male'
female = df['Gender'] == 'Female'
nan = df['Gender'][~(male | female)]
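One caveat with the complement trick: it selects everything that is neither 'Male' nor 'Female', which only equals "missing" when no other values are present. The sketch below uses hypothetical data with an extra 'Unknown' entry to show the difference, and indexes df directly with the mask so that whole rows come back rather than just the Gender values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: includes a non-NaN value ("Unknown") to show the difference
df = pd.DataFrame({"Gender": ["Male", "Female", np.nan, "Unknown", np.nan]})

male = df["Gender"] == "Male"
female = df["Gender"] == "Female"
other = ~(male | female)

# Whole rows (not just the Gender values) that are neither 'Male' nor 'Female'
complement_rows = df[other]

# Rows whose Gender is actually missing
missing_rows = df[df["Gender"].isna()]

print(complement_rows.index.tolist())  # [2, 3, 4]
print(missing_rows.index.tolist())     # [2, 4]
```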
I took it a bit further and used .iloc in conjunction with type() to confirm that these elements were in fact NaN. So why can't I do the above using df['Gender'] == np.nan?
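As a rough sketch of that check (on a hypothetical three-element Series), pulling a missing entry out with .iloc gives back a float, because NaN is a floating-point value; math.isnan() and pd.isna() then confirm it is NaN.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical Series with one missing entry
s = pd.Series(["Male", np.nan, "Female"], name="Gender")

value = s.iloc[1]
print(isinstance(value, float))  # True: NaN is a float, even in a column of strings
print(math.isnan(value))         # True
print(pd.isna(value))            # True
```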
That's when I tried this:
np.nan == np.nan
Output: False
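This is standard floating-point behaviour rather than a NumPy quirk: under IEEE 754, NaN compares unequal to every value, including itself. The sketch below shows that, the x != x idiom that follows from it, and the dedicated checks math.isnan() and np.isnan().

```python
import math

import numpy as np

x = np.nan
print(x == x)  # False: NaN is unequal to every value, itself included
print(x != x)  # True: the classic self-inequality test for NaN
print(math.isnan(x), np.isnan(x))  # True True

# Same behaviour for a NaN created without NumPy
y = float("nan")
print(y == y)  # False
```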
This just broke my mathematical brain. Then I remembered an article I read the other day about the difference between the == and is operators. So I tried:
np.nan is np.nan
Output: True
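The is operator checks object identity, not value. np.nan is a single float object stored in the numpy module, so it is always identical to itself, while two NaNs created separately are different objects. The sketch below also shows a related side effect: Python's list comparison checks identity before ==, so lists holding the same NaN object compare equal. Because of that, is is not a reliable NaN test in general; a NaN produced by a calculation will usually be a different object.

```python
import numpy as np

# np.nan is one float object, so identity holds
print(np.nan is np.nan)  # True

# Two separately created NaN floats are distinct objects
a = float("nan")
b = float("nan")
print(a is b)  # False: different objects
print(a == b)  # False: NaN never compares equal

# List comparison checks identity first, so the same NaN object "matches"
print([np.nan] == [np.nan])  # True
```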
Oh, now it's on! I needed to know what was going on here… I clearly wasn't going to figure it out on my own, so I found this article and just had to share it:
https://towardsdatascience.com/navigating-the-■■■■-of-nans-in-python-71b12558895b#:~:text=NaN%20stands%20for%20Not%20A,any%20other%20type%20than%20float.
I hope you enjoy reading it as much as I did! Fascinating things, those NaNs.