Ways to avoid NaN in Jupyter Notebook when Mapping

Hi,

I was working on this mission when I came across this text:

Because none of the corrected values in our series existed as keys in our dictionary, all values became NaN! It’s a very common occurrence, especially when working in Jupyter notebook, where you can easily re-run cells.

I was wondering if there are ways to prevent this from happening, since I use the re-run all cells function a lot when I’m using Jupyter.

Sorry if I’m posting in the wrong forum - I’m not sure if this counts as a DQ topic or a non-DQ topic since although it’s related to DQ, I’m not interested in solving a problem in DQ but rather am interested in gaining knowledge that I can use to help me in future projects.

Thanks,
Jeremy


I didn’t look at the mission, but my guess is that you are talking about Series.map.
In normal circumstances, the usual workflow is to prototype in a notebook (.ipynb), then move the code to a .py file and run that file as a whole, top to bottom. With a single entry point at the top, there is no chance of rerunning cells in the middle of the program.
If you don’t want to use a .py file, organize the notebook so that map never runs twice in the middle of a pipeline. If it does have to run again, something upstream in the data processing chain must be re-run too, so that map is fed the uncleaned data again. You can do this by wrapping a sequence of steps in a high-level function, so that you only ever run the function and don’t touch the steps inside it. This function is analogous to a .py file. Don’t stuff everything into one function, though; break it down into separate smaller ones.
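The wrapping idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `load_raw`, `clean_os`, and `run_pipeline` are hypothetical names, and the small hand-made frame stands in for whatever file the real notebook reads.

```python
import pandas as pd

def load_raw():
    # Hypothetical stand-in for reading the original, uncleaned data from disk.
    return pd.DataFrame({"os": ["Mac OS", "Windows", "Linux", "Mac OS"]})

def clean_os(df):
    # The mapping lists every raw value, so map() alone leaves no NaN
    # as long as it is fed raw data.
    mapping = {"Mac OS": "macOS", "Windows": "Windows", "Linux": "Linux"}
    out = df.copy()
    out["os"] = out["os"].map(mapping)
    return out

def run_pipeline():
    # Always start from the raw data, so map() never sees already-cleaned values.
    return clean_os(load_raw())

laptops = run_pipeline()
laptops = run_pipeline()  # rerunning the cell gives the same result

# For contrast: applying the cleaning step to already-cleaned data loses
# the values that are no longer keys in the mapping.
cleaned_twice = clean_os(laptops)

print(laptops["os"].tolist())
print(cleaned_twice["os"].isna().sum())
```

Because the cell only ever calls `run_pipeline()`, the load and the map are re-run together, which is the same guarantee a .py file gives you.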

Hi Jeremy. It has been a while since you asked this, but I had a similar question, and I found this Stack Overflow link that shows one option. I implemented it like this:

mapping_dict = {
    'Mac OS': 'macOS',
    'macOS': 'macOS'
}
print(laptops['os'].value_counts())  # the 'os' series before the change
# Mapping with a dictionary that holds only the values I want to change
# turns every other value into NaN. To avoid that, fillna(laptops['os'])
# fills each NaN with the value that was there before the mapping.
laptops['os'] = laptops['os'].map(mapping_dict).fillna(laptops['os'])
print(laptops['os'].value_counts())  # the 'os' series after the change

By using the fillna() method, you don’t need to list every key: value pair in the dictionary if you only want to change some of the values.
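Here is a small self-contained check that this pattern is also safe to re-run, which was the original worry. The sample series and the helper name `remap_keep_others` are made up for illustration. It also compares against `Series.replace`, which leaves unmatched values alone when given a dictionary and may be a simpler option. One caveat: values that were already NaN before the mapping stay NaN either way.

```python
import pandas as pd

# Made-up sample data standing in for laptops['os'].
os_col = pd.Series(["Mac OS", "Windows", "macOS", "Linux"])
mapping_dict = {"Mac OS": "macOS"}

def remap_keep_others(series, mapping):
    # map() yields NaN for values missing from the mapping;
    # fillna(series) restores the original value in those positions.
    return series.map(mapping).fillna(series)

once = remap_keep_others(os_col, mapping_dict)
twice = remap_keep_others(once, mapping_dict)  # simulates re-running the cell

# Alternative: replace() only touches values that appear as keys.
replaced = os_col.replace(mapping_dict)

print(once.tolist())
print(twice.tolist() == once.tolist())
print(replaced.tolist() == once.tolist())
```

Running the remap a second time changes nothing, because every value that is not a key in the dictionary is simply carried over.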

I hope this helps.
