Seaborn - Heatmap - Working With Missing And Duplicate Data Mission

Hello :slight_smile:

I tried to replicate this piece of code:

sorted = combined.set_index('REGION').sort_values(['REGION', 'HAPPINESS SCORE'])
sns.heatmap(sorted.isnull(), cbar=False)

But it returns: KeyError: ‘REGION’

Another thing: when I try to generate a heatmap, the result looks nothing like the examples. I get a lot of index labels, while your examples show only a few. For example, when you choose YEAR as the index, you get only a few labels, while I seem to get all of them:
Yours:

import seaborn as sns
combined_updated = combined.set_index('YEAR')
sns.heatmap(combined_updated.isnull(), cbar=False)

image
Mine:

import seaborn as sns
import matplotlib.pyplot as plt

heat_map_combined = combined.set_index('YEAR')
sns.heatmap(heat_map_combined.isnull(), cbar=False)
plt.show()

Why does this happen?

Many thanks for the help :slight_smile:

Anyone? :slight_smile:

Hi @a.xitas. I’m not as well-versed in Seaborn, but I’ve been playing around with this to see if I could figure it out. I couldn’t figure it all out, but I’ll share what I did manage to do and hopefully it helps you!

First, the KeyError: 'REGION' issue. It looks like when we set the index using the 'REGION' column, set_index() also removes that column from the new dataframe (drop=True is its default). Then when we call .sort_values(['REGION', 'HAPPINESS SCORE']), there's no 'REGION' column left and it raises the KeyError. To fix this, you can add the parameter drop=False in set_index(). [documentation]
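If drop=False doesn't behave the same way for you (how pandas sorts by a name that is both a column and an index level can differ between versions), another option is to sort while 'REGION' is still an ordinary column and only then set it as the index. This is just a minimal sketch of that alternative, not the mission's own solution; the small dataframe below is a made-up stand-in for combined, and sorted_by_region is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the mission's `combined` dataframe.
combined = pd.DataFrame({
    "REGION": ["Western Europe", "Africa", "Western Europe", "Africa"],
    "HAPPINESS SCORE": [7.5, None, 6.9, 4.1],
    "YEAR": [2015, 2015, 2016, 2016],
})

# Sort while REGION is still a regular column, then move it into the index.
sorted_by_region = (
    combined.sort_values(["REGION", "HAPPINESS SCORE"])
            .set_index("REGION")
)

# Boolean mask of missing values; this is what sns.heatmap() would draw.
missing_mask = sorted_by_region.isnull()
print(missing_mask)
```

The resulting mask can be passed to sns.heatmap(missing_mask, cbar=False) exactly like in the original code.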

Once that's settled and you plot the heatmap of sorted, you'll get an ugly graph where the y-axis labels overlap. You can make it prettier by adding the yticklabels parameter to the heatmap. [documentation] Setting it to an integer n tells seaborn to still use the index labels but draw only every nth one, so the axis stays readable. I found that 20 made it match the example pretty closely.

sns.heatmap(sorted.isnull(), cbar=False, yticklabels=20)

This is what I got. The colors are inverted but it looks okay!
image

You can try adding yticklabels to your other example and see if it fixes the overlapping there as well. As for it showing fewer values in the example, I'm not sure about that.
In any case, I hope this helps and that exploring the documentation gives you some more ideas.
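For the YEAR example, a rough sketch of what that could look like is below. The data is invented (100 rows, with one year's 'REGION' values blanked out to simulate gaps), so treat it as an illustration rather than the mission's dataset; the value 20 is simply the one that worked for the REGION plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `combined`: 100 rows, 25 per year.
rng = np.random.default_rng(0)
combined = pd.DataFrame({
    "YEAR": np.repeat([2015, 2016, 2017, 2018], 25),
    "HAPPINESS SCORE": rng.uniform(3, 8, size=100),
    "REGION": ["Region A"] * 100,
})
# Simulate missing data for one year.
combined.loc[combined["YEAR"] == 2017, "REGION"] = np.nan

heat_map_combined = combined.set_index("YEAR")
mask = heat_map_combined.isnull()

# yticklabels=20 asks seaborn to label only every 20th row instead of every row.
ax = sns.heatmap(mask, cbar=False, yticklabels=20)
```

In a notebook you would drop the matplotlib.use("Agg") line and call plt.show() after the heatmap to display it.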

Since I had to make some code changes compared to the examples to make these work, I’m also going to loop in @Sahil to see if he can be of any help or has some insight.

Edit: Including the link to the page since it took me a while to find it
https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/10/analyzing-missing-data

You are beautiful, April :smiley:

Thank you very much for your help :slight_smile:

Kind Regards