Need help understanding code

Screen Link: https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/10/analyzing-missing-data

I can’t understand the code and the heatmap under the question “Will dropping missing values cause us to lose valuable information in other columns?”

When I try to run the same code on the earlier screen, it throws an error that I don’t understand either.

Please help.
Thanks!
Aditya

Hi Aditya, welcome to the community!

Another student asked about this same screen in this post:

It might answer your question, and it gives an alternate version of the code that seemed to work if you want to try it out on your own.

Hey April!

The answer to that question indeed helped. I am able to run the code now. Many thanks!

I have one more related question, though. Why do certain Region names appear multiple times on the Y-axis of the heatmap?
For example, Central and Eastern Europe appears four times.

Each of those labels represents a row of the dataset, and each cell in that row is colored according to whether its value is null or non-null. When creating the sorted dataframe, we made the REGION column the index, so every row carries its region name and each region shows up once per row. If you don’t pass the yticklabels=20 parameter to the heatmap, every label is drawn, and they overlap so much that the axis becomes unreadable. With yticklabels=20, only one label is printed for every 20 rows. We still see duplicates because a region that covers many rows gets picked more than once. I hope that makes sense!
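If it helps, here is a small sketch of the idea in plain Python. It doesn’t draw anything; it only mimics how keeping every 20th label works. The region names and row counts are made up for illustration (they are not the real dataset’s numbers), and every_nth_label is a hypothetical helper, not a seaborn function. In the lesson you would get the same effect from a call along the lines of sns.heatmap(df.isnull(), yticklabels=20), where df stands for the sorted dataframe.

```python
# Hypothetical row counts per region; the real dataset's counts will differ.
rows_per_region = {
    "Australia and New Zealand": 6,
    "Central and Eastern Europe": 87,
    "Eastern Asia": 18,
    "Latin America and Caribbean": 68,
}

# Mimic a dataframe indexed and sorted by region:
# one label per row, so each region name repeats once per row.
index_labels = [
    region
    for region, n_rows in rows_per_region.items()
    for _ in range(n_rows)
]


def every_nth_label(labels, step):
    """Keep one label per `step` rows, starting from the first row."""
    return labels[::step]


# Same idea as yticklabels=20: one label for every 20 rows.
shown = every_nth_label(index_labels, 20)
for label in shown:
    print(label)
```

With these made-up counts, the region with 87 rows is printed four times, while a region with fewer than 20 rows is printed once at most and can be skipped entirely.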

Hey April!

Once I understood what an ‘int’ value means for the yticklabels parameter, I could see why the y-axis contained duplicates.
Thanks a lot for the clarification!