This was a straightforward project, and I would like to share my solution. I checked some other solutions and learned how to use ticks in markdown mode.
I was not quite able to figure out what the `migration_rate` field really stands for. The project instructions seem to omit a description for this field, and a search on the CIA website does not yield useful results. After ascertaining that there are no negative values in the `migration_rate` field, I can only conclude that it is not the same as the “net migration rate” that the CIA facts webpages refer to, because the net migration rate can be negative.
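The check for negative values can be sketched roughly as follows. This assumes a SQLite table named `facts` with `name` and `migration_rate` columns (the table name and the sample rows are placeholders for illustration, not the actual factbook data).

```python
import sqlite3

# Hypothetical stand-in for the project database; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, migration_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("A", 0.0), ("B", 3.5), ("C", 12.1), ("D", None)],
)

# Count rows with a negative rate; rows with NULL never match the comparison.
negative_count = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE migration_rate < 0"
).fetchone()[0]

# MIN and MAX skip NULLs, so they describe only the known values.
min_rate, max_rate = conn.execute(
    "SELECT MIN(migration_rate), MAX(migration_rate) FROM facts"
).fetchone()

print(negative_count, min_rate, max_rate)
```

A count of zero together with a minimum of zero or above would be consistent with the column not holding a signed net rate.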
SQL for Data Analysis.ipynb (38.0 KB)
GitHub link. Please note that the markdown navigation links in the Jupyter file do not work when viewed on GitHub; this is a known bug.
Click here to view the jupyter notebook file in a new tab
Thank you for sharing another good project! I see that the navigation links now work well, which is good news. How did you resolve it, by the way?
Your project is well structured, and the code follows SQL style guidelines, which is why it looks very neat. This insight of yours:
It’s worth noting that India will add more than 15 million people as compared to China’s 6 million.
looks curious and quite unexpected, at least to me.
Now, I have some ideas for your consideration:
About the migration rate: yes, on the CIA website it should be the “net migration rate”, i.e. immigrants minus emigrants per 1,000 persons. The second (and only other) kind of migration rate is the “gross migration rate”, i.e. immigrants plus emigrants per 1,000 persons, which is, evidently, always positive or 0. On the CIA website the migration rate is of the first type, but the values in the factbook version for this guided project are somewhat strange, I totally agree with you. Negative values that should definitely be there are missing: for example, for Syria or other countries where in 2015 (the date of this factbook version) there was a war and an obvious outflow of refugees and, of course, no immigrants. So yes, something strange happened with that column. In fact, when doing this project, I was also desperate to figure it out and eventually gave up on including the migration rate data in my analysis.
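To make the difference between the two definitions concrete, here is a small sketch. The function name and the input figures are invented for illustration only.

```python
def migration_rates(immigrants, emigrants, population):
    """Return (net, gross) migration rates per 1,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Net rate: inflow minus outflow, so it is negative when more people leave.
    net = (immigrants - emigrants) * 1000 / population
    # Gross rate: inflow plus outflow, so it can never be negative.
    gross = (immigrants + emigrants) * 1000 / population
    return net, gross


# Made-up figures: more people leave than arrive.
net, gross = migration_rates(immigrants=2_000, emigrants=5_000, population=1_000_000)
print(net, gross)
```

With these figures the net rate is negative and the gross rate is positive, which is why a column with no negative values at all does not look like a net rate.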
The code cell : it would be good to use aliases for the columns here to improve readability.
The code cells  and  can be combined into one.
The code cell : it’s better to also exclude Antarctica.
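As a rough illustration of the alias and Antarctica suggestions, a query could look like the sketch below. The `facts` table, the `population` column, and the sample rows are assumptions for the example and are not taken from the notebook.

```python
import sqlite3

# Hypothetical stand-in table; the rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Antarctica", 0), ("Country A", 500), ("Country B", 1200)],
)

# Aliases give the output readable column names;
# the WHERE clause keeps Antarctica out of the ranking.
query = """
SELECT name AS country,
       population AS total_population
  FROM facts
 WHERE name <> 'Antarctica'
 ORDER BY total_population
 LIMIT 1;
"""
cursor = conn.execute(query)
column_names = [col[0] for col in cursor.description]
row = cursor.fetchone()
print(column_names, row)
```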
Even though the cell outputs show the results of each query, I would add more markdown explanations throughout the project on what these results really mean. For example, for the countries with a higher death rate than birth rate, it could be interesting to investigate what exactly led to such circumstances. I was very surprised to see Bulgaria, Ukraine, and the Baltic countries among them (and even at the top!). Some research on these countries and their current situation would be interesting: whether there was a particularly strong economic or demographic crisis, whether any government programs were applied to resolve the problem, whether the situation has improved now, 5 years later, etc.
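A query for this comparison might be sketched like this, assuming `birth_rate` and `death_rate` columns in a `facts` table. The rows are placeholders, not real factbook values.

```python
import sqlite3

# Hypothetical stand-in table; the rates are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Country A", 9.0, 14.5), ("Country B", 30.0, 8.0), ("Country C", 10.0, 12.0)],
)

# Keep only countries where deaths outpace births, largest gap first.
query = """
SELECT name AS country,
       death_rate - birth_rate AS natural_decrease
  FROM facts
 WHERE death_rate > birth_rate
 ORDER BY natural_decrease DESC;
"""
shrinking = conn.execute(query).fetchall()
print(shrinking)
```

Sorting by the gap rather than by the death rate alone puts the countries with the strongest natural decrease at the top, which are the ones worth a closer look in the markdown commentary.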
As for the water-land ratio, apart from British Indian Ocean Territory, there are some other “ill” data points in the dataset, and unfortunately they are right among the “leaders” of the top list. To better understand what is supposed to count as a country’s water area and what issues it can bring to your data analysis, you can refer to these links:
Once again about the water-land sections: you should probably put them after all the demographic sections, so that the two topics are not mixed up.
It’s better to extend the conclusion and add the most interesting geographic and demographic insights that were obtained while doing this project. I would also mention this in the introduction as a goal of the project: we are going to explore demographic and geographic tendencies, or something like that.
I hope my suggestions are helpful for this project of yours and for future ones. Happy learning and interesting insights!
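For the water-land ratio itself, a defensive version of the query might look like the sketch below. The `area_land` and `area_water` column names are assumptions, and the rows (including "Territory X") are invented to show how a single odd record can dominate the top of the list.

```python
import sqlite3

# Hypothetical stand-in table; the areas are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land REAL, area_water REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Territory X", 60, 54000),  # suspiciously large water area
        ("Country A", 1000, 250),
        ("Country B", 0, 10),        # zero land area: ratio undefined
        ("Country C", 500, None),    # missing water area
    ],
)

# Skip rows where the ratio cannot be computed, then rank by the ratio.
query = """
SELECT name AS country,
       area_water / area_land AS water_to_land
  FROM facts
 WHERE area_land > 0
   AND area_water IS NOT NULL
 ORDER BY water_to_land DESC;
"""
ratios = conn.execute(query).fetchall()
print(ratios)
```

An outlier like the first row is a hint to check how the water area was defined for that record before treating it as a real “leader”.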
Thanks for your feedback again.
- The navigation links always seem to work in nbviewer, just not on GitHub at the moment.
- It is true that some geographical patterns seem to emerge, and I could have called them out, for example high birth rates in African countries, high death rates in Eastern Europe, etc.
- Analyzing the political and economic situations around the world that could affect certain metrics goes well beyond the scope of this project, and it would require additional outside data to be sure. Maybe an index that quantifies quality of life and healthcare would be a good start.
- I suspected there’s something “fishy” with the water area just by looking at it, thanks for including the links.
- I was answering the questions in order, but yes, it’s a good idea not to have the water analysis in between the population analysis.