Guided Project__Analyzing CIA Factbook Data Using SQL

Hi everyone,

This is already the second time when I’m trying to share here my project on the CIA Factbook data. The first time I had a technical issue: the images were displayed properly in Jupyter but not displayed in nbviewer. Luckily, @WilfriedF suggested to me an efficient workaround, so now everything is successfully fixed.

Doing this project, I learned some new things for my general knowledge, like water-to-land ratio and what other types of “waters” exist. Also, I discovered a new country, Saint Pierre and Miquelon! :grinning: I wonder if in our Community there is somebody from that country :yum:

Interesting and unexpected (well, at least for me) statistics for Eastern Europe and Baltic countries, as well as for the Middle East and Japan.

One thing that I really miss in SQL is the possibility to create a function for some pieces of code. Sometimes the code looks very repetitive and long.

As usual, any your suggestions are very welcome. What can be optimized / modified / removed / added in my code and in the project itself?

Thanks a lot in advance!

Analyzing CIA Factbook Data Using SQL.ipynb (1.1 MB)

Click here to view the jupyter notebook file in a new tab


Hi Elena,

Congratulations for your project, I couldn’t stay without saying a word since you mentioned me in your post (I was so glad to help the helper!) but I have been busy this week.

I know you always make a good work so not a suprise for me here, I appreciate the efforts you made in finding an equilibrium between text, titles and code, and also nice maps! A careful reader will note that you have been interested by the subject and that you spent time to investigate and learn more for your “general knowledge” (quoting you), and this is exactly what I like to do too!

By the way, Saint-Pierre-et-Miquelon is a French territory! My great-uncle I never known who died during WW2 was a terre-neuvas and I always heard younger by his sister he fished for cod - a 500 years old fishery - in this water area. Oh wait, should I say coastal waters? :grinning:

And at that time cross the Atlantic was quite an odyssey! Now see how sadly ends the story and what happened here: Collapse of the Atlantic northwest cod fishery.

Regarding your notebook, quickly (well, I will try):

  • the Summary of Results after the introduction is very welcome and make pro, but maybe it’s redundant with the conclusion and too dense?
  • population statistics: may be personal preferences, but why not showing the population by country in terms of percentage of world population? Prevalence is more informative to my opinion and will help the storytelling to develop.
  • birth rate, death rate, growth rate: you made special efforts here in being informative and it’s very stimulating. I was impatient to share my thoughts with you about it. There are statistics that deserves to be explored in global warming context since population growth is a serious concern. In fact, population growth started to explode in 18th-19th century in Europe and studies show that this is due to progress between others in hygiene, urbanism, water treatment, so less epidemics, etc. The first-order consequence of better life conditions is that child mortality decreased gradually. And what happened when child mortality decreases? People make less children and birth rate starts also to decrease. This is observed everywhere. So natural questions were arousing here: do the countries with lower death rates have also the lower birth rates? How strong the correlation is? Etc. Of course, these are just ideas on the fly that may interest you for future projects.

And that’s all for today!


1 Like

Hi Wilfried,

Thanks a lot for your such a nice and helpful feedback! Now I have read the story of those lands, including Saint Pierre and Miquelon, it’s amazing what an ancient story they have! And it’s really sad how everything was finished because of overfishing. By the way, I also found out that those islands were named after Saint Peter, because he was the patron saint of fishermen.

About my summary of results, well, honestly, its length looked scaring also to me, the very moment I was writing it :grinning: I was trying to stuff it with as much results as possible, but now I think it was a bit excessive. Now, since you also noticed that, I’ll try to shrink it to 4-5 lines. Those people who are curious to read all my conclusions can always refer to the conclusion section at the end :blush:

Very good suggestion about the population by country in %, I will use it as well, both for this and future projects (of course for the TOP20 countries, where it really makes sense). Even better, in this case I will leave both columns: in absolute numbers but converted to millions of people and in %.

About your idea (whether the countries with lower death rates have also lower birth rates), I also became curious and have just checked this hypothesis. In our case, the lowest death rates are observed in the Middle East, and I see now that the birth rates in the majority of those countries are also slightly below average (Well, we have to keep in mind, though, that the average is strongly influenced by African countries). Two exceptions are Oman (a low death rate while the birth rate is quite above average) and Yemen (but it’s not even in the list of low death rates). Interesting, anyway! Before your comment, I thought just the opposite: high standard of living, hence low death rates and high birth rates.

Thank you again for a valuable feedback!


Hi Elena,

Saint Peter of course! Peter in reality was originally a fisher and there is sequences in the Holy Bible about miraculous catch of fish.
What happens then in the christian iconography is that the fishes are a metaphor of the sinners that the Apostles or the Church are saving from them. In French, you can play a lot with this analogy since you say to fish “pêcher” and to sin “pécher”, two homophone words!
And it fits pretty well with Peter since the fisher-sinner will deny Jesus three times before his definitive repentance!

What about a simple scatter plot? You surely already know scipy.stats.linregress: this computes a least-squares regression for two sets of measurement, so you can see quickly if there is a kind of correlation. More work for you! :slight_smile:


1 Like

Hi Wilfried,

Incredible, only now I noticed that also in Italian “peccare” and “pescare” sounds very similar! :frowning: I couldn’t even imagine that there is such an unexpected explanation behind, these words in their usual meaning doesn’t seem to have much to do with each other.

About creating a plot, well, that’s a good point. And not only a scatter, but it would be great also to create bar plots to visually compare all those TOP20 and BOTTOM20 rates between each other. The problem is that I cannot figure it out how to convert SQL tables into dataframes / series / ndarrays, and all those libraries seem to be adapted only to those types of input. This is actually the reason why I put all those pictures from the Internet, since I was not able to create my own beautiful graphs :joy: While trying to find it out, I had googled something about the plotly library that is somehow used to render SQL in series, but it really looks scaring for me. Even now, when I am doing my second project on SQL, I would like to create some plots, but I still feel confused and scared :see_no_evil:

1 Like

Hi Elena,

Well, you can output and save the csv from the database and then load the csv with pd.read_csv.
There is another method, more direct, with psycopg2 library, pd.read_sql but I realized then that your guided project is about a SQLite database!
After a quick research, I found that you can use the package sqlite3 than seems very closed to psycopg2. Then call pd.read_sql_query.

Indeed I would not be surprised if it works for all romance languages!

1 Like

Thanks a lot Wilfried, with sqlite3 it works perfectly in my case! :star_struck: I have already tried it in my current project (here I found a good DQ post about how exactly to do it). And it doesn’t look so scaring as the other approaches which I had googled before, so now I will be able to create beautiful graphs in both my SQL projects! Thanks again, it was really a good suggestion! :+1:

1 Like

Hi Elena!

Very well structured and nice looking project. I see that you put a lot of effort to make this project. I have a question if you don’t mind to answer.

How did you get different line graphs in your project? I went to the link “Knoema Data Source” which you mentioned in your project, but unfortunately I couldn’t find any similar graphs to yours there. Especially the line graph about MONACO BIRTH RATE.

Hi Ulugbek,

Thanks a lot, I’m happy to know that my work was appreciated!

As for the graphs from Knoema, this is a more precise link: Knoema atlas. Practically, it’s the same as opening Data -> Countries from the home page of Knoema. Then you can select a country and whatever metrics you would like to check for it. For example, this is the link to Monako birth rate.

got this error…plz help where I did wrong
factbookAccess SQL thru Jupiter.ipynb (9.1 KB)

Click here to view the jupyter notebook file in a new tab

Hi @parulgupta31,

In the code cell [3] you should remove the last line and leave only these 3 lines:

%load_ext sql
%sql sqlite:///factbook.db

Please don’t add anything else in that cell, even code comments - nothing. Running the code above, you’ll successfully connect to the Factbook database and will be able to start running queries.