CIA FACTDATA db and null values


I was looking at some people’s projects and noticed that not many people bothered to filter out countries which had null values in the population variable to calculate overall average, for example. Therefore, I’m confused. What is SQLite’s behavior when it encounters null values while taking an average of some collumn such as population or water_area? What is more valid to filter it out and exclude those lines from my calculation or to ignore them?

In this example differences are small because population numbers are quite high comparing to country quantity, but if the calculation would be of average life expectancy, for instance I think it would make a big difference.

Besides that I found a step by step guide on how to use the CIA API:

In this post they used another method to connect to the database, which I ended up using because the suggested method in datacamp just didn’t seem to work. What is the difference though from:

import sqlite3
import pandas as pd
conn= sqlite3.connect(‘factbook.db’)
q= “SELECT * FROM sqlite_master WHERE type=‘table’;”


%load_ext sql
%sql sqlite:///factbook.db

And why might the second not work? (I would type that and simply not get an output)

Also another interesting thing from the method used there is that they seem to store their queries in variables, so I guess they could use the variable stored as a substitute for subquerying (not sure if this is a word). Like this:

q= “SELECT * FROM sqlite_master WHERE type=‘table’;”

However I was not able to repeat this with my own queries for some reason. Can someone explain what is being done with pandas here and if it can also be done using the %%sql magic?