CIA World Factbook set operator query question

I wondered how I could create a query to return rows where the base population is smaller than other countries , but the population_growth rate is at a percent fast enough that more people are added than the country with the larger population. Ideally I could have a condition where I return all those smaller countries which overtake the larger within 10 years.

I have just done complex queries in intermediate sql and thought I might be able to create two views, but do not know what of in this situation since I think I need to compare the population column to itself and also the growth_rate at the same time . I thought I might need UNION and the set operator EXCEPT but that is as far as I can get. Please advise?

Hi @jamesberentsen,

Please specify what the end result is what you like. Would it be a table with two countries where one is able to overtake the other? Because otherwise it is a bit vague the term larger population.

What are you comparing with what? Do you mean China as the biggest country and you would like the amount of years it takes to overtake them?

One general advice on SQL queries is that you really want to have the end result clearly in your mind before writing a single line. It will save you 646 tries :wink:

By the way, well done trying this hard, I always just give up.

Hope this helps :slight_smile:

1 Like

Hi DavidMiedema,

Would it be a table with two countries where one is able to overtake the other?
Correct.

The end result would be a column ‘year1 predicted increase’ ,‘year3 cumulative predicted increase’ etc added to all the columns in the screenshot.
For example India’s population after year1 and also China’s.
Let’s assume India’s yearly growth is 5% and China only 0.45% then we could see within under 5 years that India has overtaken China(I have not done calculation). So then there would be a final column population_after_5_years.

I amend orignal 10 as that would be too many columns.

Well, if you would like to create columns for every country for the next ten years that should not be too difficult. Just sum the population and the predicted increase every year. Don’t forget to calculate the numbers with the new population though.

Hi DavidMiedema,

That was not quite it though because I wanted to return the set of countries whose population is larger than another country (so comparing within the population column) and the set where the population growth is larger but population smaller. Then performing some kind of SET operation, EXCEPT etc.

Otherwise just the sum of predicted population increase for all countries and not those that would be overtaken within 5 years, would be all countries in the table.

SELECT C.Name
FROM facts C
WHERE C.Population >= 100000
EXCEPT
SELECT C.Population
FROM facts C, facts L
WHERE C.Population <= L.Population
AND C.population_growth > L.population_growth;

???

@jamesberentsen

I am sorry, this is too hard to follow for me. You’re talking about countries, whereas you would like to compare two countries before. This is really confusing.

Hi,

Sorry, I was not that clear.
the two countries were an example.
I meant a comparison of all countries who have populations smaller than another so that is one set.
the other set is that of population_growth.
Put all countries in set A1 who have smaller growth than those in A2

Blockquote
I meant a comparison of all countries who have populations smaller than another so that is one set.
the other set is that of populatin_growth.

Specify another you can’t just use that vague term!

One last tip, try not to edit so many times after you post, it’s really difficult to react to you when you change your posts all the time :sweat_smile:

I would recommend, leave this question and go on with the lessons.

EXCEPT

If you get a very, very crystal clear image of what you want.

Hi David,

Since I am talking about sets, I believe it is precise enough , so we can agree to disagree.

Can you please provide a sample input and a sample output (even if done “by hand”). I, too, am having trouble understanding what you’re trying to get.

Hi Bruno,

Sure.
This is my attempt at sample output.
I am trying a self-join , but I am getting an error.

I am thinking of creating two table one with populations in year one the other in year 5 then performing a join so I can see population in year1 and year5 next to each other along with the countries that meet the criteria in above screenshot.

You’re getting that error because you’re running SQL code in Python, as if it were Python code. Can you please also share your code as text so that I can run it on my end?

1 Like

Hi Bruno,

Yes see here please:

%%sql

SELECT 
t2.name,
t1.population/1000000,
t2.population_growth, 
FROM facts.name as t1
JOIN facts.name as t2
ON t1.name = t2.name AND t1.population_growth > t2.population_growth
AND t1.population < t2.population;

I am trying just to get the population in year 5 column by adapting the compound interest formula into sql from this page , but it is not working—

%%sql
SELECT
name,
population/1000000 ‘pop (M)’, population_growth ‘population_growth%’,
CAST((population * (population_growth/ 100 + 1) - population)/1000000 AS INT) ‘yr 1 predicted increase (M)’
,(population * POWER(1 + (population_growth / 100), 5)) - population ‘yr 5 predicted increase (M)’,

FROM facts
ORDER BY 4 DESC
LIMIT 10;

https://www.sqlservercentral.com/articles/creating-compound-interest-calculations-in-sql

That code snippet yields an error for me. I fixed it (I think) to the following:

SELECT t2.name, t1.population/1000000, t2.population_growth
  FROM facts as t1
  JOIN facts as t2
       ON t1.name = t2.name AND t1.population_growth > t2.population_growth
       AND t1.population < t2.population;

But this returns zero rows. Can you manually create input and output tables?

Hi Bruno,

Sorry not sure what is input and output tables .
You mean to execute the code?

Would you know why the power function not working?

, (population * POWER(1 + (population_growth / 100), 5)) - population ‘yr 5 predicted increase (M)’,

I mean something like the following.

Input:

a_column another_column
1 1
3 2

Output:

a_column another_column sum
1 1 2
3 2 5

input would be above table filtered to the rows according to the criteria here

```
SELECT t2.name, t1.population/1000000, t2.population_growth
  FROM facts as t1
  JOIN facts as t2
       ON t1.name = t2.name AND
 t1.population_growth > t2.population_growth
 AND       t1.year1.population < t2.population 
AND       t1.year5.population >t2.population;


```

although I am not quite sure how to relate year1 column to year 5 such that I can show the rows which have been overtaken in population by those
rows which had a larger population and smaller population growth in year1
are filtered out if they have been overtaken in population terms by those which had a smaller population but higher growth.

Sorry, I can’t help you. I don’t understand what you’re saying. The code you’ve provided for the input doesn’t run.

what specifically do you not understand?

Do you not understand this?

Would you know why the power function not working?

, (population * POWER(1 + (population_growth / 100), 5)) - population ‘yr 5 predicted increase (M)’,

I asked for input and output. You gave me code as input that doesn’t run.

I realize (now) that’s part of the problem, but that still means input is missing and that doesn’t help matters. A visual representation of the table (like I showed in the example) would have probably worked better.

Anyway, what you want to do isn’t possible in pure SQLite as SQLite doesn’t implement a power function.

That a pure SQLite solution doesn’t exist, isn’t helpful to getting the results you see. So how is it that you’d get what you want? The most obvious solution is to run SQL from Python, maybe get the data as a dataframe and then use pandas.

An alternative is to leverage the power of mathematics. Exponentiation can be reduced (with an error margin) to a polynomial. This may work. I won’t write the details of this unless this is an avenue you’re interested in pursuing.

1 Like