Challenge: Complex Query with Joins and Subqueries - Got Stuck and Have a Few Questions

Screen Link:
https://app.dataquest.io/m/463/joining-data-in-sql/8/challenge-complex-query-with-joins-and-subqueries

My Code:

SELECT facts.name AS country, SUM(cities.population) AS urban_pop, facts.population AS total_pop,(cities.population/facts.population) AS urban_pct
FROM facts
WHERE facts.name > (.5*facts.population) 
--INNER JOIN (
    --SELECT * FROM facts
    --WHERE .5*facts.population)
    --cities ON cities.facts_id = facts.id; 

    INNER JOIN (
    SELECT * 
    FROM cities 
    WHERE SUM(cities.population) AS urban_pop
    )
    cities ON cities.facts_id = facts.id; 
    

What I expected to happen:
I wasn’t able to solve this without looking at the hint and then the answer. I don’t understand the thought process behind the construction of the subquery. In particular:

  1. In the subquery SELECT statement, why is facts_id selected along with SUM(population) urban_pop ? Doesn’t the specific population column need to be specified, or is it because the table for the subquery is cities, we are clearly referring to cities.population? Also, why facts_id?
  2. I also don’t understand the use of “GROUP BY 1” in the subquery. What role is this playing?
  3. I understand my query is incorrect, but I am not sure why this doesn’t run and outputs the error below.
    What actually happened:
(sqlite3.OperationalError) no such column: cities.population
[SQL: SELECT facts.name AS country, SUM(cities.population) AS urban_pop, facts.population AS total_pop,(cities.population/facts.population) AS urban_pct FROM facts WHERE facts.name > (.5*facts.population)  --INNER JOIN (     --SELECT * FROM facts     --WHERE .5*facts.population)     --cities ON cities.facts_id = facts.id;       INNER JOIN (     SELECT *      FROM cities      WHERE SUM(cities.population) AS urban_pop     )     cities ON cities.facts_id = facts.id;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

Run Code

Thank you!

So that you can group by the id to get the sum of the population of the cities in a specific country. facts_id corresponds to the country the city is in.

To get the total city population of a country by grouping them based on id.

Yes, unfortunately, it’s not made clear enough, but if you use -- the DQ platform ignores everything after the first commented statement.

So, effectively, only the following part of your code runs -

SELECT facts.name AS country, SUM(cities.population) AS urban_pop, facts.population AS total_pop,(cities.population/facts.population) AS urban_pct
FROM facts
WHERE facts.name > (.5*facts.population) 

And that throws the error because the INNER JOIN statement after the commented out code is ignored.

To comment out SQL code, place it inside of /* and */ -

/*INNER JOIN (
SELECT * FROM facts
WHERE .5*facts.population)
cities ON cities.facts_id = facts.id;*/

Running your code after that will produce a new error because of the following statement in your INNER JOIN -

Which, I hope, you can focus on fixing. If the issue isn’t clear I would first recommend going through the content on the WHERE clause again. Otherwise, feel free to ask follow-up questions!