BLACK FRIDAY EXTRA SAVINGS EVENT - EXTENDED
START FREE

SQL m/179, GROUP BY explaination:

Screen Link:

My Code:

SELECT
    f.name country,
    c.urban_pop,
    f.population total_pop,
    (c.urban_pop / CAST(f.population AS FLOAT)) urban_pct
FROM facts as f
INNER JOIN (
            SELECT
                facts_id,
                SUM(population) urban_pop
            FROM cities
           ) c ON c.facts_id = f.id
WHERE urban_pct > .5
ORDER BY 4 ASC;

What I expected to happen:
I expected the code to output 18 countries, same as the code with “GROUB BY 1” clause in the INNER subquery

What actually happened:
The query only Zimbabwe as country. Which does not appear in the correct code.

The correct code had GROUP BY 1, I am not sure why that is correct? (see below)

SELECT
    f.name country,
    c.urban_pop,
    f.population total_pop,
    (c.urban_pop / CAST(f.population AS FLOAT)) urban_pct
FROM facts as f
INNER JOIN (
            SELECT
                facts_id,
                SUM(population) urban_pop
            FROM cities
            GROUP BY 1
           ) c ON c.facts_id = f.id
WHERE urban_pct > .5
ORDER BY 4 ASC;
1 Like

Hi @Chirc,

Welcome to the community. To understand why this is happening, let’s try to observe the result of running this query:

SELECT
    facts_id,
    SUM(population) urban_pop
FROM cities
facts_id urban_pop
195 991685100

Why are we getting a single row? It’s because we are using an aggregate function SUM() to find the sum of the population column. Aggregate functions are applied to all values in a column. Which is fantastic. Cause that is why finding the total rows in a table is so easy as using this simple query:

SELECT COUNT(*) FROM cities

However, often, we have to apply the aggregate function to a group of rows instead of all rows. That is where GROUP BY comes into action. GROUP BY allows us to split the rows into groups based on the values in a specific column (or multiple columns) and then apply our aggregate function to all those groups.

For example, in this case, we should group by the facts_id column so that we can find the population of each country.

SELECT
    facts_id,
    SUM(population) urban_pop
FROM cities
GROUP BY 1 -- 1 specifies that
-- we want to group it by the
-- first column, that is facts_id

If we use INNER JOIN without grouping it, the result will only have 1 row as the subquery is only producing a single row.

Hope this helps :slightly_smiling_face:

Best,
Sahil

1 Like