BLACK FRIDAY EXTRA SAVINGS EVENT - EXTENDED

# SQL m/179, GROUP BY explaination:

My Code:

``````SELECT
f.name country,
c.urban_pop,
f.population total_pop,
(c.urban_pop / CAST(f.population AS FLOAT)) urban_pct
FROM facts as f
INNER JOIN (
SELECT
facts_id,
SUM(population) urban_pop
FROM cities
) c ON c.facts_id = f.id
WHERE urban_pct > .5
ORDER BY 4 ASC;
``````

What I expected to happen:
I expected the code to output 18 countries, same as the code with “GROUB BY 1” clause in the INNER subquery

What actually happened:
The query only Zimbabwe as country. Which does not appear in the correct code.

The correct code had GROUP BY 1, I am not sure why that is correct? (see below)

``````SELECT
f.name country,
c.urban_pop,
f.population total_pop,
(c.urban_pop / CAST(f.population AS FLOAT)) urban_pct
FROM facts as f
INNER JOIN (
SELECT
facts_id,
SUM(population) urban_pop
FROM cities
GROUP BY 1
) c ON c.facts_id = f.id
WHERE urban_pct > .5
ORDER BY 4 ASC;
``````
1 Like

Hi @Chirc,

Welcome to the community. To understand why this is happening, let’s try to observe the result of running this query:

``````SELECT
facts_id,
SUM(population) urban_pop
FROM cities
``````
facts_id urban_pop
195 991685100

Why are we getting a single row? It’s because we are using an aggregate function `SUM()` to find the sum of the population column. Aggregate functions are applied to all values in a column. Which is fantastic. Cause that is why finding the total rows in a table is so easy as using this simple query:

`SELECT COUNT(*) FROM cities`

However, often, we have to apply the aggregate function to a group of rows instead of all rows. That is where `GROUP BY` comes into action. `GROUP BY` allows us to split the rows into groups based on the values in a specific column (or multiple columns) and then apply our aggregate function to all those groups.

For example, in this case, we should group by the `facts_id` column so that we can find the population of each country.

``````SELECT
facts_id,
SUM(population) urban_pop
FROM cities
GROUP BY 1 -- 1 specifies that
-- we want to group it by the
-- first column, that is facts_id
``````

If we use `INNER JOIN` without grouping it, the result will only have 1 row as the subquery is only producing a single row.

Hope this helps

Best,
Sahil

1 Like