Question about SQL!

Hi, I just started doing the basic SQL course…I’m not sure what is the logic behind the code. So if I use the GROUP BY command, this is the correct answer:

But if I omit GROUP BY, then I get this:

What am I actually doing if I type the code in this manner? Like why does it only give me ‘Education’ rather than a list of the other values in Major_category? I don’t understand the logic here, would be glad if someone could explain why this is the result of omitting GROUP BY. Thank you!

Link to mission: https://app.dataquest.io/m/254/group-summary-statistics/4/calculating-group-level-summary-statistics

Please share the link to the Mission as well.

edited the post with a link!

1 Like

Take a simple table

A | 12
A | 54
B | 75
C | 50
A | 12
B | 98
C | 11

When you do a GROUP BY based on a column, it creates a group based on unique values in that column. So from above you would get the following three groups -

For A

A | 12
A | 54
A | 12

For B

B | 75
B | 98

For C

C | 50
C | 11

When you use any aggregate when you are grouping values, the aggregate is applied to each group separately.

So, if you use SUM, you will get the sum of values for each -

A | 78 (12 + 54 + 12)
B | 173 (75 + 98)
C | 61 (50 + 11)

That’s what happens when you group and aggregate on that group.

But when you don’t group and use SUM, you will just sum over the entire column.

C | 312

Now, since you sum over the entire second column, that sum does not correspond to any of the rows in the first column. It’s just the sum of all values, it’s not associated to either A, or B, or C.

But since you are asking it to also select the other column, it prints out the last value from the first column. In my example above, it’s C. In your case was Education.

It has no significance because you are calculating the sum which can’t correspond to a single Major_category. Since you included Major_category in your SELECT statement, it displayed the value in the last row.

1 Like

Thank you for your reply!!