CYBER WEEK - EXTRA SAVINGS EVENT

# GROUP BY question

Hello,
I had two questions on the “Subqueries” Mission on Page 5
URL:Learn data science with Python and R projects

Question 1: This is the code that is on Page 5:

``````SELECT Major_category, SUM(TOTAL)
GROUP BY Major_category
ORDER BY SUM(TOTAL) DESC;
``````

I tried this just to see what would happen (I took the GROUP BY out):

``````SELECT Major_category, SUM(TOTAL)
ORDER BY SUM(TOTAL) DESC
``````

My code only returns “Education” for the Major_category. I was wondering if someone could tell me why? Why “Education” and not something else?

Question 2:

``````SELECT Major_category, Major
WHERE Major_category IN (SELECT Major_category
GROUP BY Major_category
ORDER BY SUM(TOTAL) DESC
LIMIT 3
);
``````

I tried:

``````SELECT Major_category, Major
WHERE Major_category IN (SELECT Major_category
ORDER BY SUM(TOTAL) DESC
LIMIT 3
);
``````

And this produced an error. Why do we need the GROUP BY clause here?

First of all, `SUM()` is an aggregate function and it doesn’t make sense not to use it as such.

I guess this is some random data you have got.

And the error clearly states:

`misuse of aggregate: SUM()`

Since `SUM()` is an aggregate function, we should perform an aggregation first.

@dash.debasmita,
The error that gets produced, does that mean you always have to use the “GROUP BY” clause first? I’m confused about the error because I thought `ORDER BY SUM(Total)` is an aggregation but it isn’t. It seems like you have to have something before it.

We can aggregate without using a `GROUP BY`, but in that case, we’ll be aggregating the full table. Refer here

You’re getting this error while running

``````SELECT Major_category, Major
WHERE Major_category IN (SELECT Major_category
ORDER BY SUM(TOTAL) DESC
LIMIT 3
);
``````

The problem here is with the subquery,

``````SELECT Major_category
ORDER BY SUM(TOTAL) DESC
LIMIT 3
``````

You don’t have `SUM(TOTAL)` in your `SELECT` clause, in that case, how can you expect the result to get sorted by the `SUM(TOTAL)` column?

For the correct subquery:

``````WHERE Major_category IN (SELECT Major_category
GROUP BY Major_category
ORDER BY SUM(TOTAL) DESC
LIMIT 3
);
``````

Why is it when you have the GROUP BY clause you don’t need to have the `SUM(Total)` in the SELECT clause here?

Also, how do you aggregate a full table? I thought you aggregate a column and it returns one value. I read the article and the implicit GROUP BY.

I think you got confused.
What do you get when you run

``````SELECT SUM(Total)
``````SELECT Major_category FROM recent_grads