Why ORDER BY does not need subqueries but WHERE clause does

Hi,

SELECT Major_category, Major
  FROM recent_grads
 WHERE Major_category IN (SELECT Major_category 
                            FROM recent_grads 
                           GROUP BY Major_category
                           ORDER BY SUM(TOTAL) DESC
                           LIMIT 3);

I am struggling with the reason why ORDER BY clause does not require a subquery to be functional.
Does it work with a virtual column just like HAVING clause?
If so, is that mean that I can order the outcome by a range which is not even exist in the current table?

Hi @Ryan_L

Welcome to the community! Firstly, WHERE does not need to have a subquery to be functional (i.e. SELECT * FROM <table_name> WHERE <condition> as described here). HAVING is used to perform aggregate functions that WHERE could not perform as described in the documentation. In addition, INNER JOIN and subqueries are generally interchangeable. Note that HAVING can also be used together with WHERE as a “second layer” filter as shown in the example below.

/* The following SQL statement lists if the employees "Davolio" or "Fuller" have 
registered more than 25 orders: 
*/
SELECT Employees.LastName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
WHERE LastName = 'Davolio' OR LastName = 'Fuller'
GROUP BY LastName
HAVING COUNT(Orders.OrderID) > 25;

Answering your last question:

This should not work.

Your question is answered in this stackoverflow post.

The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself. You don’t need order by in your sub query.

In the future, please also remember to provide a question link as per these guidelines to allow us as community learning assistants to better assist you.

Hope this helps.

2 Likes

The reason why the ORDER BY can take an aggregation directly in my code is because I grouped the table first.
If I did not use the GROUP BY first, the aggregation will come up with a single value, which have no meaning to be ordered by.
Is that correct?

However, I still cannot completely understand why it is invalid to use aggregation in the WHERE clause.
Like the code below, we don’t need to GROUP BY first to calculate AVG(ShareWomen).
So why can’t I just apply the AVG(ShareWomen) with WHERE clause?
Is that merely because it is how the SQL syntax is designed?

SELECT *
  FROM recent_grads
 WHERE ShareWomen > AVG(ShareWomen);

In the future, please also remember to provide a question link as per these guidelines to allow us as community learning assistants to better assist you.

By question link, do you mean that I can post a link to the screen I asked about?
Like this:
https://app.dataquest.io/m/255/subqueries/5/returning-multiple-results-in-subqueries

1 Like

@Ryan_L:

This is correct is this case. However, you can also use ORDER BY without GROUP BY in some cases.

Yes. Aggregation should be used in the columns in the SELECT clause or the GROUP BY clause as described in this article. Here is an example:

SELECT COUNT(id) AS "Number of People"
FROM user_table

Yes. Exactly what I was referring to. Please do this for all future queries you have.

Thanks!

2 Likes

@masterryan.prof
Thank you for your kindness and detailed explanation.
I will always post a link in my future queries.

@Ryan_L: No worries, anytime! Have fun with SQL!

1 Like