SQL:DISTINCT - Behind the screen

Screen Link:
https://app.dataquest.io/m/253/summary-statistics/8/counting-unique-values

My Code:

SELECT DISTINCT Major, Major_category
  FROM recent_grads
 LIMIT 5;

What I expected to happen:
only the unique values from ‘Major_caegory’ column to be displayed

What actually happened:

Instead of getting, only the unique values from 'Major_caegory' column, we are getting repeated values.
Major	Major_category
PETROLEUM ENGINEERING	Engineering
MINING AND MINERAL ENGINEERING	Engineering
METALLURGICAL ENGINEERING	Engineering
NAVAL ARCHITECTURE AND MARINE ENGINEERING	Engineering
CHEMICAL ENGINEERING	Engineering

want to know, what happens behind the scenes when we are using DISTINCT on multiple columns.


so shouldn’t the major_Category column contain only unique values? we see Engineering repeated even after using DISTINCT.

However if we just use the code SELECT DISTINCT Major_category the results are fine. only when we are using DISTINCT with multiple columns like in the above example we get results with repeated values.
so wonder how DISTINCT works when used with multiple columns.

thank you.

No. When used with multiple columns, DISTINCT looks at both the columns instead of looking at them one-by-one.

So, if you had something like -

Col1 Col2
A B
A B
A C
D B

Using DISTINCT on both columns would return

Col1 Col2
A B
A C
D B

Because there were two (A, B)s, it only selects one of those because it looks at both the columns at the same time.

2 Likes

Make sense now. thank you for the explanation. appreciate it.

1 Like