Beginner - SQL COUNT Function, Different Answers

Screen Link: < Computing SQL Summary Statistics | Dataquest>
Question:
Write a query that returns the number of majors that include mostly men.

  • For answer-checking purposes, make sure you type the aggregate function using all uppercase letters, and make sure that you spell the column name exactly like this: Major.

My Code: SELECT COUNT(Majors)
FROM recent_grads
WHERE Men > 0.5;

MY ANSWER IS LISTED ABOVE

What I expected to happen: I would expect
WHERE Men > 0.5 ; and WHERE ShareWomen <0.5 to give the same answer.
Isn’t ShareWomen <0.5 the same as Men >0.5?
What am I missing?

What actually happened:
The actual correct answer, according to the course is;
SELECT COUNT(Majors)
FROM recent_grads
WHERE ShareWomen < 0.5;
Results:
COUNT(Major) = 76

My answer using the WHERE MEN > 0.5;
results in COUNT(Major) = 173

Replace this line with the output/error

Hi lmwood270

Welcome to the community!

Here ShareWomen is the proportion of women graduates. So, Men > 0.5 is not the same as ShareWomen < 0.5.
If you check further, output of

SELECT COUNT(Majors)
FROM recent_grads
WHERE Men > 0.5;

is the same as

SELECT COUNT(*)
FROM recent_grads;

That is, WHERE Men > 0.5 returns all the rows in the table.

Hope its clear now.
Thanks.

COUNT

COUNT counts the number of non-NULL items. Any NULL items are ignored.

COUNT(*) - Count rows in the table

COUNT(Major) - Count the number of (duplicate and unique) Majors in the table

In the DQ exercise example, COUNT(Major) may have the same result as COUNT(*), but what it represent differs.

COUNT(*) != COUNT(Major)

Suppose Major column has duplicate values.

COUNT(DISTINCT Major) != COUNT(Major)

Suppose Major column has null values.

COUNT(*) != COUNT(Major)

COUNT with other constraints

COUNT is an aggregate function. You may replace COUNT with other aggregate function to explain yourself what does other aggregate function does with other constraint.

Rows are removed when the constraint in the SQL query clause is not met.

COUNT(...) may differ in number when you have constraints in the following SQL query clauses (and also by SQL query clause order from top to bottom):

  • [LEFT | RIGHT | FULL | CROSS] JOIN
  • WHERE
  • GROUP BY
  • HAVING

COUNT(...) counts the remaining number of rows.

Example 1

SELECT
    COUNT(Major)
FROM
    recent_grads
WHERE
   ShareWomen < 0.3

In example 1, rows with ShareWomen >= 0.3 are removed. COUNT(Major) returns the number of non-NULL Majors where ShareWomen < 0.3.

Example 2

SELECT
    Major_category,
    COUNT(Major)
FROM
    recent_grads
GROUP BY
   Major_category

In example 2, COUNT(Major) counts the number of majors in each Major_category.

Example 3

SELECT
   Major_category,
   COUNT(Major)
FROM
   recent_grads
GROUP BY
   Major_category
HAVING
   Major_category = "Arts" 
   OR Major_category = "Engineering

In example 3, COUNT(Major) counts the number of majors in Arts or Engineering.

Summary

To count number of rows use COUNT(*).

To count number of Major use COUNT(Major).

To count number of items in the column use COUNT(column).

What COUNT(...) represents is determined by what rows are removed by these constraints in the SQL query.

Hello @Imwood270 , I think just their description will give you good understanding :

  • Men — the number of graduates who are men.
  • ShareWomen — women as a proportion of the total number of graduates (a number ranging from 0 to 1).

why this answer :
SELECT Major
FROM recent_grads
WHERE ShareWomen > 0.5;

when this is more simple and gives same answer :
SELECT Major
FROM recent_grads
WHERE men > women