SQL - JOINS Practice Questions. WHERE clause not used in solution

Screen Link:
Practice Problems For Joins And Other Clauses | Dataquest

My Code:

SELECT pu.parental_education, pu.lunch, 
COUNT(s.plays_sports) *100.0 / (SELECT COUNT(*) FROM survey)percentage_sports 
FROM performance_updated pu
INNER JOIN survey s
ON pu.student_id = s.student_id
WHERE s.plays_sports = 1
GROUP BY pu.parental_education, pu.lunch;

What I expected to happen:
Return a query that returns the parental_education and lunch columns from performance_updated table, as well as the percentage of students in each combination that participate in sports (plays_sports in survey ).

What actually happened:

The query result table has less rows than we expected. We expected a table with 12 rows but the provided table only has 11 rows.

Solution from DQ is as below, however it is calculating AVG , whereas the mission asks us to
calculate percentage for the students playing sport i.e. column plays_sports value should be equal to 1, for each combination of parental_education & lunch.

Please help if if i am misunderstanding it.

SELECT pu.parental_education,
pu.lunch,
AVG(s.plays_sports) * 100 AS percentage_sports
FROM performance_updated AS pu
JOIN survey AS s
ON pu.student_id = s.student_id
GROUP BY pu.parental_education, pu.lunch;

Since there are 100 rows in survey and the values are either 0 or 1 in the plays_sports column, taking the average and multiplying by 100 will effectively calculate the percentage. Simply taking the sum of this column would get you there as well. EDIT_1: This is true only if we aren’t grouping by anything…once we GROUP BY, each group will some who play sports, and some who don’t. Taking the average of each group and multiplying by 100 will give us the percentage for that group.

All that said, the way that you are attempting to calculate this percentage will effectively count how many rows you have in each of your groups, as defined by your GROUP BY clause. Try adding the column COUNT(s.plays_sports) to your SELECT clause and you will see that it’s the same as your percentage column.

The reason for this result is that your subquery (namely: SELECT COUNT(*) FROM survey) will always return 100 (ie the number of rows in survey) which cancels out with your *100 in the numerator, leaving only COUNT(s.plays_sports) for your percentage_sports column.

As to why you’re getting 11 rows instead of 12, I’m going to guess it’s because of your WHERE clause that’s filtering your results unnecessarily before they are grouped. It looks like filtering by plays_sports=1 causes you to lose the group: parental_education=bachelor’s degree / lunch=standard because (I’m guessing) that group doesn’t have anyone who plays sports and therefore you lose this group entirely.

I think if you address these two issues, you should be able to pass the screen without any more problems.

EDIT_2:
Here is how I solved it:

SELECT pu.parental_education,
       pu.lunch,
       SUM(s.plays_sports)/COUNT(s.plays_sports)*100 AS percentage_sports
  FROM performance_updated AS pu
  JOIN survey s ON s.student_id=pu.student_id
 GROUP BY pu.parental_education, pu.lunch;
1 Like

Thanks. Agree on the calculation part

For the WHERE clause, the missions asks to calculate " percentage of students in each combination that participate in sports", that is why I included WHERE.

Edit: I think now I understand, even though mission asks to show students participating in sports, those who are not participating should not be excluded from the output table, but shown as “0” - that way the DQ answer makes sense

1 Like