SELECT ((total - AVG(total)) * (total - AVG(total))) FROM invoice;
What I expected to happen:
I’m trying to calculate the variance of invoice data from the SQL Business Analysis Project. There was a line from the slide: “You might like to consider whether any extra columns from the employee table explain any variance you see, or whether the variance might instead be indicative of employee performance.” I realize now that “variance” wasn’t referring a mathematical definition of variance, but rather why some employees performed better than others. However, I’m now really curious as to how I would calculate variance in SQL, i.e. sum (xi - mean)^2 / (n-1), and how you would calculate it for grouped data. In the code block above, I calculated (xi - mean)^2 and would expect to see a table with the same number of rows as the original invoice table.
What actually happened:
((total - AVG(total)) * (total - AVG(total))) 27.03915310109401
However, as you can see, SQL only calculated (xi - mean) * (xi - mean) for the first entry.
Another note, there looks like there is a built-in variance function VAR in SQL, but it’s not available on the version of SQL on the Dataquest platform.
Any guidance on this would be helpful, thank you!