I tried to understand the error information but it barely pointed where the problem could be.
I believe in the last sql project under jupyter notebook, it is a lot easier to understand error information.
Any suggestion? Or it is because our console didn’t display the detailed information?
Another question is about how to organize what we have learned so far and to learn more effectively.
I felt I just learned some new syntax or new tools and then about one week later, I forget everything. I know it is good to google and read each documentation. Or leave some comments while coding to make sure I remember some mistakes I made. I have already left so many detailed comments in the console page, it took even some time to go back and find the exact page where I was stuck with the same problem before.
Do you think it is a good idea to keep some key notes in the paper notebook? As a new programmer, I want to have a good habit, stay focused while organize what I have learned so far. I heard that DQ will provide some cheat sheets at the very end of the courses. I am new to programming.
Programming is like learning new language. If I don’t practice it everyday, I will easily forget a lot.
Line one of error message: near “mt”: syntax error.
The first line should give you ample hints about the error. You have a syntax error near mt. mt is first used on line 5. Checking the syntax, you observe there is a missing comma on line 4.
Understanding Order of Execution is useful in having a thinking framework when writing queries, i believe it would help during debugging too. If you knew FROM comes before SELECT, you will then know that database management software like Dbeaver can help you autocomplete the columns in the SELECT clause if you specified which table you are querying FROM first, but this autocomplete will not appear if you typed SELECT first without FROM. https://sqlbolt.com/lesson/select_queries_order_of_execution
For practice, try to think frequently about business/analytical questions and how to implement them in sql, it is important to practice filtering values at the correct stage of the sql pipeline, lest you filter too early and undercount or too late and overcount rows. You can even take all the code in pandas and do it in SQL. (That’s how you can learn df.groupby excludes NaN groups while SQL GROUP BY includes NaN groups).
Eventually learning to analyze purely in SQL has to happen as pandas with RAM is not big enough (for storing merged tables) and speed of merge in pandas is much slower (eg. 15 minutes vs 30 secs)
Thank you for your advice. I really enjoy reading your code in the FAQ section. I found lots of
programming tips from a skilled programmer simply by looking at his code.
From those lines of code of yours, I learned 3 things:
SELECT f.name country, SUM(c.population) urban_pop, f.population total_pop, SUM(1.0*c.population)/f.population urban_pct
FROM facts f INNER JOIN cities c
ON f.id == c.facts_id
GROUP BY country
HAVING urban_pct > 0.5
ORDER BY 4
First, 1.0*c.population is an easier way to convert to float. I do not like to write CAST(), especially together with count or sum function, I will end up missing some silly () somewhere.
Second, if a/b, I just need to convert either a or b, just one variable to float, not both.
Third, group by field name, not field number is an indeed better practice.
The only thing I am not sure of is:
Do I need to write f.id == c.facts_idorf.id = c.facts_id
I know == is not the same as = in python. But I am not sure if they are equal in sql.
I guess I will have to write on a scratch paper of a flowchart or pseudo-code for those intermediate or complicated sql task. It is more like a brain teaser for me know.