I’ve no idea why sqlfiddle is having all sorts of
TypeError: Cannot read property 'substring' of undefined when query runs fine on https://sqliteonline.com/ so i’ll just paste the scripts for reproducibility. Database used is sqlite, the default on that website.
Creating sample data
DROP TABLE IF EXISTS twitter; CREATE TABLE IF NOT EXISTS twitter (session_id INT,session_type VARCHAR, user_id INT,duration INT); INSERT INTO twitter VALUES (1,'A',10,101), (2,'A',10,102), (3,'A',11,103), (4,'A',11,104), (5,'A',11,105), (6,'B',10,201), (7,'B',10,202), (8,'B',11,203), (9,'B',11,204), (10,'B',11,205)
Questions on DISTINCT
SELECT distinct session_type, min(duration)over(partition by session_type) FROM twitter --SELECT session_type, min(duration) --FROM twitter --GROUP BY session_type
|session_type||min(duration)over(partition by session_type|
DISTINCTbe used as a
GROUP BYtool? Both the active and commented queries are returning the same results (ignoring col names). My first exposure of this idea from https://stackoverflow.com/a/40594647/8621823
How does the window function correctly align the results to the distinct session_types?
When googling “multiple columns with distinct”, results show examples of select distinct A,B., but i’m looking for select distinct A, some_other_operation in a single select. I’ve never seen other columns being added to SELECT distinct . What is the alignment mechanism here?
Questions on GROUP BY
I want to rank the users based on total duration for each user for each session_type.
Note: I may have wrote the query wrongly even though it executes because i don’t know how window functions interact with GROUP BY
select session_type,user_id, sum(duration),rank()over(partition by session_type order by sum(duration)) AS rank from twitter GROUP BY session_type,user_id
- The window function is not the usual aggregate function (eg. COUNT, AVG) that i see (or is it actually an aggregate?), and not in one of the GROUP BY columns. Would this have errored (
must appear in the GROUP BY clause or be used in an aggregate function) in a stricter database engine than sqlite or is a perfectly good query?
EDIT: My own answer after tests on postgres is it doesn’t!
How did the window function interact with the GROUP BY to return correct ranks? (I don’t want to wrongly learn something that works only because of the lax rules of sqlite). The GROUP BY was grouping on 2 columns while the window was only looking at 1 column
sum(duration), something feels wrong.
Was the window function independent of the GROUP BY’s operations? Even changing
durationin the partition by returns the same ranks without erroring.
sum errors on postgres with
42803 column "twitter.duration" must appear in the GROUP BY clause or be used in an aggregate function. What is going on? Does this imply window functions are not aggregate functions? Or just not the specific aggregate functions that GROUP BY wants?
- What is the correct way of doing the task? (if i still prefer to use window functions rather than subqueries?)
If anyone has solid ways of explaining, like debugging through query plans, even better