In the database, there are different tables. Each table has its own set of data.
We ask ourselves some question about “class size survey demographics”. In order to answer these question, you need to figure out what tables are required.
Working backwards, we figured out the main data table required, that is combined
.
The final data needs additional data from
-
class_size
,
-
demographics
,
survey
hs_directory
Using LEFT JOIN
because we want everything from combined
data set. (Or RIGHT JOIN
is also possible but you need to change the syntax).
Description below does not have the correct syntax but are used to explain the logic behind the merges.
A = (combined LEFT JOIN class_size
)
Using LEFT JOIN
because we want everything from combined
table that may contains matched row from class_size
or non-matched rows as NULL
.
B = ( A LEFT JOIN demographics
)
Keep building from A
. Similar logic, because we want everything from previously build table A
that may contains matched row from demographics
or non-matched rows as NULL
.
C = (B LEFT JOIN survey
)
Keep building from B
. Similar logic, because we want everything from previously build table B
that may contains matched row from survey
or non-matched rows as NULL
.
D = (C LEFT JOIN hs_directory
)
Keep building from C
. Similar logic, because we want everything from previously build table C
that may contains matched row from hs_directory
or non-matched rows as NULL
.
Why order matters?
The first merge matters the most because we determined combined
table to build our data from. That is (combined
LEFT JOIN
other_table
). other_table
can be any table, but need to satisfy the JOIN ON
condition where both tables performing the join has similar column(s).
The first merge matters the most because we determined combined
table to build our data from.
The final results contains exact the number rows in combined
table.