I'm confused at this stage -- am I wrong? 179-2 Understanding Inner Joins

1: What does this paragraph… mean?

“While our query from the previous screen included both columns from the ON clause, we don’t need to use either column from our ON clause in our final list of columns. This is useful as it means we can show only the information we’re interested in, rather than having to include the two join columns every time.”

There aren’t two join columns in the previous step? What does ‘the final list of columns’ refer to? The answer is, aside from aliases, exactly the same – so I don’t see the point of it.

2: I wasn’t able to do this on my own tbh, I had to refer to the hint – I don’t think confusing aliases are really a challenge here, surely just specifying ’ The name column from facts’ as challenge enough? In none of the explanation was there a suggestion that you’re supposed to put all the data you want in the first select, even if it comes from the second dataset – surely that’s new/novel enough?

Edit:
3: After realising that the link to community discussion doesn’t automatically tag the answer I had to come back to edit… seems a bit weird that one can’t auto tag this no?

1 Like

For reference, here’s my answer vs. DQ’s suggested answer. I’m likely confused by using f and c more than anything, but I wish the answer iterated on the previous answer and not introing new concepts + unclear tautology.

2 Likes

hey @internetmessages1

I won’t be able to answer to your 3rd question, I didn’t quite get it :frowning:

For question 1: It’s not the number of joins! it’s two columns which have been used To JOIN the two tables, so the number of columns is 2.

What this statement tells us, is that we can refine the result to show what is relevant to be seen, and hideaway or not include the operational aspects of it. Take it in terms of a blog post, we show the result of our analysis, however, we may not show actual code behind it.

So if we want to join two tables and bring a country and it’s capital city together, where we join using an ID column (with numeric data) which one do you suppose looks better?

SELECT 
     a.country As Country,
     b.city AS City, 
     a.city_id AS Table_A_CityID,
     b.city_id AS Table_B_CityID
FROM table_a  a
INNER JOIN table_b b ON a.city_id = b.city_id

OR

SELECT 
     a.country As Country,
     b.city AS City, 
FROM table_a  a
INNER JOIN table_b b ON a.city_id = b.city_id
2 Likes

For Question 2:

The instructions were very clear and precise as to what needs to be done. To the extent of telling us that we need to include all the columns from cities table.

Coming to the aliases, as of now it’s two tables and 2 or 5 columns.

But in real-world, we have like a single table with 300 columns, to be joined with 5 different tables (this is still a very low number) … each table having its own 100+ column structure. How would we handle that?

It still doesn’t justify to name a table like a or b if their names start with a or b.
However, it makes sense to shorten the name to something readable when database/ table / and project is being shared with an entire team. Consider this example:

table name = SharedServices_ConnectionPool_Configuration_Settings_Version_xxxx. :confused: :thinking:

1 Like

I think this is just introducing complexity for the sake of it because that second part which I don’t understand (a.city_id AS Table_A_CityID,) isn’t introduced in the first place. I still have no idea what this means.

I disagree, it’s not clear that one should be using a single select function to specify the columns – I would have thought that would have been a separate select query that comes after the inner join, because why would I naturally select a column from a new dataset. That, and joining table name and column name with a . is also totally new concept.

And so having an answer with single letter names for databases is also a case of the course designer spending way too much time working on this, and not enough QA from total beginners like me imo. Which is why I’m going to keep complaining in the forum when I don’t understand something :smiley:

1 Like

hey @internetmessages1

Of course, you may raise your doubts/complains with the DQ community anytime!

It seems we are on different tracks here. I am not sure which course track you are following currently.

I am also learning just like you. For me, SQL is part of step 4 in DataScience path, and the column aliases were taken up in mission 253 - Summary Statistics.

regarding table alias, I guess that might be a miss in the content. Checking up with the tech/ content team. till then you may cover up this mission if it’s accessible for you.

1 Like

Hi @internetmessages1,

What do you mean by ‘automatically tag the answer’? I would be happy to look into this issue.

Best,
Sahil

I agree with internetmessages1 from the standpoint that the instruction leading into the exercise, was extremely confusing. I don’t think what the exercise wants us to do is confusing but the instruction isn’t very helpful. Namely this passage without an example.

“While our query from the previous screen included both columns from the ON clause, we don’t need to use either column from our ON clause in our final list of columns. This is useful as it means we can show only the information we’re interested in, rather than having to include the two join columns every time.”

I think it is even more confusing to reference the use of a wildcard WITH an alias yet not provide an example of what that looks like. I don’t think what the exercise wants us to do should be THIS difficult to comprehend. I think the wording of the instruction was poor with a lack of examples to help students (overall I’m enjoying the DQ experience for the record).

3 Likes

I also think the learning part lacks some information . At first, my answer was like that:

SELECT * FROM cities c, f.name AS country_name FROM facts f
INNER JOIN c ON c.facts_id = f.id
LIMIT 5;

I can’t understand the errors for SQL for now, so I checked the DQ answer:

SELECT c.*, f.name country_name FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
LIMIT 5;

I assume the order of the execution for SQL is first INNER JOIN, then FROM, then SELECT because we select columns from the concatenated table (all rows from cities)… What I thought was: I should select columns from two tables, then join them. It wasn’t clear for me…

1 Like

hi @ertekin.muge

I started working on your query but then thought maybe it’s too basic and won’t help you much. So apologies for this delayed response.

We actually need a detailed discussion, but before we go there I just wanted to know if you are still confused about the joins or SQL syntax in general.

One specific problem with your code is having two FROM clauses in the same query.

SELECT * FROM cities c, f.name AS country_name FROM facts f…

The execution order is also not like, first the SELECT then JOIN then FROM, etc. Again this would need a detailed explanation.

So let me know if there is specific doubt you have, or you want a detailed explanation in general, or you if you have practiced enough to supplement your understanding.

Thanks.

I am also extremely confused. How we use an alias before we declared it? Am I missing something crucial?

3 Likes

hey @kourtinakias

Welcome to DataQuest Community!

Found these useful article which explains the execution in simple terms.

But then again SQL is Structured Query Language. By writing an SQL statement on a SQL database server, we tell the server what we need. We can’t actually tell how we need it. This stack overflow post should help you with this again as an overview.

2 Likes

Thanks for your help, greatly appreciated.

I am confused as to what the significance of the period (.) after c, “c.*”? Feel like I really could figure out everything out until got to this particular exercise.

Hi @hhongqin,

I was just wondering the same thing.
Consider this answer from Step 2:

SELECT f.name AS country, c.name AS capital_city FROM cities AS c
INNER JOIN facts AS f ON f.id = c.facts_id
WHERE c.capital = 1;

Think of it like this: when we were just using the facts table in the first mission it was the only table involved so any column we specify obviously refers to a column in facts.

Now we are doing the join and there are two tables involved in this one statement (facts and cities). We need to tell SQL which table we are referring to. Whenever we are referring to column names of either table, we need to use the syntax: table name.column name. So: facts.name, facts.population, cities.facts_id, cities.name, cities.capital, etc.

In this case we have “aliased” the table name for both facts and cities to f and c respectively. so referring to the columns would look like f.name, f.population, c.facts_id, c.name, c.capital.

2 Likes

Thank you, you explained this in a way simpler / more succint way then the lesson did. I agree with the earlier poster, this lesson was not delivered in a digestible way (which is not the norm for DQ).

3 Likes