Guided Project: Using SQL to Answer Business Questions (Chinook Music Store)

Hello all,

I am sharing my solution to the Chinook music store guided project.

I used mostly created views as opposed to subqueries to answer business questions. This allowed me to show previews of created views as I was building up follow up queries. I suspect that subqueries are preferred in a workplace setting unless they need to be reused, then created views are preferred. Is this the case?

When it comes to comparing full album purchases vs individual track purchases, I decided to show each type of purchase by the sum of tracks sold as opposed to the count of invoices. In order words we can tell Chinook:

  • “33% of tracks sold were part of a full album purchase”
    as opposed to
  • “18% of invoices were full album purchases”

I found the following question from screen #7 slightly confusing:

  • Is the range of tracks in the store reflective of their sales popularity?

I compared the share of total tracks by genre to the share of dollar sales by genre.

GitHub link

Last mission screen

answering business questions with SQL.ipynb (82.9 KB)

Click here to view the jupyter notebook file in a new tab

2 Likes

Hi Ivan,

Thanks for sharing your project! It looks really awsome: perfectly structured, properly styled SQL queries, thorough analysis, curious insights, interesting approaches, including comparing full album vs individual track purchases and comparing tracks available and sales popularity (even though I agree that this question was a bit confusing). I liked also your navigation and conclusion sections, and it was a cool idea to use the JULIANDAY() function. Also, this observation looks curious:

I would ask internally why Punk is paired with Alternative in the genre title. There is already an “Alternative” genre. This can possibly cause new “Alternative” tracks to be mistakenly categorized as “Alternative & Punk”. This can lead to bad data and cause customer dissatisfaction.

Now some ideas from my side:

  • About creating views, I also have an impression that it’s more common to use subqueries instead. It seems like with defining functions in Python: unless you’re going to use a certain code several times, there is no need to create a function. However, I agree with you that making a preview is a good idea (like in your “Purchases by Country” chapter). Well, on the other hand, you can preview this code also without creating a view, and then use the same code with WITH command in the next code cell. Anyway, you can combine the code cells [14] and [15] into one view, since you’re not going to use this invoice_usa more than one time (exactly for creating the next view).
  • The code cells [3] - [13]: probably it’s better to limit the preview of the available tables to 2-3 of them, also because the contents of many of them are quite self-explanatory from the database schema.
  • In some code cells (17, 23, 25, 27-29, 31-33, 35) there are too big white spaces between the commands and the statements themselves.
  • After the code cell [28] I would add some observation about how we can use (or whether we should use) this information further.
  • The code cell [30]: probably, I missed something, but why are we using here WHERE track_id > 2012 ? What does this 2012 mean?
  • The markdown cell after [24] and another one after [32]. Please remove the bullet points from the lists if you 're going to use numbered lists instead.

I hope my suggestions were of help to you.
Great job your project!

1 Like

Elena,

Thanks for the thoughtful feedback. As a result I made some tweaks in my code on github.

You didn’t miss anything as 2012 is just a random filter value that allows me to show both unique values created by the CASE statement in the purchased_yes_no field. Sorry for the confusion.

1 Like