BLACK FRIDAY EXTRA SAVINGS EVENT - EXTENDED
START FREE

Guided Project: Customers and Products Analysis Using SQL - Screen 4

To find out the products that are almost out of stock, the guidance says: “The low stock represents the quantity of each product sold divided by the quantity of product in stock. We can consider the ten lowest rates. These will be the top ten products that are (almost) out-of-stock.”
SQL Fundamentals Guided Project | Dataquest
But to my understanding, the ten highest rates should be considered, not the lowest. When the quantity in stock is small and the quantity ordered is large, the division of SUM(quantityordered) by quantityinstock should result in large number and indicates that the product’s selling is good.

The instruction left me so confused. Anyone find same issue? And if you took the highest quotients, did it affect the rest of the answer to Question 1?

I agree with you on this. The formula did not seem to make sense. I also tried Googling for a low stock formula that would make more sense but could not find one.

This is one of the sites I came across that highlights some of the formulas associated to inventory
management but I could not find anything close to the formula given here.

Hopefully DQ could shed some light on this.

I automatically read it like

low stock = stock / SUM(ordered).

However, when I limit to 10 items and also limit the best performing items to 10 there is no overlap between those.

Did you succeed in solving this question in the meantime?

Hey Leon, I had similar issue when trying the 10 item limit. My explanation for that is: the products lowest in stock might not be those bringing in highest revenue, especially when you set the limit to 10.

To solve it, I actually tried 2 different approaches which yielded 2 tables, one with 10 products and the other with only 5 (and these 5 fitted into the table with 10), and both told that Classic Cars are to be the focus.

Approach 1: consider products low in stock first, then those in which that have highest revenues

WITH low_stock_products AS (
SELECT productCode,
       ROUND(SUM(quantityOrdered*1.0)/(SELECT quantityInStock
                                         FROM products p
                                        WHERE p.productCode = o.productCode),2) AS low_stock
  FROM orderdetails o
 GROUP BY productCode
HAVING low_stock >= 0.5
/*
Why 0.5? Because when the quantity sold accounts for (more than) half of the total quantity in stock,
a store should pay attention to the product in question and already plan for restocking.
*/
),
best_performance_products AS (
SELECT productCode, SUM(quantityOrdered*priceEach) as product_performance
  FROM orderdetails
 WHERE productCode IN (SELECT productCode
						 FROM low_stock_products)
 GROUP BY productCode
 ORDER BY product_performance DESC
 LIMIT 10
)
SELECT productCode, productName, productLine
  FROM products
 WHERE productCode IN (SELECT productCode
						 FROM best_performance_products);

Approach 2: sort out products low in stock and products with best performance, then compare the two lists to sort out the common values

WITH low_stock_products AS (
SELECT productCode,
       ROUND(SUM(quantityOrdered*1.0)/(SELECT quantityInStock
					                     FROM products p
                                        WHERE p.productCode = o.productCode),2) AS low_stock
  FROM orderdetails o
 GROUP BY productCode
HAVING low_stock >= 0.5
),
best_performance_products AS (
SELECT productCode, SUM(quantityOrdered*priceEach) as product_performance
  FROM orderdetails
 GROUP BY productCode
 ORDER BY product_performance DESC
 LIMIT 20
),
sorted_table AS (
SELECT lsp.productCode
  FROM low_stock_products lsp
  JOIN best_performance_products bpp
    ON lsp.productCode = bpp.productCode
)
SELECT productCode, productName, productLine
  FROM products 
 WHERE productCode IN (SELECT productCode
                         FROM sorted_table);

It’s very different from the given instructions but beside these I had tried 2 other sets of queries, and the ones above gave me confidence so I went for that. Let me know what you think about my approaches please.

I think the formula instructed does make sense but we have to add different condition or limit to the result (instead of 10 as instructed), and I still think going for the lowest quotients is counter-logic.

Did you successfully find more reliable formula(s) for the low stock anyway?

@thuy-thuy I was talking about the formula being wrong with respect to the desired output. If we wanted the items with the lowest stock, as you have suggested, I would have agreed with the ten highest rates and not the ten lowest ones.

E.g
Item: Chips
Ordered: 15
In-stock: 3
Low Stock = 15/3= 5

Item: Coca-Cola can
Ordered:5
Instock: 20
Low Stock = 5/20= 0.25

Clearly we should be ordering more Chips than Coca-Cola.
So either this is the wrong formula given the condition is to select the lowest low-stock rates or the formula is right and the condition is wrong.

I think your second approach is best. However I think the below query requires a second look (I could be wrong. Please do correct me if I am wrong, still learning :upside_down_face:)

WITH low_stock_products AS (
SELECT productCode,
       ROUND(SUM(quantityOrdered*1.0)/(SELECT quantityInStock
					                     FROM products p
                                        WHERE p.productCode = o.productCode),2) AS low_stock
  FROM orderdetails o
 GROUP BY productCode
HAVING low_stock >= 0.5
),

If you have to give HAVING low_stock >= 0.5, I believe you have to change the formula to
Low Stock = In Stock / Ordered. With the current way its written I do not think you would ever get 0.5. Consider the example below:

Item: Snickers
Ordered:30
Instock: 15
Low Stock = 30/15= 2

Based on what you were after i.e.

the Ordered value is definitely more than half but based on your formula we would get a ratio of 2 instead of 0.5.

@jesmaxavier I maintain that my formula order and the limit 0.5 are valid. Try running the query with reversed limit (i.e.: HAVING low_stock < 0.5) you will yield a long list of products whose quantityOrdered accounts for less than half of the quantityInStock (more than 50 products as I vaguely remember).

However, I wouldn’t say that it has to be 0.5, it could be 0.75, or 1, or even 2 (in this case the table yielded will include only a couple of items), but >= is a must. In case you try the other way around, going for quantityInStock/SUM(quantityOrdered), it should work too, but then the statement should be reversed too, to HAVING low_stock < X.

The example you gave was exactly that of low stock product, and if, say, it were among the products of the store, it would have showed up in the result table of my query. The table my query yielded included one item with SUM(quantityOrdered)/quantityInStock quotient at 67 something, but it was an outlier, most of the other products’ were somewhere between 0.5 and 2.

Hi @thuy-thuy, @jesmaxavier,

first of all, I still think the formula for finding items that are low on stock should be stock / ordered where items with a low ratio are low on stock.
Intuitively this feels much better than items with high ratios given ordered / stock.
Nevertheless, I was able to get the same results as given on page 8 by running the following code (using the formula ordered / stock and looking for low ratio’s which I do not agree with. :slight_smile:

WITH
pr_perf AS (
SELECT
	productCode,
	CAST(SUM(quantityOrdered) AS FLOAT) AS ordered,
	ROUND(SUM(quantityOrdered*priceEach)) AS performance
FROM orderdetails
GROUP BY productCode
),

low AS (
SELECT 
	products.productCode,
	products.productName,
	products.productLine,
	products.quantityInStock AS stock,
	ROUND(SUM(pr_perf.ordered / products.quantityInStock),2) AS low_stock
FROM products
JOIN pr_perf ON products.productCode = pr_perf.productCode
GROUP BY products.productCode
ORDER BY low_stock
LIMIT 10
)

SELECT 
	low.productName, 
	low.productCode,
	low.productLine, 
	low.low_stock, 
	pr_perf.ordered, 
	low.stock, 
	pr_perf.performance
FROM low
JOIN pr_perf ON low.productCode = pr_perf.productCode
ORDER BY pr_perf.performance DESC

This code:

  1. Calculates the performance for all items.
  2. Searches for 10 items with lowest ratio of ordered / stock.
  3. Then it joins the 10 items with low_stock ratio with the performance.
  4. Sorts on performance.

regarding your code @thuy-thuy, it think it calculates what you want correctly.
However, you’re still using subqueries within your WITH statement. Should’nt this be avoided?

Hi Leon, I’m not against your thinking at all, to me both stock/ordered and ordered/stock make sense, the only thing that won’t make sense is pick out highest ratios for the former, or lowest for the latter (as per DQ instructions).

I’ve tried queries with stock/ordered formula and they resulted in the same table I got from my above queries, on 2 conditions for low_stock: ORDER BY low_stock DESC and LIMIT 20 (instead of 10).

There’s an issue in the instructions, also in the solution suggested at the end of the project (I found it in your above queries too): to focus on 10 products, the join query limits the number of outputs on the low_stock basis to 10. What doesn’t make sense is, if low_stock outputs are already limited to 10, you will always get performance of those 10 products whether or not the performance is high (meaning, it could be low or average, but it’s still included because its productCode appears in the limited low_stock table). Therefore, there’s no point in looking at the performance anymore. (I’m not sure I’m making my point clear or not, please let me know if you get me here, I’m still learning effective expression and explanation.)

Regarding the subqueries in my code, I don’t recall that subqueries should be avoided within WITH clause, but indeed subqueries should be avoided in general, except when it’s really necessary. I could’ve rewritten my queries to break that down, but since mine took less than 30ms to run, I didn’t bother to do so. :sweat_smile: