Independent Kaggle Project - Grocery Store Dataset Analysis: Table of correlations between pairs of items and a function that generates a correlation table for any chosen item

Hi all!

I just finished lessons 1.1 to 1.7 in the DQ course and thought it would be interesting to run an independent project on a public dataset (Kaggle was the lucky provider). I adapted techniques from those lessons to answer new types of questions that I expect to encounter further along the Data Science journey.

The Jupyter Notebook showcasing my project is attached: Groceries dataset - Freeflow practice.ipynb

I’m of course open to any constructive feedback, so don’t be shy!

The dataset contains grocery store purchase orders, one order per row. The source is the Groceries dataset on Kaggle.

The 1st question I decided to tackle was which item types are most strongly correlated with each other, i.e. finding pairs where item type x is more correlated with item type y than any other pairing in the whole order dataset.
This could be useful for a store manager who wants to know which items are most likely to go hand in hand.
I generated a correlation table for this and showed the method I used to generate it.
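For anyone who wants the idea without opening the notebook, here is a minimal sketch of a pairwise correlation table. It is not the notebook's code: it assumes each order has already been parsed into a list of item types, uses made-up toy orders, and measures correlation with the phi coefficient (the Pearson correlation of two 0/1 "item is in the order" columns). The helper names `phi` and `pair_table` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Made-up toy orders (NOT the Kaggle data): each inner list is one order.
baskets = [
    ["whole milk", "rolls/buns", "yogurt"],
    ["tropical fruit", "yogurt", "whole milk"],
    ["tropical fruit", "yogurt"],
    ["meat", "rolls/buns"],
    ["meat", "rolls/buns", "soda"],
    ["soda", "rolls/buns"],
    ["whole milk", "yogurt", "tropical fruit"],
    ["soda"],
]


def phi(baskets, a, b):
    """Phi coefficient: Pearson correlation of the 0/1 'a in order' and 'b in order' columns."""
    n11 = n10 = n01 = n00 = 0  # cells of the 2x2 contingency table
    for basket in baskets:
        has_a, has_b = a in basket, b in basket
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # An item found in every order (or in none) has no variance; report 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def pair_table(baskets):
    """Every unordered pair of item types with its phi coefficient, strongest first."""
    items = sorted({item for basket in baskets for item in basket})
    rows = [(a, b, phi(baskets, a, b)) for a, b in combinations(items, 2)]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for a, b, r in pair_table(baskets)[:3]:
    print(f"{a:<15} {b:<15} {r:+.3f}")
```

On the real data, an equivalent route would be a one-hot order-by-item table of 0/1 values passed to pandas `DataFrame.corr()`, whose default Pearson method yields the same phi values for binary columns.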

The 2nd question was which item types are most correlated with a chosen item type: if one wanted to know which items are most likely to appear alongside, let's say, meat, can a table with those item types at the top be generated?
One use case I had in mind was promotions, as a store manager running a discount on one item type might want to know the best item type to promote alongside it.
I defined a function that generates this correlation table and provided an example of it in use, with tropical fruit as the chosen item type, so its top correlations are drawn out and easy to see.
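A minimal sketch of such a function, under the same assumptions as before (orders parsed into lists of item types, phi coefficient as the correlation measure, made-up toy data). The name `correlation_table` and its `top_n` parameter are hypothetical, not the notebook's actual function.

```python
from math import sqrt

# Made-up toy orders (NOT the Kaggle data): each inner list is one order.
baskets = [
    ["whole milk", "rolls/buns", "yogurt"],
    ["tropical fruit", "yogurt", "whole milk"],
    ["tropical fruit", "yogurt"],
    ["meat", "rolls/buns"],
    ["meat", "rolls/buns", "soda"],
    ["soda", "rolls/buns"],
    ["whole milk", "yogurt", "tropical fruit"],
    ["soda"],
]


def phi(baskets, a, b):
    """Phi coefficient: Pearson correlation of the 0/1 'a in order' and 'b in order' columns."""
    n11 = n10 = n01 = n00 = 0  # cells of the 2x2 contingency table
    for basket in baskets:
        has_a, has_b = a in basket, b in basket
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # An item found in every order (or in none) has no variance; report 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def correlation_table(baskets, item, top_n=5):
    """Rank every other item type by its phi coefficient with `item`, strongest first."""
    all_items = {x for basket in baskets for x in basket}
    if item not in all_items:
        raise ValueError(f"{item!r} does not appear in any order")
    scores = [(other, phi(baskets, item, other)) for other in sorted(all_items - {item})]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


for other, r in correlation_table(baskets, "tropical fruit", top_n=3):
    print(f"{other:<15} {r:+.3f}")
```

Putting the per-order counting inside `phi` keeps the ranking function to a single loop over candidate items, which avoids the nested for-loops that were hard to follow.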

That about sums it up. I hope you all enjoy it and have a great day or evening!

Kind Regards,

Kevin R


Hi Kevin!

Thank you for sharing your project! I really applaud your initiative to put what you learned into practice right away :clap: :+1:
I find your project interesting, and the questions you are investigating are practical, relevant and useful. For future projects, I would suggest adding comments to your code, as it is sometimes not obvious what you are trying to do. I found it a little hard to follow your steps, especially in the part with the nested for-loops. I know you described your process in the text, but it is still quite helpful to add some comments in the code itself :slightly_smiling_face:

I wish you much success in your data journey!


Hey Erika!
Thanks for the feedback! I do agree that it's not always straightforward; in fact, coming up with some parts of this code gave me headaches for more hours than I'd like to admit haha. I'll definitely think about how to use code comments in my later projects to make the code easier to understand :slight_smile:

Also, same to you, I wish you the best in your data journey!
