Hi, community! This is the first project that I have shared here. It’s also my first time using NumPy and pandas for data analysis. I’m new to the DS path, but I love it. It took me a while, since I hadn’t yet learned how to create graphs or how to drop outliers with a proper statistical method, but I Googled it, so it was fun!
The project is eBay Car Sales: Learn data science with Python and R projects
Here are a couple of resources that helped me:
Basics.ipynb (327.2 KB)
I’m really open to feedback, so please let me know if you have any.
Congratulations on your first project. It looks good.
Thank you so much @rhemamichael9 !
Hi @as4! Thanks for sharing your project with the Community!
I’m really pleased that you had fun with it. It’s one of the best parts of data analysis (and data cleaning). I’m doing my PhD at the moment, and I have to Google so many things every day that I am thinking of starting a Jupyter notebook where I’ll keep all the code I look up frequently. And thanks for sharing these resources with us.
I am also happy to see that you tried out matplotlib although it was not in the guidelines! That shows great initiative.
A few suggestions and observations from my side:
- Be more specific about what you want to achieve with this project. You may not know it at the beginning (only after the exploratory analysis), but make sure you state it once you’ve finished the project so that we have a clear view of the goal you are trying to reach. Clear goals will also let you remove irrelevant columns right at the beginning instead of spending time cleaning them
- That’s nice that you’ve done some background research on the average odometer value. I think it’s a good proxy to evaluate the age of a typical used car
- Import all of your libraries in the first code cell so that we have an idea of what packages to install before reproducing your analysis
- You can greatly improve your figures if you add axis labels and a title, remove the top and right spines, and play a little with the colors to make them look more natural. I have an article on how to improve plots with a few adjustments. If you are interested, I can also link you to other resources
- I am not sure what you are trying to achieve with a scatter plot. It’s usually used to demonstrate a relationship between two variables… Its subtypes like strip plots can be used to compare distributions (for example, distribution of prices by car brand)
- I believe that dropping every car priced above $16,200 is too conservative a cutoff. Take a look at how much the most expensive brands cost. They may easily exceed $20,000 even when used (e.g., Mercedes). The same applies to retro cars (which may cost even more than they did originally)
- Focus on the narrative in Markdown and leave the technical part to code comments
- Expand your conclusions. You’ve discovered more than you wrote there
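To make a few of these points concrete, here is a minimal sketch rather than a definitive recipe. It puts all imports in one place, checks how a hard $16,200 cutoff would hit each brand, and draws a bar chart with axis labels, a title, and the top and right spines removed. The `autos` DataFrame, its `brand` and `price` columns, and the toy values are assumptions made up for illustration; your notebook's names and data will differ.

```python
# All imports together, as they would appear in the first notebook cell.
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy listings; the real column names and values may differ.
autos = pd.DataFrame({
    "brand": ["mercedes_benz", "mercedes_benz", "bmw", "bmw", "opel", "opel"],
    "price": [24000, 9500, 18500, 7000, 3200, 2100],
})

CUTOFF = 16_200  # the price cutoff discussed in the list above

# Fraction of each brand's listings that a hard cutoff would drop.
share_dropped = (autos["price"] > CUTOFF).groupby(autos["brand"]).mean()

# Median price per brand, sorted so the chart reads from cheapest to priciest.
median_price = autos.groupby("brand")["price"].median().sort_values()

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(median_price.index, median_price.values, color="steelblue")
ax.set_xlabel("Median price (USD)")
ax.set_ylabel("Brand")
ax.set_title("Median listing price by brand")

# Remove the top and right spines for a cleaner look.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
```

If `share_dropped` is high for some brands, a single global cutoff is probably removing legitimate listings rather than outliers, and a per-brand threshold may be worth trying.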
I look forward to seeing the future steps you spoiled at the end
Thank you so much, @artur.sannikov96 ! Your feedback is so valuable, and it means a lot! I definitely will improve those details that you pointed out. I already screened your article and will use it to improve the project and upload it as an update in the post.
About the Jupyter notebook for your notes: it’s an amazing idea. Another wonderful tool for taking granular notes is Obsidian. It lets you create notes the way your brain works: instead of one long document, you link small notes together into something like a second brain, where you can easily find the topics you wrote about while learning something. Here is a screenshot of mine. I have been using it for a couple of months now, and it’s amazing.
Thank you so much again for your feedback.
Great project @as4, it was very intentional and explanatory
Hi @as4, let me know when you’ve updated the project, and I’ll have a look at it
And thanks for the suggestion for Obsidian. It sounds interesting, I’ll see if I can use it for my notes.