Exploring eBay Car Sales Data | Any Feedback, Advice welcome

I’ve redone my whole eBay project to remind myself of some pandas spells and tricks. I’m sharing it in the hope of getting some feedback. This version uses plots and geopandas, but it also contains a few observations that can be made with basic pandas:

  1. What is sonstige_autos? What are we going to do with it?
  2. What’s happening with the data after 2015?
  3. The power, registration_year and price columns: let’s LOOK at them after we clean them. Maybe we can still do something with them?
  4. Fill in the power column with… what values? The average? As if a Fiat Punto had, on average, the same amount of power as a new Porsche?
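
One possible way to approach question 4, sketched under assumptions: instead of one global average, fill missing power values with the median of the same brand and model, so a Punto is only compared with other Puntos. The column names (`brand`, `model`, `power`) and the toy values below are invented for illustration and are not taken from the actual notebook.

```python
import numpy as np
import pandas as pd

# Toy data: column names and values are assumptions made for this sketch.
autos = pd.DataFrame({
    "brand": ["fiat", "fiat", "fiat", "porsche", "porsche", "porsche"],
    "model": ["punto", "punto", "punto", "911", "911", "911"],
    "power": [60.0, np.nan, 80.0, 300.0, 340.0, np.nan],
})

# Median power within each (brand, model) group, aligned to the original rows.
group_median = autos.groupby(["brand", "model"])["power"].transform("median")

# Fill only the missing entries; known values stay untouched.
autos["power_filled"] = autos["power"].fillna(group_median)

# For comparison: the single global mean that would be used for every gap.
global_mean = autos["power"].mean()

print(autos)
print("global mean:", global_mean)
```

With a global mean, both gaps would get the same value, which is too high for the Fiat and too low for the Porsche. The group-wise fill keeps each car close to its peers, though groups with very few known values would still need a fallback.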


Hello @adam.kubalica! Thanks for sharing your project with the Community:)

I think you have a nicely structured project with nice insights. You are also the first person whose project I’ve reviewed to notice that the data after 2015 is faulty, which is great. Good use of geopandas, too, and generally a nice approach to challenging yourself. You’ve done a great job!

A few suggestions from my side:

  • Title your project
  • I think it’s better to get rid of the index. It does not add much information, and it makes the project look like a doctoral thesis
  • Remove too obvious comments like “#list the existing column names:” in cell [853]
  • Rerun the whole project so the cell numbers start from 1
  • In the section “Now lets loop over every brand:” a code cell is missing
  • Title your plots, label the axes and remove the top and right spines. It will greatly improve the readability. You can do even more if you read some articles on how to improve your plots (but I recommend a book called “Storytelling with Data”)
  • You should limit the number of brands on the pie chart to 5-6 and put all other brands in the “Other brands” category. It will improve the readability
  • You should provide a link to the map you used in the project (the .shp file)
  • Write conclusions! Many people who’ll read your project only care about those 10-15 lines that sum up the whole project. What are the main insights from the data set?
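
The pie chart and plot styling suggestions above could look roughly like this. It is a minimal sketch, not the code from the reviewed notebook: the helper names `top_n_with_other` and `plot_counts`, the brand counts, and the axis labels are all made up for illustration.

```python
import pandas as pd


def top_n_with_other(counts, n=5, other_label="Other brands"):
    """Keep the n largest categories and sum the rest into one bucket."""
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top[other_label] = rest
    return top


def plot_counts(data, title="Listings per brand"):
    """Bar chart with a title, labeled axes, and no top/right spines."""
    # Imported lazily so the grouping helper has no plotting dependency.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(data.index, data.values)
    ax.set_title(title)
    ax.set_xlabel("Brand")
    ax.set_ylabel("Number of listings")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    return fig, ax


# Invented counts, standing in for something like autos["brand"].value_counts().
brand_counts = pd.Series({
    "volkswagen": 100, "bmw": 55, "opel": 50, "mercedes_benz": 45,
    "audi": 40, "ford": 30, "renault": 20, "fiat": 10,
})

pie_data = top_n_with_other(brand_counts, n=5)
print(pie_data)
```

The grouped series can be passed to `pie_data.plot.pie()` for the pie chart or to `plot_counts(pie_data)` for a bar chart; either way the reader sees six slices or bars instead of dozens.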

That’s it for me. Happy coding :smile:

Thank you @artur.sannikov96 for the feedback. I’ve spent the morning doing the touch-ups you pointed out, BUT:
Can you elaborate on getting rid of the index? The majority of serious notebooks I’ve seen had an index. I’ve also seen a few articles for beginners advising the use of an index in notebooks. Is this personal preference? Experience? Inside knowledge? I’m curious, because that’s the only remark I’m leaning towards disagreeing with.

Hello! I’ve seen very few notebooks with indexes, and those were very big, major research projects.
I think it’s my personal preference plus a bit of experience. To me, a project with an index is a bit intimidating and looks like something bigger and more complex than it really is.

Sometimes I don’t see an index even in a doctoral thesis, where it may be more appropriate, and I almost never see one in a scientific paper, even though those are usually much more complex (structurally) than a data science project.

Anyway, it’s not a major problem in your project:)

