eBay Car Data Exploration

It’s my first time using NumPy and Pandas for data analysis, and I really enjoyed the experience that came with this project. It was a bit challenging at first, especially when I reached the aggregation of prices by brand and mileage by brand, but I managed to figure out my own way of understanding it.

Below is my work; please comment and advise.
Thank you.
car_exploration_analysis.ipynb (135.1 KB)
My GitHub link: Car Sales Exploration (GitHub)

Hi @o.abucheri,

Nice job completing the guided project, especially considering the slight challenge you faced when aggregating the data. It’s also nice that you enjoyed the experience.

My thoughts:

  1. You’ve done well briefly describing the data set you used. Also consider adding a link to the data set. I think it’s not on Kaggle anymore, but it was moved here.
  2. Some typos need cleaning up. It’s a minor thing, but every small detail counts when sharing your work with others: people can be a bit nit-picky at times, and typos can adversely (and implicitly) affect people’s perceptions even if the project is good overall.
  3. You can also combine code cells [1], [2], and [3] into one. It makes the narrative a bit less fractured, and the steps are simple enough to explain in a single paragraph.
  4. Similar to typos, consider cleaning up the unused and commented-out code.
  5. When you realised that you accidentally added a whitespace to registration_year, you can just modify the code in [8] directly by removing the whitespace and then rerun the notebook. It’s not necessary to apply the fix later; readers don’t know you made a mistake and can only see what you present to them. (Keep it a secret :smiley: ) See the first sketch after this list.
  6. "Each column has a count of 50000 records and colums such as seller and offer_type have almost similar records." → by count, do you mean they all have 50000 rows or that they have 50000 non-null values? The describe table only shows the number of non-null values, so not all columns have a count of 50000.
  7. "seller and offer_type have almost similar records." → I think the word “records” here can be a bit ambiguous; I assumed you meant “rows”.
  8. "The num_photos column looks very funny and needs some further looking into." → maybe expand a bit on what you mean by “very funny”. One reason why you think the column looks funny should be good enough, e.g. all NaNs and 0s.
  9. Some of the text might be better suited as code comments, e.g. "When removing outliers, we can do df[(df["col"] >= x) & (df["col"] <= y)], but it’s more readable to use df[df["col"].between(x, y)]". See the second sketch after this list.
  10. It’s quite odd that “aggregation” is written in code style. I’m not sure the code formatting is necessary in this case.
  11. Add a conclusion to briefly summarize all your findings.
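
On point 5, here’s a minimal sketch of what the upstream fix could look like. I’m assuming the DataFrame is called autos and that cell [8] renames the columns with a mapping dict — both are guesses on my part, so adapt it to whatever your cell actually does:

```python
import pandas as pd

# Toy frame with just two of the original column names, standing in for the full data set.
autos = pd.DataFrame(columns=["yearOfRegistration", "nrOfPictures"])

# Fix the stray whitespace at the source ("registration_year", not "registration_year ")
# and rerun the notebook, instead of patching the column name in a later cell.
autos = autos.rename(columns={
    "yearOfRegistration": "registration_year",
    "nrOfPictures": "num_photos",
})
```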

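And on point 9, here is the outlier comment as a runnable comparison — the column name and cut-offs below are made up, but both expressions keep exactly the same rows:

```python
import pandas as pd

# Toy price column with made-up values.
df = pd.DataFrame({"price": [0, 1500, 9999, 350000, 7200]})

low, high = 500, 100000

# Boolean-mask version for keeping rows between two bounds (inclusive).
kept_mask = df[(df["price"] >= low) & (df["price"] <= high)]

# Equivalent, more readable version with Series.between (inclusive on both ends by default).
kept_between = df[df["price"].between(low, high)]

assert kept_mask.equals(kept_between)
```
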
One clear pattern I see from reading your notebook is that you’re very thoughtful and analytical when explaining each finding, which makes the notebook quite an enjoyable read.

Thank you for sharing your project and keep up the good work. Cheers.

Thank you. I’ve made the changes.

No worries @o.abucheri.