Ebay Car Sales Data - Removing Price Outliers

Question to the Community

I would like to know what people think about my strategy to remove data where:

  1. price is 500,000 EUR or more
  2. price is 1 EUR or less but mileage is only 5,000 km
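As a minimal sketch, the two rules above could be written as boolean masks in pandas. The column names `price` and `odometer_km` and the sample rows are assumptions made up for illustration, and "only 5,000 km" is read here as 5,000 km or less.

```python
import pandas as pd

# Made-up rows for illustration only; `price` and `odometer_km` are assumed column names.
autos = pd.DataFrame({
    "price": [0, 1, 1, 4500, 12000, 500000, 10000000],
    "odometer_km": [5000, 5000, 150000, 125000, 30000, 5000, 150000],
})

# Rule 1: price is 500,000 EUR or more.
too_expensive = autos["price"] >= 500_000

# Rule 2: price is 1 EUR or less AND mileage is only 5,000 km (read as <= 5,000).
free_but_nearly_new = (autos["price"] <= 1) & (autos["odometer_km"] <= 5_000)

to_remove = too_expensive | free_but_nearly_new

# Keep the removed rows around so they can be inspected later.
removed = autos[to_remove]
cleaned = autos[~to_remove]

print(f"removed {len(removed)} of {len(autos)} rows")
```

Keeping `removed` as its own DataFrame makes it cheap to spot-check what was discarded without re-deriving the masks.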

This is my first project share so bear with me, thank you!

I noticed the following statement in this person’s analysis (Link):

"Mercedes Benz vehicles are by far the most expensive out our top brands, on average costing three times more than the second most expensive brand, Audi."

In my analysis, Audi was only slightly more expensive than Mercedes Benz.

Because I speak German, a quick glance at the names of the very high-priced listings showed that around half of them were actually “Wanted” postings.
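A rough sketch of how that glance could be turned into a check: flag listings whose name contains a "wanted" keyword and see what share of the high-priced rows they make up. The keyword `suche` (German for "looking for"), the column names, and the sample rows are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

# Made-up listings for illustration only; `name` and `price` are assumed column names.
listings = pd.DataFrame({
    "name": ["Suche_Porsche_911", "Ferrari_F40", "suche VW Bus T1", "Audi_A4_Avant"],
    "price": [999999, 1300000, 10000000, 8500],
})

# Assumption: "Wanted" postings can be recognised by the word "suche" in the name.
wanted_keyword = "suche"
is_wanted = listings["name"].str.contains(wanted_keyword, case=False, na=False)

# Share of high-priced listings that look like "Wanted" postings.
high_priced = listings["price"] >= 500_000
share_wanted = is_wanted[high_priced].mean()

print(f"{share_wanted:.0%} of high-priced listings look like 'Wanted' postings")
```

A keyword match like this will miss postings worded differently, so it is only a first pass before looking at the rows by hand.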

My question to the community - how much time do I spend picking apart outliers?

To me it’s pretty obvious that no cars on eBay are likely to cost 10 million EUR and that most cars are below 500,000 EUR. So is it enough to cut out the wildest entries (i.e. those that are orders of magnitude off) and leave the rest?

I could spend half my time just looking at these outliers to confirm they should be discarded … but is it really that bad if I discard a few wild entries that are actually valid? I mean, even if they are valid, don’t they wreak havoc on my dataset anyway??
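One way to see how much a single wild entry matters is to compare the mean and the median price per brand, before and after applying the cutoff. This is only a sketch on made-up numbers; the column names `brand` and `price` are assumptions.

```python
import pandas as pd

# Made-up prices for illustration only; one Mercedes row is a 10 million EUR outlier.
autos = pd.DataFrame({
    "brand": ["audi", "audi", "audi",
              "mercedes_benz", "mercedes_benz", "mercedes_benz"],
    "price": [9000, 10000, 11000, 8000, 9000, 10_000_000],
})

# The mean is pulled far upward by the outlier; the median barely moves.
summary = autos.groupby("brand")["price"].agg(["mean", "median"])
print(summary)

# Same comparison after dropping everything at or above 500,000 EUR.
trimmed_mean = autos[autos["price"] < 500_000].groupby("brand")["price"].mean()
print(trimmed_mean)
```

If the brand ranking changes between the raw and trimmed means while the medians agree, that suggests the ranking was driven by a handful of extreme rows rather than by typical listings.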

https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/1/introduction

kwu_ebay.ipynb (73.1 KB)

Thank you very much for any kind feedback … I am working on learning formatting and presentation - I realize they are important … but one step at a time!!




Hi @kwu
Welcome to the community, and thank you for sharing your work on Analyzing Ebay Car Sales. I have gone through it and it is well done. The code is well presented and produces nice output, the explanations are detailed, and the comments are well written … keep it up, mate. I have a few humble suggestions:

  • It’s always recommended to state the aim of the project, i.e. the questions you are trying to answer or clarify, and to keep that aim/goal/objective as short as possible. The same goes for the title; you could instead call it Analyzing Ebay used cars.
  • I don’t think most of the information given in the introduction is necessary. Instead, you could include background information on the data, the aim/objectives, and links to the data set you used. I hope you will look into that.
  • Also consider re-running your notebook so that the code cells are numbered sequentially.

Otherwise, congratulations on completing the project on Ebay Car Sales!

Happy learning.

Greetings @brayanopiyo18 :grinning:
Many thanks for taking the time to review my project and for providing constructive feedback and tips! I agree with all your suggestions, especially that my introduction could be greatly improved. For future projects, I think treating my audience as unfamiliar with the guided project would be a good start.
Cheers,
kwu


I appreciate that you found my suggestions helpful, @kwu.

Happy coding!