Exploring Ebay Car Sales Data - why 1900 is lowest acceptable value for registration_year

This is the DQ content in this project - “Because a car can’t be first registered after the listing was seen, any vehicle with a registration year above 2016 is definitely inaccurate. Determining the earliest valid year is more difficult. Realistically, it could be somewhere in the first few decades of the 1900s”

I don’t quite follow the reasoning suggested here for data cleaning. I am reading it as: the car may have been first registered in the early 1900s, and its ad was first crawled in 2016 (because the date_crawled values are all in 2016). That would mean a car more than 100 years old is being listed for sale. Can someone who went through this guided project help me understand? Thank you!

Yes, that’s what they are implying: there could be a car that was registered in the early 1900s, because that’s when mass production of cars really took off. Someone could be trying to sell a vintage model that has been in their family, for example.

That’s why it’s difficult to determine the earliest valid year. However, with additional domain knowledge you could potentially narrow it down.

Hi @DnaData

Here is what I have tried to use as a logic in my project.

In the registration_year column there are some interesting values. The summary statistics show that the minimum registration_year is 1000 and the maximum is 9999!
Taken literally, that would mean there were cars in 1000 AD that are now for sale, along with cars from the future (9999 AD, to be exact).

So there are obviously some errors in the data.
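For anyone who wants to reproduce that check, here is a minimal sketch. The DataFrame name autos and the toy values are my own assumptions for illustration; in the project you would run the same two calls on the real data.

```python
import pandas as pd

# Toy stand-in for the real dataset; these values are invented for illustration.
autos = pd.DataFrame(
    {"registration_year": [1000, 1910, 1999, 2005, 2016, 2018, 9999]}
)

# The extremes are what give the bad rows away.
year_min = autos["registration_year"].min()
year_max = autos["registration_year"].max()
print(year_min, year_max)
```

Calling autos["registration_year"].describe() gives the same minimum and maximum along with the quartiles, which helps show how rare the odd values are.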

  • Since the date crawled is in 2016, any registration year that comes after 2016 must be wrong.
  • Any year before roughly 1885 should also be wrong. A quick Google search says the first automobile was invented in 1885–1886.
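The two rules above can be sketched roughly like this. The name autos, the toy values, and the 1885 lower bound are assumptions on my part; the course text only suggests "the first few decades of the 1900s", so pick whichever lower bound you can justify.

```python
import pandas as pd

# Toy stand-in for the real dataset; these values are invented for illustration.
autos = pd.DataFrame(
    {"registration_year": [1000, 1800, 1910, 1999, 2016, 2018, 9999]}
)

EARLIEST_YEAR = 1885  # assumption: around when the first automobile appeared
LATEST_YEAR = 2016    # the ads were crawled in 2016

# between() is inclusive on both ends by default.
in_range = autos["registration_year"].between(EARLIEST_YEAR, LATEST_YEAR)

# Check how much data would be dropped before actually dropping it.
share_outside = 1 - in_range.mean()
cleaned = autos[in_range]
print(f"{share_outside:.1%} of rows fall outside the range")
```

If the share outside the range is small, removing those rows should not change the overall picture much.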

100-year-old cars?! Really?

From the data we have, we can’t really say whether the registration years of these vehicles are true or not.

  • Opel was founded on 21 January 1862 in Rüsselsheim, Germany
  • Renault was founded on 1 October 1898 in Boulogne-Billancourt, France
  • BMW was founded on 7 March 1916 in Munich, Germany
  • Ford was founded on 16 June 1903 in Detroit, Michigan, United States

The early years could be true, since these companies started around 100 years ago or more. But if these vehicles are 100 years old and in sellable condition, the listed prices don’t make sense, because vintage cars can cost a lot of money. So those claims of 106-year-old cars are most probably wrong, and we can remove them from the analysis.
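One way to run that price sanity check is sketched below. The name autos, the toy values, and the 1920 cutoff are hypothetical, and it assumes the price column has already been converted to a number.

```python
import pandas as pd

# Toy stand-in for the real dataset; these values are invented for illustration.
autos = pd.DataFrame(
    {
        "registration_year": [1910, 1910, 1931, 1999, 2005, 2012],
        "price": [0, 450, 22000, 1500, 4200, 13500],
    }
)

OLD_CUTOFF = 1920  # hypothetical cutoff for "suspiciously old" listings

old_cars = autos[autos["registration_year"] < OLD_CUTOFF]
newer_cars = autos[autos["registration_year"] >= OLD_CUTOFF]

# Genuine vintage cars would be expected to sell for more, not less.
old_median = old_cars["price"].median()
newer_median = newer_cars["price"].median()
print(old_median, newer_median)
```

If the supposedly century-old cars turn out much cheaper than everything else, that supports treating those years as data-entry errors rather than real vintage listings.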

I hope this makes some sense.


You are right about looking at whether the price for the oldest cars makes sense before using the year as the lower limit. DQ has left us some room to experiment and it all comes down to the why.

