Exploring Ebay Car Sales Data (Guided Project)

Hello everyone. I am currently working on the Ebay Car Sales Data Guided Project, and while exploring the date columns (Step 5) I decided to look more closely at the registration_month column. It has a minimum value of 0 and a maximum value of 12, which implies 13 distinct months, even though a year only has 12. Presuming that 1 to 12 stand for January to December, the value 0 looks like an error. I wondered whether 0 was an outlier, so I investigated further using:

autos['registration_month'].value_counts().sort_index()

Getting the output:

So there are almost 4500 entries with a registration month of 0 (about 9% of all values). The value 0 must be an error then? I’m not sure, so I decided to point this out, as I hadn’t seen it mentioned in other discussions about this project here.
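For anyone who wants to check the count and percentage directly, here is a minimal sketch. The small `autos` DataFrame below is a made-up stand-in for the project's data, so the numbers it prints are not the real ones.

```python
import pandas as pd

# Toy stand-in for the project's `autos` DataFrame (values are invented).
autos = pd.DataFrame({"registration_month": [0, 3, 6, 0, 12, 7, 1, 3, 0, 5]})

# Relative frequency of each month value, sorted by month number.
month_share = autos["registration_month"].value_counts(normalize=True).sort_index()

# Number of rows with month 0 and their share of all rows.
zero_count = (autos["registration_month"] == 0).sum()
zero_share = month_share.get(0, 0.0)

print(f"{zero_count} rows have month 0 ({zero_share:.1%} of all rows)")
```

Running the same lines on the real `autos` DataFrame should give the count and share for the actual dataset.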

It’s likely that when the data was scraped from the website, some of the listings did not have a registration month and a zero was put in by default for these. The original dataset uploaded to Kaggle (before Dataquest sampled it for us to use) doesn’t give any specifics. As you will find throughout the project, there are a lot of odd data points like this that result from user input on a website, and we have to decide how best to deal with them. For the entries with a 0 for the month, how you handle them depends on what questions you’re trying to answer from the data. If the registration month is part of an important question, you may want to eliminate those data points; if not, you may decide to keep them in so you can analyze other aspects.
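The two options above could look roughly like this in pandas. This is only a sketch under assumptions: the toy `autos` DataFrame and its `price` values are invented for illustration, and treating 0 as "missing" is a guess about the data, not something the dataset documents.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the project's `autos` DataFrame (values are invented).
autos = pd.DataFrame({
    "registration_month": [0, 3, 6, 0, 12],
    "price": [500, 1200, 3000, 800, 4500],
})

# Option 1: the month matters for your question, so keep only rows
# whose month is a valid calendar month (1 through 12).
with_month = autos[autos["registration_month"].between(1, 12)]

# Option 2: the month is not central, so keep every row but mark
# month 0 as missing (assumption: 0 means "not provided").
autos_marked = autos.copy()
autos_marked["registration_month"] = autos_marked["registration_month"].replace(0, np.nan)

print(len(with_month), "rows kept after dropping month 0")
print(autos_marked["registration_month"].isna().sum(), "rows marked as missing")
```

With the second option the other columns (price and so on) stay available for analysis, while pandas methods such as value_counts() skip the missing months by default.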
