In the autos.csv guided project, there are over 1500 data points whose registration_year is 2017 (one year after the data was scraped). My question is basically: should I try to reconcile the 2017 registration years, or simply delete those rows? It seems like a large amount of data to remove, but working out the correct year for each car sounds like a thankless task. Is there a happy medium where I don't remove the data but replace 2017 with "unknown" or something? Or would that just over-complicate the data, making it preferable to simply work without those rows?
If the amount of data that you think you need to remove is 5% or less of the dataset, there's no problem removing it. If it's more than 5%, then you need to try to find the reason why the "mistake" happened, or whether it's really a mistake at all. For example, in this case I would assume that the car is registered using a fiscal year system, or that for a new car the model year is used as the registration year.
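To make the 5% check concrete, here is a rough sketch in pandas. The small DataFrame is a made-up stand-in for autos.csv, the scrape year of 2016 is inferred from the question (2017 being one year after the scrape), and the names `share_after_scrape` and `registration_year_clean` are just illustrative. It also shows the "happy medium" from the question: keeping the rows but marking the suspicious year as missing.

```python
import pandas as pd

SCRAPE_YEAR = 2016   # assumption: inferred from "2017 is one year after the scrape"
THRESHOLD = 0.05     # the 5% rule of thumb from the reply above


def share_after_scrape(years: pd.Series, scrape_year: int = SCRAPE_YEAR) -> float:
    """Return the fraction of rows registered later than the scrape year."""
    return float((years > scrape_year).mean())


# Toy stand-in for the real file; with the project data you would load
# autos.csv with pd.read_csv instead.
autos = pd.DataFrame(
    {"registration_year": [2004, 2017, 2011, 2016, 2017, 1999, 2008, 2013, 2010, 2005]}
)

share = share_after_scrape(autos["registration_year"])
print(f"{share:.1%} of rows are registered after {SCRAPE_YEAR}")

if share <= THRESHOLD:
    print("Small share: dropping these rows is unlikely to matter.")
else:
    print("Large share: worth investigating before dropping anything.")

# Middle ground: keep every row, but blank out the implausible years.
# Series.where keeps values where the condition holds and puts NaN elsewhere.
autos["registration_year_clean"] = autos["registration_year"].where(
    autos["registration_year"] <= SCRAPE_YEAR
)
```

Using NaN rather than the string "unknown" keeps the column numeric, so things like `mean()` or `value_counts()` still work and simply skip the missing years.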