Should I simply remove data that seems inaccurate, or is it better to try to reconcile it?

In the autos.csv guided project, there are over 1500 data points whose registration_year is 2017 (one year after the data was scraped). My question is basically: should I try to reconcile the 2017 registration year, or simply delete those rows? It seems like a large amount of data to remove, but working out the accurate year for each car sounds like a stupid task. Is there a happy medium where I don’t remove the data but replace 2017 with “unknown” or something? Or would doing that simply over-complicate the data, making it preferable to just work without those rows?


Hi @BunterTheMage

If the rows you think you need to remove make up about 5% of the data or less, there’s no problem with removing them. If it’s more than 5%, then you need to try to find out why the “mistake” happened, or whether it’s really a mistake at all. For example, in this case I would assume that cars are registered using a fiscal-year system, or that for a new car the model year is used as the registration year.
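Here is a rough sketch of that rule of thumb in plain Python, in case it helps. The function names, the `scrape_year=2016` default (taken from the question, where 2017 is one year after the scrape), and the 5% `threshold` are all just assumptions for illustration, and the sample list is made up. It drops the suspect years when they are rare, and otherwise keeps the rows but marks the year as missing (`None`), which is roughly the “unknown” option from the question.

```python
def suspect_share(years, scrape_year=2016):
    """Return the fraction of registration years later than the scrape year."""
    if not years:
        return 0.0
    suspect = sum(1 for y in years if y > scrape_year)
    return suspect / len(years)


def handle_suspect_years(years, scrape_year=2016, threshold=0.05):
    """Drop suspect years if they are rare; otherwise keep them as None."""
    share = suspect_share(years, scrape_year)
    if share <= threshold:
        # Few enough suspect rows: removing them is unlikely to matter.
        return [y for y in years if y <= scrape_year]
    # Too many to discard blindly: keep the rows, mark the year as unknown.
    return [y if y <= scrape_year else None for y in years]


# Made-up sample: 2 of 10 years are after the assumed scrape year.
sample = [2005, 2010, 2017, 1999, 2017, 2012, 2008, 2016, 2003, 2011]
print(suspect_share(sample))
cleaned = handle_suspect_years(sample)
print(cleaned)
```

Keeping the rows with a missing year means the other columns can still be used, and the year question can be looked into separately.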
