In the autos.csv guided project, there are over 1500 data points whose registration_year is 2017 (one year ahead of when the data was scraped). My question is basically: should I try to reconcile the 2017 registration year or simply delete those rows? It seems like a large amount of data to remove, but reconciling accurate car years sounds like a stupid task. Is there a happy medium where I don’t remove the data but replace 2017 with “unknown” or something? Or would doing that simply over-complicate the data, so that the preferable option is just to work without those rows?
Should I wantonly remove data that seems inaccurate, or is it more pertinent to attempt to reconcile the data?
If the data you think you need to remove amounts to about 5% of the dataset or less, there’s no problem removing it. If it’s more than 5%, then you need to try to find out why the “mistake” happened, or whether it’s really a mistake at all. For example, in this case I would assume that cars are registered using a fiscal-year system, or that for a new car the model year is used as the registration year.
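To make the 5% rule of thumb concrete, here is a minimal pandas sketch. It is only an illustration under some assumptions: the helper names `flag_share` and `handle_suspect_years` are made up, the toy DataFrame stands in for the real autos.csv data, and marking the suspect years as missing (NaN) is just one way to do the “unknown” idea from the question while you look into the cause. It is not a substitute for that investigation.

```python
import numpy as np
import pandas as pd


def flag_share(df, column, bad_value):
    """Fraction of rows where df[column] equals bad_value."""
    return (df[column] == bad_value).mean()


def handle_suspect_years(df, column="registration_year", bad_value=2017,
                         threshold=0.05):
    """Drop suspect rows if they are rare, otherwise mark them as missing.

    threshold=0.05 mirrors the 5% rule of thumb; it is not a hard rule.
    """
    share = flag_share(df, column, bad_value)
    if share <= threshold:
        # Few enough rows: removing them is unlikely to distort the analysis.
        return df[df[column] != bad_value].copy(), share
    # Too many rows to drop blindly: keep them, but mark the year as unknown
    # so the other columns stay usable while the cause is investigated.
    out = df.copy()
    out[column] = out[column].astype("float64")  # float dtype can hold NaN
    out.loc[out[column] == bad_value, column] = np.nan
    return out, share


# Toy stand-in for the real dataset (values are invented for illustration).
autos = pd.DataFrame({
    "registration_year": [2004, 2017, 2010, 2016, 2017,
                          1999, 2012, 2008, 2015, 2011],
    "price": [5000, 1200, 8990, 4350, 1350, 7900, 300, 1990, 250, 590],
})

cleaned, share = handle_suspect_years(autos)
print(f"Share of suspect rows: {share:.1%}")
print(cleaned["registration_year"].isna().sum(), "years marked as unknown")
```

On the real data you would load the CSV into `autos` instead of building it by hand, and check the printed share before deciding which branch makes sense.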