I am stuck on section 4 (Exploring the Odometer and Price Columns): how do I find outliers and remove them?
You’ll have to be more specific than that.
Follow the steps given in the Learn section and analyze the result.
For example, when you run autos['price'].unique().shape, the result is (2357,), so we have 2357 unique price values.
autos['price'].describe() gives us the minimum, maximum, mean, and median prices. Notice from its output that the minimum price is 0.
To find out how many rows have a price value 0, we can use
Thus, you can further analyze the price column, and can do the same with
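Here is a minimal sketch of one way to count the zero-price rows. The small autos DataFrame below is a made-up stand-in for the course dataset, so the numbers it prints are not the real ones; only the column names price and odometer_km come from this thread.

```python
import pandas as pd

# Hypothetical stand-in for the course's `autos` DataFrame (values are made up).
autos = pd.DataFrame({
    "price": [0, 0, 500, 1200, 99999999],
    "odometer_km": [150000, 5000, 125000, 90000, 150000],
})

# A boolean mask marks the zero-price rows; summing it counts the True values.
zero_price_count = (autos["price"] == 0).sum()
print(zero_price_count)

# Sorting value_counts() by index shows how many rows sit at each of the lowest prices.
lowest_prices = autos["price"].value_counts().sort_index().head()
print(lowest_prices)
```

Looking at the lowest few values rather than only 0 helps you see whether prices like 1 or 2 are also common enough to question.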
Thank you for your reply. I have done all the steps you mentioned, but my problem is how to decide which values to delete from the dataset, for example which values to ignore or delete in the price or odometer_km columns.
Exploring-Ebay-Car-Sales-Data-test.ipynb (42.0 KB)
That’s great. The next step is to identify unrealistic prices. We have 1421 rows with a price of 0, and we can delete those.
The result from
gives us an idea of the higher prices. The maximum price is 99999999, which is unrealistically high. We can set a threshold for the maximum price and then delete the rows with prices above it.
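As a rough sketch of that thresholding step, something like the following could work. The autos DataFrame here is a made-up stand-in, and MIN_PRICE and MAX_PRICE are assumed cutoffs chosen only for illustration; you would pick your own after inspecting the highest and lowest values in the real data.

```python
import pandas as pd

# Hypothetical stand-in for the course's `autos` DataFrame (values are made up).
autos = pd.DataFrame({"price": [0, 1, 500, 1200, 350000, 99999999]})

# Inspect the highest prices first to see where the values stop looking realistic.
highest_prices = autos["price"].value_counts().sort_index(ascending=False).head()
print(highest_prices)

# Assumed thresholds for illustration only; choose yours from the real data.
MIN_PRICE = 1        # drops the zero-price rows
MAX_PRICE = 350000   # drops implausibly expensive listings

# Series.between() is inclusive on both ends by default.
autos_clean = autos[autos["price"].between(MIN_PRICE, MAX_PRICE)]
print(autos_clean["price"].describe())
```

The same pattern (look at the extremes, pick a range, filter with between) can be applied to odometer_km if its values look suspicious.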
Hope this helps