I need help understanding the solution given for the eBay project. How did the author come to this conclusion:
“Given that eBay is an auction site, there could legitimately be items where the opening bid is $1. We will keep the $1 items, but remove anything above $350,000, since it seems that prices increase steadily to that number and then jump up to less realistic numbers.”
and then decide to remove those values?
It looks like what the author is trying to avoid is outliers.
Outliers are extreme values that can lead us to make erroneous conclusions about the general population or the typical person/unit of interest.
We can see how this can happen by looking at the mean. If we had 6 values, [2, 2, 3, 4, 3, 100], with one outlier (100), the mean would be 19. Is this number a good representation of our 6 numbers, or of what is typical? Not really.
If we were to remove 100 from our values the mean would be 2.8. This number is a much better representation of what is typical.
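Here is a quick sketch of that arithmetic in Python, in case you want to check it yourself. The variable names are just for illustration.

```python
from statistics import mean

# The six example values, including the one outlier (100)
values = [2, 2, 3, 4, 3, 100]

# The same values with the outlier dropped
without_outlier = [v for v in values if v != 100]

mean_all = mean(values)                 # 114 / 6
mean_trimmed = mean(without_outlier)    # 14 / 5

print(mean_all)       # 19
print(mean_trimmed)   # 2.8
```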
Because outliers influence many of the methods we use as data analysts/scientists, it is usually best to do something about them so they don’t lead us to erroneous conclusions. The easiest way to deal with these extreme values is to just remove them from the data.
This is probably what the author is referring to with the statement you referenced. The author is considering purchases above $350,000 to be outliers or extreme values that could lead us to make inaccurate conclusions.
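As a rough illustration of that kind of cutoff, here is a minimal sketch in plain Python. The prices are made up for the example (they are not taken from the project's data), and `HIGH` is just a name I chose for the $350,000 limit from the quote.

```python
# Hypothetical prices in dollars, NOT values from the real dataset
prices = [1, 500, 1200, 8990, 350000, 999990, 12345678]

HIGH = 350_000  # upper limit quoted in the solution

# Keep the $1 items and everything up to the limit; drop anything above it
kept = [p for p in prices if p <= HIGH]
removed = [p for p in prices if p > HIGH]

print(f"kept {len(kept)} of {len(prices)} prices")
print(f"removed: {removed}")
```

With a pandas DataFrame the same idea is usually written as a boolean mask, for example `df[df["price"] <= 350000]`, where `df` and `"price"` are placeholder names for your own DataFrame and column.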
Hope this helps,
Thank you @bvalgard,
I understand now how they are approaching the analysis. Looking at the data, there are clearly very few records with prices higher than 350,000, so removing them seems like the right approach.