Guided Project: Exploring Ebay Car Sales Data [Removing outliers from price and odometer columns]

Screen Link: https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/4/exploring-the-odometer-and-price-columns

My Code:

autos["price"].describe()
autos['price'].value_counts().sort_index(ascending=False)

What I expected to happen: I expected that both describe and sort_index would give me the same min and max values.

What actually happened:

Output of autos["price"].describe():

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64

Output of autos['price'].value_counts().sort_index(ascending=False):

99999999       1
27322222       1
12345678       3
11111111       2
10000000       1

Is there something I am doing wrong?
As I understand it, the index of the Series returned by value_counts() holds the same values that describe() summarizes, so the min and max should match.

Thanks in advance!
Cheers!

Both methods give you the same min value, zero.

The max value is only rounded in the output of describe(). The result is displayed in scientific notation with six decimal places, so 99,999,999 (9.9999999e+07) is shown as 1.000000e+08, which equals 100,000,000. The actual max is 99,999,999, as shown in the output of value_counts().
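
Here is a minimal sketch of that rounding, using a small made-up Series instead of the real autos data. The format string is only meant to mimic the six-decimal scientific notation seen in the describe() output above. The stored value itself stays exact.

```python
import pandas as pd

# Hypothetical stand-in for autos["price"]; the values are made up for illustration.
price = pd.Series([0, 1100, 2950, 7200, 99999999])

summary = price.describe()

# The stored maximum is exact; only the printed representation is rounded.
exact_max = int(summary["max"])
displayed_max = f"{summary['max']:.6e}"

print(exact_max)      # 99999999
print(displayed_max)  # 1.000000e+08
print(price.max())    # 99999999
```

If you only need the exact extremes, calling min() and max() on the column directly avoids the rounded display.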

I removed 0 and 99999999 as price outliers. The next highest value was 27322222. But when I run describe() I get 1.300000e+06 as the maximum for price. Why is this?
To remove the outliers I used:
autos_c = autos_c[autos_c["price_USD"].between(1,2732222)]

[https://app.dataquest.io/jupyter/notebooks/notebook/Ebay%20Car%20Sales-Copy1.ipynb]

The next highest value after 99,999,999 is 27,322,222, but your code uses 2,732,222 as the upper bound. The value you wanted is around 27 million, but you typed about 2.7 million (one digit is missing).

The highest price at or below the bound you typed is 1,300,000, which is equal to 1.300000e+06.
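
Here is a small sketch of the difference, assuming a made-up five-row DataFrame. Only the price_USD column name and the two bounds come from this thread. between() includes both endpoints by default.

```python
import pandas as pd

# Hypothetical mini version of the data; the rows are made up for illustration.
autos_c = pd.DataFrame({"price_USD": [0, 500, 1300000, 27322222, 99999999]})

# Upper bound with a missing digit (about 2.7 million): 27,322,222 is filtered out too.
typo = autos_c[autos_c["price_USD"].between(1, 2732222)]

# Intended upper bound (about 27 million): only 0 and 99,999,999 are removed.
fixed = autos_c[autos_c["price_USD"].between(1, 27322222)]

typo_max = int(typo["price_USD"].max())
fixed_max = int(fixed["price_USD"].max())
print(typo_max)   # 1300000
print(fixed_max)  # 27322222
```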

I see my mistake, thank you!

Hi, can you show me the code to remove the outliers 0 and 99999999? I tried the code shown in the instructions, but it does not work. I'm not sure if I did it correctly.

autos_c = autos_c[autos_c["price_USD"].between(1,27322222)]

The above is the code I used.