Guided Project: Exploring Ebay Car Sales Data [Removing outliers from price and odometer columns]

Screen Link:

My Code:


What I expected to happen: I expected that both describe and sort_index would give me the same min and max values.

What actually happened:

output of autos["price"].describe()

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64

output of autos['price'].value_counts().sort_index(ascending=False)

99999999       1
27322222       1
12345678       3
11111111       2
10000000       1

Is there something I am doing wrong?
As I understand it, the index of the Series returned by value_counts() holds the same values that describe() summarizes, so the largest index value should match the max.

Thanks in advance!


Both the methods give you the same min value, zero.

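The max value is only rounded in the displayed output of describe(). Notice that 1.000000e+08 is equal to 100,000,000, while the actual max value is 99,999,999, as shown in the output of value_counts(). With only six digits after the decimal point, 9.9999999e+07 gets rounded up to 1.000000e+08.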
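A quick way to check this yourself is sketched below. The Series is a small hypothetical stand-in for autos["price"] (only 0 and the five largest values come from this thread; the rest are made up), and the format string is an assumption chosen to mimic the six-decimal scientific notation seen in the describe() output.

```python
import pandas as pd

# Hypothetical stand-in for autos["price"]: 0 and the five largest
# values are taken from the thread, the middle values are made up.
price = pd.Series(
    [0, 1100, 2950, 7200, 10000000, 11111111, 12345678, 27322222, 99999999],
    name="price",
)

summary = price.describe()

# The stored max is exact; only its printed form is rounded.
stored_max = summary["max"]
print(f"{stored_max:.6e}")  # 1.000000e+08 (rounded for display)
print(int(stored_max))      # 99999999 (the value actually stored)
print(price.max())          # 99999999
```

So describe() and value_counts() agree on the max; the difference is only in how the float is printed.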


I removed 0 and 99999999 as price outliers. The next highest value was 27322222. But when I do describe() I get 1.300000e+06 as the maximum for price. Why is this?
To remove the outliers I input:
autos_c = autos_c[autos_c["price_USD"].between(1,2732222)]


The next highest value after 99,999,999 is 27,322,222, but in your code you used 2,732,222 (one digit short). The second highest value is around 27 million, but you typed about 2.7 million.

The highest value in your data that is at or below the upper bound you typed is 1,300,000, which is equal to 1.300000e+06. That is why describe() now reports it as the max.
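Here is a minimal sketch of the difference between the two bounds. The data is hypothetical apart from the values mentioned in this thread, and the variable names are made up for illustration. Note that between() includes both endpoints by default.

```python
import pandas as pd

# Hypothetical stand-in for autos_c["price_USD"]; 500 is a made-up value,
# the others appear in this thread.
prices = pd.Series(
    [0, 500, 1300000, 10000000, 11111111, 12345678, 27322222, 99999999],
    name="price_USD",
)

# Upper bound with the typo (2,732,222): everything above ~2.7 million is dropped.
with_typo = prices[prices.between(1, 2732222)]

# Intended upper bound (27,322,222): only 0 and 99,999,999 are dropped.
intended = prices[prices.between(1, 27322222)]

print(with_typo.max())  # 1300000
print(intended.max())   # 27322222
print(intended.min())   # 500
```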


I see my mistake, thank you!

Hi, can you show me the code to remove the outliers 0 and 99999999? I tried the code shown in the instructions, but it does not work. I'm not sure if I did it correctly.

autos_c = autos_c[autos_c["price_USD"].between(1,27322222)]

The above is the code I used.