 # Guided Project: Exploring Ebay Car Sales Data [Removing outliers from price and odometer columns]

My Code:

``````
autos["price"].describe()
autos["price"].value_counts().sort_index(ascending=False)
``````

What I expected to happen: I expected `describe()` and the sorted `value_counts()` index to show the same min and max values.

What actually happened:

Output of `autos["price"].describe()`:

``````
count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
``````

Output of `autos["price"].value_counts().sort_index(ascending=False)`:

``````
99999999       1
27322222       1
12345678       3
11111111       2
10000000       1
``````

Is there something I am doing wrong?
As I understand it, the index of the Series returned by `value_counts()` holds the same values that `describe()` summarizes, so the largest index entry should equal the max.

Cheers!

Both methods give you the same min value, zero.

The max value appears to be rounded for display in the output of `describe()`, which prints it in scientific notation with six decimal places. The actual max is 99,999,999, as shown in the output of `value_counts()`; rounded to that precision it becomes 1.000000e+08, that is, 1 × 10^8 = 100,000,000.
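As a rough check of the rounding explanation, the sketch below uses a small made-up Series (the values are hypothetical stand-ins, not the real dataset) to show that the value stored by `describe()` is not rounded; only its printed form is.

```python
import pandas as pd

# Hypothetical stand-in for autos["price"]; only the max matches the thread
prices = pd.Series([0, 1100, 2950, 7200, 99999999], name="price")

summary = prices.describe()

# The stored max is the exact value, even if the printed summary rounds it
exact_max = summary["max"]
print(exact_max)                      # 99999999.0

# Formatting with six decimals in scientific notation reproduces the rounding
rounded_text = "{:.6e}".format(exact_max)
print(rounded_text)                   # 1.000000e+08

# Optional: show the summary without scientific notation
with pd.option_context("display.float_format", "{:,.2f}".format):
    print(prices.describe())
```

If the scientific notation is distracting, the `display.float_format` option used at the end is one way to change how the summary is printed without touching the data.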

I removed 0 and 99999999 as price outliers. The next highest value was 27322222, but when I call `describe()` I get 1.300000e+06 as the maximum price. Why is this?
To remove the outliers I used:
`autos_c = autos_c[autos_c["price_USD"].between(1, 2732222)]`

The next highest value after 99,999,999 is 27,322,222, but your code uses 2,732,222, which is missing a digit. The second highest value is about 27 million, but you typed about 2.7 million.

The highest value at or below the bound you typed is 1,300,000, which is equal to 1.300000e+06.
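To make the effect of the missing digit concrete, here is a minimal sketch on a made-up DataFrame. Only the column name `price_USD` and the bounds come from the thread; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the dataset, including both outliers
autos_c = pd.DataFrame({"price_USD": [0, 500, 1300000, 27322222, 99999999]})

# Upper bound with a digit missing (2,732,222): 27,322,222 is filtered out too
typo = autos_c[autos_c["price_USD"].between(1, 2732222)]
print(typo["price_USD"].max())     # 1300000

# Intended upper bound (27,322,222); between() includes both endpoints by default
fixed = autos_c[autos_c["price_USD"].between(1, 27322222)]
print(fixed["price_USD"].max())    # 27322222
```

Because `between()` is inclusive on both ends by default, the intended bound keeps the 27,322,222 row while still dropping 0 and 99,999,999.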

I see my mistake, thank you!

Hi, can you show me the code to remove the outliers 0 and 99999999? I tried the code shown in the instructions, but it does not work. I'm not sure if I did it correctly.

``````
autos_c = autos_c[autos_c["price_USD"].between(1, 27322222)]
``````

The above is the code I used.