Exploring Ebay Car Sales Data - Dataframe not sorting properly

So I’m on page 4 of the guided project.

I’m trying to remove high and low outliers from two columns in the dataframe.

To do this, I am plotting the series in a line chart so I can see where to slice the values.

This code works:

range_bool = autos["registration_year"].between(1980,2020)
final_distribution = autos[range_bool]
final_distribution["registration_year"].value_counts().sort_index().plot()

I get a nice sorted line chart. Hooray!

BUT…

This code does not work:

price_bool = autos["price"].between(1000,30000)
final_price_distribution = autos[price_bool]
final_price_distribution["price"].value_counts().sort_index().plot()

I get a line chart with lots of 0 values and only approximate sorting.
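One way to check whether the sort itself is at fault is to inspect the counted series before plotting it. The sketch below uses synthetic prices as a stand-in for autos["price"] (the real data is not reproduced here, and the 1000-wide bins are an arbitrary choice). It suggests that sort_index() does put the prices in ascending order. The jaggedness more likely comes from the number of distinct prices, each with its own small count:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for autos["price"]; an assumption, not the project data.
rng = np.random.default_rng(0)
price = pd.Series(rng.integers(1000, 30001, size=5000), name="price")

counts = price.value_counts().sort_index()

# The index becomes the x-axis of the plot; check that it is ascending.
index_is_sorted = counts.index.is_monotonic_increasing

# Thousands of distinct prices mean thousands of points with uneven counts.
n_distinct = counts.size

# Counting per 1000-wide bin gives one point per bin instead.
bins = np.arange(1000, 31001, 1000)
binned = pd.cut(price, bins=bins, right=False).value_counts().sort_index()

print(index_is_sorted, n_distinct, len(binned))
# binned.plot() or price.plot(kind="hist", bins=30) should give a smoother shape.
```

If the index is ascending, the chart is sorted on the x-axis and the spikes are the counts themselves. Binning, or a histogram, is a common way to see the overall shape.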

And this definitely doesn’t work:

price_bool = autos["price"].between(1000,30000)
final_price_distribution = autos[price_bool]
final_price_distribution["price"].sort_values().plot(kind='line')

The difference is obviously that the “registration_year” data has a much smaller range of values than the “price” series. But I would still expect the sort to work properly.
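A possible explanation for the sort_values() case is that the sort works but the plot does not use the sorted position as its x-axis. Series.plot() draws against the index by default, and sort_values() keeps each row's original index label. A minimal sketch with made-up prices (hypothetical values, not from the dataset):

```python
import pandas as pd

# Hypothetical prices standing in for autos["price"].
price = pd.Series([5000, 1200, 29000, 8000, 1500], name="price")

sorted_price = price.sort_values()

# The values are ascending, but every row keeps its original label.
print(sorted_price.index.tolist())  # [1, 4, 0, 3, 2]

# Dropping the old labels makes the x-axis the rank within the sorted data.
ranked = sorted_price.reset_index(drop=True)
print(ranked.index.tolist())  # [0, 1, 2, 3, 4]

# ranked.plot(kind="line") should then rise steadily from left to right.
```

With tens of thousands of rows, plotting against the scrambled original labels would make the line jump back and forth across the chart, which may be why the result looks like a filled block rather than a line.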

Many thanks - all help appreciated!

James

This is a bit difficult to answer without seeing the charts, the rest of your code, and a few more details.

When I modify your second piece of code slightly so that I can run it in my project, I get the following chart:

image

I am not entirely sure whether it’s correct, or what you expect it to look like. That may well be because I haven’t worked on this project in a long time, so I don’t remember what is expected in this case.

It would help if you added more details clarifying your question.

Thank you!

I got the same result.

The issue is more that this is clearly (1) not a line chart and (2) not sorted.

The dataset is a series of integers between x and y where x > 0.

I would expect the series to match some kind of normal curve distribution.

Even if there is no curve, it should be a smooth line with a consistent gradient, not a jagged unsorted bar.

image

A similar series from the same dataset looks like this, although there is less variation in its value_counts().
image

But the important thing is, it’s a line chart and it’s sorted.

I’m missing something obvious, I know, but it’s eluding me.