Exploring eBay Car Sales Data - Problem with removing outliers

My Code:

autos["price"] = autos[autos["price"].between(1, 12345678)]

What I expected to happen:

I have read the Series.between(x, y) documentation, but it only says that the method returns a Boolean Series marking which values satisfy the given condition. However, when I ran this code, it somehow converted my price column to the object dtype, and describe() no longer shows the usual mean, 25th percentile, etc.

Can you help me by giving me some hints? Also, how does this method work? Does it just drop the values, or does it replace them with specific values?

Thank you!

Please include a link to the Mission/Project Step in your post as well.

As you saw in the documentation, .between() returns a boolean Series, which makes it great for masking (as you have done). However, this bit of code:

autos["price"] = autos[autos["price"].between(1, 12345678)]

will assign the entire filtered dataframe (the rows with prices between 1 and 12345678) back to the price column. I do not believe this is what you wanted to do. That is why the price column now shows as an object column and gives you such odd results when you call describe() on it.

I believe what you’re looking to do is to filter your entire dataframe based on price. Luckily, you’ve already done this! The problem you’re getting is from assigning it back to autos["price"]… so let’s try NOT doing that!

Try this instead:

filtered_autos = autos[autos["price"].between(1, 12345678)]

or if you’d like to replace your original dataframe with this newly filtered data, use:

autos = autos[autos["price"].between(1, 12345678)]
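To check that the filtered price column stays numeric, here is a minimal sketch on a small made-up dataframe. The rows and the name column are invented for illustration and are not the project's data; only the between(1, 12345678) filter comes from the thread.

```python
import pandas as pd

# Hypothetical stand-in for the project's autos dataframe (made-up rows).
autos = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "price": [0, 500, 12000, 99999999, 3500],
})

# Keep only the rows whose price lies in [1, 12345678]; both ends are inclusive by default.
filtered_autos = autos[autos["price"].between(1, 12345678)]

# The price column is still numeric, so describe() reports mean, quartiles, etc.
print(filtered_autos["price"].dtype)
print(filtered_autos["price"].describe())
```

Because the result is assigned to a dataframe variable rather than to a single column, every column keeps its original dtype.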

Let us know how this works for you. As @the_doctor points out, without a link to the mission it is difficult to know whether this is correct without experimenting “in context”.

Thank you for reaching out! Here is the link:

https://app.dataquest.io/c/54/m/294/guided-project%3A-exploring-ebay-car-sales-data/4/exploring-the-odometer-and-price-columns

Thank you very much for your input!

I have tried it and it seems to be working well. However, I want to clarify this point once more so that I am not just guessing. The operation gives us a Boolean array, so the filter looks like autos[[True, False, True, True]], for example, and it then removes the rows that do not satisfy the condition, right?

Thank you a lot for helping out! I sincerely appreciate your effort!

Yes, exactly! Just to confirm, and for your own edification, try:

print(autos["price"].between(1, 12345678))

and see what the Series looks like. Also, compare its shape to the shape of your original dataframe. You should see that it holds exactly one True/False entry per row, so its length matches the number of rows in the dataframe.

But to be sure, yes, you are correct that when you use a boolean array to mask a dataframe it will return only those corresponding rows that have True and will not show the rows that have False. Take note that by doing this, you are by definition changing the shape of your dataframe.
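The masking step can be sketched on a tiny made-up dataframe. The four prices here are invented for illustration; only the between() call is taken from the thread.

```python
import pandas as pd

# Made-up prices, just to make the mask easy to read.
autos = pd.DataFrame({"price": [0, 500, 12000, 99999999]})

# One True/False entry per row of the dataframe.
mask = autos["price"].between(1, 12345678)
print(mask.tolist())            # [False, True, True, False]
print(mask.shape, autos.shape)  # (4,) (4, 1)

# Indexing with the mask keeps only the rows marked True.
kept = autos[mask]
print(kept.shape)           # (2, 1)
print(kept.index.tolist())  # [1, 2]
```

Note that the surviving rows keep their original index labels; nothing is replaced with a placeholder value, the False rows are simply left out of the result.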

Thank you for your advice! Now I see that my dataset was reduced by 1423 rows
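One way to count how many rows a filter removes, sketched on made-up prices (the figure of 1423 above comes from the real dataset and is not reproduced here):

```python
import pandas as pd

# Made-up prices; three of the five fall outside the allowed range.
autos = pd.DataFrame({"price": [0, 500, 12000, 99999999, 0]})
mask = autos["price"].between(1, 12345678)

rows_before = len(autos)
autos = autos[mask]
rows_removed = rows_before - len(autos)

# The same count can be read off the mask directly: ~mask flips True/False,
# and summing a boolean Series counts its True values.
print(rows_removed, int((~mask).sum()))  # 3 3
```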

It is my pleasure. Glad I was able to help.

Great observation, nicely done!