Guided project: Exploring eBay car sales


Your Code: I got these values from exploring the price and odometer_km columns:
Maximum odometer_km: 150000
Minimum odometer_km: 5000
Maximum price: 99999999
Minimum price: 0

It's not quite clear what your question or problem is. If it's about extreme values, the task was to remove outliers before the analysis.

My question is: how do I identify extreme values? Is there a way that makes it easy to tell which values are extreme?

It was mentioned in the mission description:

When removing outliers, we can do df[(df["col"] > x) & (df["col"] < y)], but it’s more readable to use df[df["col"].between(x, y)]
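One detail worth knowing: the two forms above are not exactly equivalent. Series.between is inclusive on both ends by default, while > and < exclude the endpoints. A small sketch on made-up prices (the DataFrame and the bounds are illustrative only); the inclusive="neither" option needs pandas 1.3 or later:

```python
import pandas as pd

# Made-up data standing in for the autos dataset
df = pd.DataFrame({"price": [0, 500, 1000, 5000, 99999999]})

x, y = 500, 5000  # illustrative bounds, not recommended values

# Strict comparisons exclude the endpoints themselves
strict = df[(df["price"] > x) & (df["price"] < y)]

# Series.between includes both endpoints by default
inclusive = df[df["price"].between(x, y)]

# inclusive="neither" (pandas >= 1.3) reproduces the strict mask
strict_between = df[df["price"].between(x, y, inclusive="neither")]

print(strict["price"].tolist())          # [1000]
print(inclusive["price"].tolist())       # [500, 1000, 5000]
print(strict_between["price"].tolist())  # [1000]
```

In practice the difference only matters when a value sits exactly on a bound you chose.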

I have the same question, but I think the original question was more about how to tell which values are extreme, not which function to use to filter them out.

So, in the code from the assignment, how do I figure out which numbers to use for x and y?
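There's no single formula for x and y. A common first step is simply to look at the sorted extreme values and judge which ones are implausible for a used car. A sketch on made-up prices (the real column isn't reproduced here):

```python
import pandas as pd

# Made-up prices standing in for the real price column
prices = pd.Series([0, 0, 1, 150, 800, 1200, 2500, 4500, 12000, 350000, 99999999])

# Sort the distinct values so the extremes sit at the head and tail
counts = prices.value_counts().sort_index()

print(counts.head())  # smallest prices and how often each occurs
print(counts.tail())  # largest prices and how often each occurs
```

In this toy data the $0 and $1 listings and the $99,999,999 listing stand out; exactly where to draw the line is a judgment call you then explain to your reader.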



Here you are: a simple, detailed description of how to calculate outliers.
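The linked description isn't reproduced here. A widely used technique for this is the interquartile-range (IQR) rule, also called Tukey's fences: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers. A minimal sketch, assuming that is the technique meant; the helper iqr_bounds and the sample prices are hypothetical:

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Hypothetical helper: return (lower, upper) fences from the IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up prices standing in for the real price column
prices = pd.Series([0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 99999999])

lower, upper = iqr_bounds(prices)
print(lower, upper)  # -4000.0 12000.0

# Anything outside the fences is flagged
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers.tolist())  # [99999999]
```

Notice the lower fence here is negative, so the rule alone never flags the $0 listing.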


I think the question comes from the fact that there don’t appear to be any outliers in the odometer_km data. All values are at most 150,000 km, which isn’t extreme for a used car being sold. So, @sntohsi17, there isn’t anything to remove there.
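A quick way to confirm there's nothing to trim in a column is to check its summary statistics. A sketch on made-up odometer readings (the real column's values aren't reproduced here):

```python
import pandas as pd

# Made-up odometer readings in km, standing in for the odometer_km column
odometer = pd.Series([5000, 20000, 90000, 125000, 150000, 150000, 150000])

# describe() reports count, mean, std, min, quartiles and max as floats
stats = odometer.describe()
print(stats["min"], stats["max"])  # 5000.0 150000.0
```

If the minimum and maximum are both plausible for a used car, as here, there's no extreme value to remove.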

The price data, on the other hand, has some issues: it ranges from $0 to $99,999,999.


I’m still a bit confused… can I say that $99,999,999 in the price column is an outlier?

Yes, that’s an outrageous amount for a car on an auction site. Both $0 and $99,999,999 are outliers.

So, using the code above, choose a price range for your analysis. The range is your decision, but be sure to explain to your reader why you chose it.
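When justifying a range, it helps to report how many rows it removes. A sketch, where the bounds 1 and 350000 are placeholders for whatever range you decide on (not a recommendation) and the listings are made up:

```python
import pandas as pd

# Made-up listings standing in for the autos dataset
autos = pd.DataFrame({"price": [0, 0, 450, 1200, 8900, 15000, 350000, 99999999]})

# Placeholder bounds; pick and justify your own
low, high = 1, 350000

# between() keeps rows with low <= price <= high
kept = autos[autos["price"].between(low, high)]
removed = len(autos) - len(kept)

print(f"Kept {len(kept)} of {len(autos)} rows; removed {removed}")
# Kept 5 of 8 rows; removed 3
```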


Thanks for the insight :slightly_smiling_face:

I calculated the outliers using this technique, and it came out as df[df["price"].between(9150, 18300)].
Is that correct?

Hi! Did you remove only the $0 values before computing these boundaries, or also others like $1, $2, $3, etc.? Those prices don’t make much sense either, and keeping them in the dataset just makes the lower boundary come out negative. Is it correct to leave $0 and $99999999 in and simply multiply the negative lower boundary by -1?
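Multiplying a negative lower fence by -1 changes the rule itself and can cut off cheap but plausible listings. Another option is to drop implausibly low prices with a separate, explicitly stated threshold first, compute the IQR fences on what remains, and clamp a negative lower fence to that threshold. A sketch under those assumptions: the $100 minimum is arbitrary, iqr_bounds is a hypothetical helper, and the prices are made up.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Hypothetical helper: IQR-rule fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up prices standing in for the real price column
prices = pd.Series([0, 0, 1, 2, 3, 900, 1500, 2500, 4000, 6000, 99999999])

# Step 1 (assumed threshold): drop prices too low to be real sales
MIN_PRICE = 100
plausible = prices[prices >= MIN_PRICE]

# Step 2: compute the fences on the remaining data
lower, upper = iqr_bounds(plausible)

# Step 3: clamp a negative lower fence instead of flipping its sign
lower = max(lower, MIN_PRICE)

cleaned = plausible[plausible.between(lower, upper)]
print(lower, upper)       # 100 11125.0
print(cleaned.tolist())   # [900, 1500, 2500, 4000, 6000]
```

Whichever threshold you use, state it in the write-up so the reader can see why those rows were removed.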