
I'm confused by the different maximum price values in the Guided Project: Exploring eBay Car Sales Data

Why is the maximum value of price in the CSV file different from the output generated by the .describe() method?

Screenshot (156)

Please help!

Greetings @maduka.maureen.

Your question caught my attention, so I opened the project myself. Below are the first steps from my own notebook; let's see if they give you a clue.

Exploring Ebay Car Sales Data

from IPython.display import Image

1. Introduction

The aim of this project is to clean the data and analyze the included used car listings.

Data dictionary:

  • dateCrawled - When this ad was first crawled. All field-values are taken from this date.
  • name - Name of the car.
  • seller - Whether the seller is private or a dealer.
  • offerType - The type of listing
  • price - The price on the ad to sell the car.
  • abtest - Whether the listing is included in an A/B test.
  • vehicleType - The vehicle Type.
  • yearOfRegistration - The year in which the car was first registered.
  • gearbox - The transmission type.
  • powerPS - The power of the car in PS.
  • model - The car model name.
  • kilometer - How many kilometers the car has driven.
  • monthOfRegistration - The month in which the car was first registered.
  • fuelType - What type of fuel the car uses.
  • brand - The brand of the car.
  • notRepairedDamage - Whether the car has damage that has not yet been repaired.
  • dateCreated - The date on which the eBay listing was created.
  • nrOfPictures - The number of pictures in the ad.
  • postalCode - The postal code for the location of the vehicle.
  • lastSeen - When the crawler last saw this ad online.
import numpy as np
import pandas as pd
autos = pd.read_csv("autos.csv", encoding="Latin-1")

2. Exploring Data

autos.head(3)
Unnamed: 0 dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox ... model kilometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 0 2016-03-24 11:52:17 Golf_3_1.6 privat Angebot 480 test NaN 1993 manuell ... golf 150000 0 benzin volkswagen NaN 2016-03-24 00:00:00 0 70435 2016-04-07 03:16:57
1 1 2016-03-24 10:58:45 A5_Sportback_2.7_Tdi privat Angebot 18300 test coupe 2011 manuell ... NaN 125000 5 diesel audi ja 2016-03-24 00:00:00 0 66954 2016-04-07 01:46:50
2 2 2016-03-14 12:52:21 Jeep_Grand_Cherokee_"Overland" privat Angebot 9800 test suv 2004 automatik ... grand 125000 8 diesel jeep NaN 2016-03-14 00:00:00 0 90480 2016-04-05 12:47:46

3 rows × 21 columns

autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 371528 entries, 0 to 371527
Data columns (total 21 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   Unnamed: 0           371528 non-null  int64 
 1   dateCrawled          371528 non-null  object
 2   name                 371528 non-null  object
 3   seller               371528 non-null  object
 4   offerType            371528 non-null  object
 5   price                371528 non-null  int64 
 6   abtest               371528 non-null  object
 7   vehicleType          333659 non-null  object
 8   yearOfRegistration   371528 non-null  int64 
 9   gearbox              351319 non-null  object
 10  powerPS              371528 non-null  int64 
 11  model                351044 non-null  object
 12  kilometer            371528 non-null  int64 
 13  monthOfRegistration  371528 non-null  int64 
 14  fuelType             338142 non-null  object
 15  brand                371528 non-null  object
 16  notRepairedDamage    299468 non-null  object
 17  dateCreated          371528 non-null  object
 18  nrOfPictures         371528 non-null  int64 
 19  postalCode           371528 non-null  int64 
 20  lastSeen             371528 non-null  object
dtypes: int64(8), object(13)
memory usage: 59.5+ MB
autos.describe()
          Unnamed: 0         price  yearOfRegistration        powerPS      kilometer  monthOfRegistration  nrOfPictures    postalCode
count  371528.000000  3.715280e+05       371528.000000  371528.000000  371528.000000        371528.000000      371528.0  371528.00000
mean   185763.500000  1.729514e+04         2004.577997     115.549477  125618.688228             5.734445           0.0   50820.66764
std    107251.039743  3.587954e+06           92.866598     192.139578   40112.337051             3.712412           0.0   25799.08247
min         0.000000  0.000000e+00         1000.000000       0.000000    5000.000000             0.000000           0.0    1067.00000
25%     92881.750000  1.150000e+03         1999.000000      70.000000  125000.000000             3.000000           0.0   30459.00000
50%    185763.500000  2.950000e+03         2003.000000     105.000000  150000.000000             6.000000           0.0   49610.00000
75%    278645.250000  7.200000e+03         2008.000000     150.000000  150000.000000             9.000000           0.0   71546.00000
max    371527.000000  2.147484e+09         9999.000000   20000.000000  150000.000000            12.000000           0.0   99998.00000
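
One thing worth noting about the table above: describe() reports price as floats in scientific notation, so the maximum is shown as 2.147484e+09, a rounded display of a larger exact integer. A minimal sketch of how to check this, using a small invented price series rather than the real autos.csv (only the largest value is taken from this thread):

```python
import pandas as pd

# Invented stand-in for the price column; only 2147483647 comes from the thread.
prices = pd.Series([480, 18300, 9800, 2147483647], name="price")

summary = prices.describe()          # describe() returns float statistics
max_from_describe = summary["max"]   # the maximum as a float
max_exact = prices.max()             # the maximum as an exact integer

# Format the float the way the describe() table displays it.
as_displayed = f"{max_from_describe:.6e}"
print(as_displayed)

# Both refer to the same underlying value.
print(int(max_from_describe) == max_exact)
```

If the rounded display is what differs from the CSV, comparing the exact integers as above should show that the values agree.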
autos.max()
/tmp/ipykernel_20336/934174897.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  autos.max()

Unnamed: 0                                371527
dateCrawled                  2016-04-07 14:36:58
name                   Übernahme_Leasingvertrag
seller                                    privat
offerType                                 Gesuch
----> price <-----                  -----2147483647-----
abtest                                      test
yearOfRegistration                          9999
powerPS                                    20000
kilometer                                 150000
monthOfRegistration                           12
brand                                      volvo
dateCreated                  2016-04-07 00:00:00
nrOfPictures                                   0
postalCode                                 99998
lastSeen                     2016-04-07 14:58:51
dtype: object
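
The FutureWarning above asks for only valid columns to be selected before the reduction. One way to do that is sketched here on a small invented frame; the column names come from the data dictionary, but the rows are made up apart from the maximum price:

```python
import pandas as pd

# Invented rows; vehicleType mixes missing values and strings like the real column.
autos_demo = pd.DataFrame({
    "price": [480, 18300, 2147483647],
    "vehicleType": [None, "coupe", "suv"],
    "powerPS": [0, 190, 163],
})

# Keep only numeric columns, then take the maximum of each.
numeric_max = autos_demo.select_dtypes(include="number").max()
print(numeric_max)

# Or look at the single column of interest.
price_max = autos_demo["price"].max()
print(price_max == 2**31 - 1)
```

The maximum price equals 2**31 - 1, the largest signed 32-bit integer, which may suggest a placeholder or capped entry rather than a real asking price.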
  • I see that your question arises in cell 21.

  • Usually you run something like describe() shortly after loading the data, to see what the dataframe looks like. By cell 21 you may already have run earlier steps that modified the price column, so the value you see there would be the modified one, not the one in the CSV file. (This is only a possibility.)
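
That possibility can be checked by comparing the maximum of the untouched data with the maximum after a cleaning step. A minimal sketch, assuming a hypothetical price filter (the bounds 1 to 350000 are purely illustrative and not taken from this thread) and invented rows:

```python
import pandas as pd

# Invented rows standing in for the freshly loaded data.
raw = pd.DataFrame({"price": [0, 480, 18300, 9800, 2147483647]})

# Hypothetical cleaning step: keep only prices inside an assumed plausible range.
cleaned = raw[raw["price"].between(1, 350000)]

raw_max = raw["price"].max()          # maximum as stored in the original data
cleaned_max = cleaned["price"].max()  # maximum after the filter

print("raw max:", raw_max)
print("cleaned max:", cleaned_max)
```

Keeping the loaded dataframe untouched and assigning each cleaning step to a new name makes this kind of comparison possible at any point in the notebook.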

Maybe there is some step that we have not done in the same way. If, on the other hand, this has helped you, I would appreciate it if you marked the topic as solved.

Let's start here and see whether I have been able to help you.

A&E.

Hello @Edelberth, I sincerely apologize for my late response. I'm most grateful for your insight into the problem above: it made me realize that I was comparing the results obtained from the modified data on the platform with the original data in Excel, just as you had speculated.

Thank you!

Hello @maduka.maureen everything is fine :+1:

I’m very glad I was able to help you. One suggestion for the next time you post here: write your post in Markdown.

Even if you did not set out to learn it, that skill will help you a lot in Jupyter notebooks as well as in GitHub READMEs.

Pleasure to have been able to help you. :wink:

A&E HC