Why is the maximum value of price in the csv file different from the output generated after using .describe() method?
Please help!
Greetings @maduka.maureen.
Your question caught my attention, so I opened the project myself. Below are the first steps from my notebook; let's see whether they give you a clue.
from IPython.display import Image
The aim of this project is to clean the data and analyze the included used car listings.
- dateCrawled: When this ad was first crawled. All field-values are taken from this date.
- name: Name of the car.
- seller: Whether the seller is private or a dealer.
- offerType: The type of listing.
- price: The price on the ad to sell the car.
- abtest: Whether the listing is included in an A/B test.
- vehicleType: The vehicle type.
- yearOfRegistration: The year in which the car was first registered.
- gearbox: The transmission type.
- powerPS: The power of the car in PS.
- model: The car model name.
- kilometer: How many kilometers the car has driven.
- monthOfRegistration: The month in which the car was first registered.
- fuelType: What type of fuel the car uses.
- brand: The brand of the car.
- notRepairedDamage: If the car has a damage which is not yet repaired.
- dateCreated: The date on which the eBay listing was created.
- nrOfPictures: The number of pictures in the ad.
- postalCode: The postal code for the location of the vehicle.
- lastSeenOnline: When the crawler saw this ad last online.
import numpy as np
import pandas as pd
autos = pd.read_csv("autos.csv", encoding = "Latin-1" )
autos.head(3)
| | Unnamed: 0 | dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | ... | model | kilometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2016-03-24 11:52:17 | Golf_3_1.6 | privat | Angebot | 480 | test | NaN | 1993 | manuell | ... | golf | 150000 | 0 | benzin | volkswagen | NaN | 2016-03-24 00:00:00 | 0 | 70435 | 2016-04-07 03:16:57 |
1 | 1 | 2016-03-24 10:58:45 | A5_Sportback_2.7_Tdi | privat | Angebot | 18300 | test | coupe | 2011 | manuell | ... | NaN | 125000 | 5 | diesel | audi | ja | 2016-03-24 00:00:00 | 0 | 66954 | 2016-04-07 01:46:50 |
2 | 2 | 2016-03-14 12:52:21 | Jeep_Grand_Cherokee_"Overland" | privat | Angebot | 9800 | test | suv | 2004 | automatik | ... | grand | 125000 | 8 | diesel | jeep | NaN | 2016-03-14 00:00:00 | 0 | 90480 | 2016-04-05 12:47:46 |
3 rows × 21 columns
autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 371528 entries, 0 to 371527
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 371528 non-null int64
1 dateCrawled 371528 non-null object
2 name 371528 non-null object
3 seller 371528 non-null object
4 offerType 371528 non-null object
5 price 371528 non-null int64
6 abtest 371528 non-null object
7 vehicleType 333659 non-null object
8 yearOfRegistration 371528 non-null int64
9 gearbox 351319 non-null object
10 powerPS 371528 non-null int64
11 model 351044 non-null object
12 kilometer 371528 non-null int64
13 monthOfRegistration 371528 non-null int64
14 fuelType 338142 non-null object
15 brand 371528 non-null object
16 notRepairedDamage 299468 non-null object
17 dateCreated 371528 non-null object
18 nrOfPictures 371528 non-null int64
19 postalCode 371528 non-null int64
20 lastSeen 371528 non-null object
dtypes: int64(8), object(13)
memory usage: 59.5+ MB
autos.describe()
| | Unnamed: 0 | price | yearOfRegistration | powerPS | kilometer | monthOfRegistration | nrOfPictures | postalCode |
---|---|---|---|---|---|---|---|---|
count | 371528.000000 | 3.715280e+05 | 371528.000000 | 371528.000000 | 371528.000000 | 371528.000000 | 371528.0 | 371528.00000 |
mean | 185763.500000 | 1.729514e+04 | 2004.577997 | 115.549477 | 125618.688228 | 5.734445 | 0.0 | 50820.66764 |
std | 107251.039743 | 3.587954e+06 | 92.866598 | 192.139578 | 40112.337051 | 3.712412 | 0.0 | 25799.08247 |
min | 0.000000 | 0.000000e+00 | 1000.000000 | 0.000000 | 5000.000000 | 0.000000 | 0.0 | 1067.00000 |
25% | 92881.750000 | 1.150000e+03 | 1999.000000 | 70.000000 | 125000.000000 | 3.000000 | 0.0 | 30459.00000 |
50% | 185763.500000 | 2.950000e+03 | 2003.000000 | 105.000000 | 150000.000000 | 6.000000 | 0.0 | 49610.00000 |
75% | 278645.250000 | 7.200000e+03 | 2008.000000 | 150.000000 | 150000.000000 | 9.000000 | 0.0 | 71546.00000 |
max | 371527.000000 | 2.147484e+09 | 9999.000000 | 20000.000000 | 150000.000000 | 12.000000 | 0.0 | 99998.00000 |
autos.max()
/tmp/ipykernel_20336/934174897.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
autos.max()
Unnamed: 0 371527
dateCrawled 2016-04-07 14:36:58
name Ãbernahme_Leasingvertrag
seller privat
offerType Gesuch
----> price <----- -----2147483647-----
abtest test
yearOfRegistration 9999
powerPS 20000
kilometer 150000
monthOfRegistration 12
brand volvo
dateCreated 2016-04-07 00:00:00
nrOfPictures 0
postalCode 99998
lastSeen 2016-04-07 14:58:51
dtype: object
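Note that the `max` row of `describe()` shows `2.147484e+09` while `autos.max()` shows `2147483647`. These are the same number: `describe()` returns floats, so large values are displayed in scientific notation. Here is a minimal sketch with a small made-up frame (the `sample` rows are placeholders, not the real listings; only the largest price is taken from the output above). It also passes `numeric_only=True`, which should avoid the FutureWarning shown earlier.

```python
import pandas as pd

# Tiny stand-in for the listings; the rows are illustrative placeholders.
sample = pd.DataFrame({
    "name": ["a", "b", "c"],
    "price": [480, 18300, 2147483647],
})

# describe() returns floats, so large values are displayed in scientific notation.
described_max = sample["price"].describe()["max"]

# max() on the integer column keeps the exact integer value.
exact_max = sample["price"].max()

# Restricting the reduction to numeric columns skips the text columns.
numeric_max = sample.max(numeric_only=True)

print(described_max)           # float version of the maximum
print(exact_max)               # integer version of the maximum
print(f"{described_max:.0f}")  # float printed without an exponent
```

If the two values still look different after formatting them this way, the difference comes from the data itself and not from how it is displayed.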
I see that your question comes up in cell 21.
It is usually best to run describe() right after loading the data. One possibility is that an earlier step in your notebook modified the price column, so by the time you look at the maximum it no longer matches the original file.
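To illustrate that possibility, here is a small sketch. The data and the cleaning step are assumptions made up for the example (the 1 to 350000 price range is hypothetical, not necessarily what your notebook does). It only shows how the maximum reported by `describe()` changes once rows have been filtered out.

```python
import pandas as pd

# Made-up prices standing in for the original CSV contents.
original = pd.DataFrame({"price": [480, 18300, 9800, 2147483647]})

# Hypothetical cleaning step: keep only prices in a plausible range.
cleaned = original[original["price"].between(1, 350000)]

# The same call gives different maxima before and after the cleaning step.
max_before = original["price"].describe()["max"]
max_after = cleaned["price"].describe()["max"]

print(max_before, max_after)
```

Comparing `max_after` with the maximum seen in the untouched CSV would show exactly this kind of mismatch.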
Perhaps there is a step that we did differently. If, on the other hand, this has helped you, I would appreciate it if you marked the question as solved.
Let’s start here and see whether this helps.
A&E.
Hello @Edelberth, I sincerely apologize for my late response. I’m very grateful for your insight into the problem above: it made me realize that I was comparing the results obtained from the modified data on the platform with the original data in Excel, just as you had speculated.
Thank you!
Hello @maduka.maureen, no problem at all.
I’m very glad I was able to help. One suggestion for next time: when you post something here, format it in Markdown.
Even if you don’t set out to learn it, that skill will help you a lot in Jupyter notebooks as well as in GitHub READMEs.
It was a pleasure to help.
A&E HC