Error converting columns to numeric data type

eBay Car Sale guided project: When I try to convert the ‘price’ and ‘odometer’ columns to numeric dtypes, I get an error message. Below is my code:

# remove non-numeric characters:
autos['price'] = autos['price'].str.replace('$','')
autos['odometer'] = autos['odometer'].str.replace('km','')

# convert to numeric dtype (also tried 'float' with similar result):
autos['price'] = autos['price'].astype(int)
autos['odometer'] = autos['odometer'].astype(int)

# rename column:
autos.rename({'odometer':'odometer_km'},axis=1,inplace=True)

Please advise.

What error are you receiving? It’s possible that you haven’t removed all of the non-numeric characters. Let’s have a look at the unique values in the odometer column.

autos['odometer'].unique()

Output:

array(['150,000km', '70,000km', '50,000km', '80,000km', '10,000km',
       '30,000km', '125,000km', '90,000km', '20,000km', '60,000km',
       '5,000km', '100,000km', '40,000km'], dtype=object)

It looks like in addition to 'km', we also have commas. The code you pasted doesn’t remove any commas, so you may still need to remove them before the values can be converted.
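For reference, here is a rough sketch of how the full cleanup could look once the commas are handled too. The sample rows are made up for illustration (the odometer values mirror the output above, and the price format is an assumption since the price column isn't shown in this thread), so adjust it to your own data. Passing regex=False makes pandas treat '$' as a literal character instead of a regular-expression anchor.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; not the real data set.
# The price format ("$5,000") is an assumption.
autos = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$0"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})

# Strip the currency symbol and the thousands separator, then convert.
# regex=False treats "$" as a literal character, not a regex anchor.
autos["price"] = (
    autos["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Strip the unit and the thousands separator, then convert.
autos["odometer"] = (
    autos["odometer"]
    .str.replace("km", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Keep the unit in the column name now that it is gone from the values.
autos = autos.rename(columns={"odometer": "odometer_km"})

print(autos.dtypes)
```

If astype(int) still raises an error after this, running unique() on the column again usually shows which leftover characters are causing it.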

Thanks April! I removed the commas and got the desired results.

One more question re: this project. After filtering registration_year for 1900 - 2016, I tried removing the outliers for registration_year using the code autos = autos[autos['registration_year'].between(1900, 2016)] and it didn’t look to me like the data set updated. Was I going about it incorrectly?

The code looks okay. The updated data set doesn’t print automatically because the result was assigned back to the variable autos rather than displayed. You can quickly check whether it worked by running this code in another cell:

autos['registration_year'].unique()

You’ll be able to see pretty quickly if all the years are between 1900 and 2016 as expected.
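If the list of unique years is long, a couple of summary checks may be easier to read. This is only a sketch on a made-up sample (the years below are hypothetical, not taken from the real data set). Note that between() includes both endpoints by default, so 1900 and 2016 themselves are kept.

```python
import pandas as pd

# Hypothetical sample with two out-of-range years; not the real data set.
autos = pd.DataFrame({"registration_year": [1000, 1999, 2005, 2016, 9999]})

rows_before = len(autos)

# Same filter as in the question; both bounds are inclusive by default.
autos = autos[autos["registration_year"].between(1900, 2016)]

# Compare row counts to see how many rows the filter dropped.
print(rows_before, len(autos))

# The smallest and largest remaining years should fall inside the range.
print(autos["registration_year"].min(), autos["registration_year"].max())

# True only if every remaining year is within 1900 to 2016.
print(autos["registration_year"].between(1900, 2016).all())
```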