Guided Project: Exploring eBay Car Sales Data

Screen Link:
https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/3/initial-exploration-and-cleaning

My Code:

def clean_price(string):
    list_to_replace = ["$","km",","]
    for l in list_to_replace:
        string = string.replace(l,"")
        string = pd.to_numeric(string)
    return string

autos[["price","odometer"]].apply(clean_price, axis=1)

What I expected to happen:

I expected the two columns to be cleaned and converted to numeric values.

What actually happened:

ValueError                                Traceback (most recent call last)
pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "150,000km"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-61-9769805d5738> in <module>()
      6     return string
      7 
----> 8 autos[["price","odometer"]].apply(clean_price, axis=1)
      9 
     10 

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4875                         f, axis,
   4876                         reduce=reduce,
-> 4877                         ignore_failures=ignore_failures)
   4878             else:
   4879                 return self._apply_broadcast(f, axis)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4971             try:
   4972                 for i, v in enumerate(series_gen):
-> 4973                     results[i] = func(v)
   4974                     keys.append(v.name)
   4975             except Exception as e:

<ipython-input-61-9769805d5738> in clean_price(string)
      3     for l in list_to_replace:
      4         string = string.replace(l,"")
----> 5         string = pd.to_numeric(string)
      6     return string
      7 

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    131             coerce_numeric = False if errors in ('ignore', 'raise') else True
    132             values = lib.maybe_convert_numeric(values, set(),
--> 133                                                coerce_numeric=coerce_numeric)
    134 
    135     except Exception:

pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: ('Unable to parse string "150,000km" at position 1', 'occurred at index 0')

What am I doing wrong? It says it is unable to parse the string “150,000km” at position 1, but I removed those characters in my function.


Hello!

You got this error because pd.to_numeric() is inside the for loop, so it tries to convert the string to a number before it is completely clean.

In the first iteration, the replace method tries to remove the “$” character. Since the string “150,000km” does not contain it, nothing happens, and immediately afterwards you try to convert “150,000km” to numeric. It still contains “,” and “km”, so pandas cannot convert it. To fix this, call pd.to_numeric() after the for loop, like this:

def clean_price(string):
    list_to_replace = ["$","km",","]
    # Strip every unwanted substring first
    for l in list_to_replace:
        string = string.replace(l,"")

    # Convert only once the string is fully clean
    string = pd.to_numeric(string)
    return string
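
As a minimal sketch of how the corrected function could be applied, the example below runs it one value at a time with Series.apply on each column. Note that with axis=1 on a DataFrame the function receives a whole row as a Series rather than a single string, and Series.replace matches entire values instead of substrings, so the sketch avoids that call. The sample rows and the name clean_number are made up for illustration.

```python
import pandas as pd

def clean_number(text):
    # Remove every unwanted substring, then convert once at the end
    for token in ["$", "km", ","]:
        text = text.replace(token, "")
    return pd.to_numeric(text)

# Made-up sample rows in the same shape as the project data
autos = pd.DataFrame({
    "price": ["$5,000", "$8,500"],
    "odometer": ["150,000km", "70,000km"],
})

# Series.apply passes one cell (a plain str) to the function at a time
for column in ["price", "odometer"]:
    autos[column] = autos[column].apply(clean_number)

print(autos)
```

Assigning the result back to the column matters here: apply returns a new Series and leaves the original DataFrame unchanged.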

However, it is strongly recommended that you use string methods in situations like this instead of looping through the whole dataset. You learned about string methods in this course. This is what your code would look like:

autos["price"] = autos["price"].str.replace("$","").str.replace(",","").astype(int)    
autos["odometer"] = autos["odometer"].str.replace("km","").str.replace(",","").astype(int)
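
Because “$” is also a regular-expression metacharacter, it can be safer to ask for a literal match explicitly. The sketch below assumes a pandas version where str.replace accepts the regex parameter (0.23 or later) and uses made-up sample rows; it is one way to write the same cleanup, not the only one.

```python
import pandas as pd

# Made-up sample rows in the same shape as the project data
autos = pd.DataFrame({
    "price": ["$5,000", "$8,500"],
    "odometer": ["150,000km", "70,000km"],
})

# regex=False treats "$" as a literal character, not an end-of-string anchor
autos["price"] = (
    autos["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)
autos["odometer"] = (
    autos["odometer"]
    .str.replace("km", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

print(autos.dtypes)
```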

Hope this will be helpful.


For the odometer line, autos["odometer"] = autos["odometer"].str.replace…, I get an error saying I can't use “str” because it's not from pandas. It only worked once I removed every “str” from that line. The price line worked as written. Does that make sense? Why could this happen?

The code below works just fine. If you are having any issues, I suggest creating a new topic and explaining what’s going on. Post your code, the result you are getting, the result you expected, and the link to the mission.

autos["odometer"] = autos["odometer"].str.replace("km","").str.replace(",","").astype(int)
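
One possible explanation, which this thread does not confirm, is that the odometer column had already been converted to integers by an earlier run of the cell. The .str accessor only works on string columns and raises AttributeError otherwise. Below is a minimal sketch with made-up data that checks the column type before cleaning; clean_odometer is a hypothetical helper, and the regex parameter assumes pandas 0.23 or later.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up sample rows in the same shape as the project data
autos = pd.DataFrame({"odometer": ["150,000km", "70,000km"]})

def clean_odometer(column):
    # Hypothetical helper: leave the column alone if it is already numeric,
    # because .str raises AttributeError on non-string data
    if is_numeric_dtype(column):
        return column
    return (
        column
        .str.replace("km", "", regex=False)
        .str.replace(",", "", regex=False)
        .astype(int)
    )

autos["odometer"] = clean_odometer(autos["odometer"])
# Running the same line again is now harmless
autos["odometer"] = clean_odometer(autos["odometer"])

print(autos["odometer"].dtype)
```

Printing autos["odometer"].dtype before running the cleaning line is a quick way to tell whether the column still holds text.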