I’m currently doing the Intermediate Python and Pandas course and working through the guided project. I’m on page 3, which requires some pandas string methods. My question is about the last couple of tasks:
You likely found that the price and odometer columns are numeric values stored as text. For each column:
- Remove any non-numeric characters.
- Convert the column to a numeric dtype.
guided project page: https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/3/initial-exploration-and-cleaning
I accomplished the task by doing it sort of ‘long-handed’:
autos["odometer"] = autos["odometer"].str.replace('km','')  # works once while data type is a string
autos["odometer"] = autos["odometer"].str.replace(',','')  # works once while data type is a string
autos["odometer"] = autos["odometer"].astype(int)
The solution accomplished it in shorthand:
autos["odometer"] = (autos["odometer"]
.str.replace("km","")
.str.replace(",","")
.astype(int)
)
solution link: https://nbviewer.jupyter.org/github/dataquestio/solutions/blob/master/Mission294Solutions.ipynb
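To convince myself the two versions really do the same thing, here is a minimal sketch on made-up sample data. The three odometer strings are my own assumption and not the real dataset, and I pass regex=False so that "km" and "," are treated as literal text regardless of pandas version. As far as I understand it, chaining works because each method returns a new Series, so the next method can be called on that result directly.

```python
import pandas as pd

# Hypothetical sample data in the same format as the odometer column
autos = pd.DataFrame({"odometer": ["150,000km", "70,000km", "5,000km"]})

# Long-hand: one assignment per step, using an intermediate variable
long_hand = autos["odometer"].str.replace("km", "", regex=False)
long_hand = long_hand.str.replace(",", "", regex=False)
long_hand = long_hand.astype(int)

# Chained: the same three steps, each called on the result of the previous one.
# The outer parentheses only allow the expression to span several lines.
chained = (
    autos["odometer"]
    .str.replace("km", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Both approaches should give identical Series
print(long_hand.equals(chained))
print(chained.tolist())
```

Reading the chain top to bottom gives the same order of operations as the long-hand version, so one way to practise might be to write the intermediate steps first and then collapse them into a chain.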
I’m having a hard time remembering these shorthand methods. Does it just come with time, by looking at solutions and trying to remember them the next time? Perhaps I should force myself to use them for some non-project tasks. The same goes for using shorthand when selecting rows and columns, or for method chaining. I tend to use intermediate steps. I might just have to give it time and keep getting reminded, but I’m wondering if anyone has tips to speed up the process.
This might just be a rant, thanks for listening if you made it this far!