Some improvement suggestions on Data Cleaning Walkthrough: exercise 13

Screen Link: https://app.dataquest.io/m/136/data-cleaning-walkthrough/13/parsing-geographic-coordinates-for-schools

In the exercise about extracting a location from a string, I think it’s better to use a vectorized approach than the apply method, where pattern is a regex with one capture group, e.g. r"\((.+)\)":

data['hs_directory']['lat'] = data['hs_directory']['Location 1'].str.extract(pattern, expand=False).str.split(',').str.get(0)

What do others think?
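To make the comparison concrete, here is a minimal sketch on two made-up rows (the addresses are invented; only the "(lat, lon)" suffix mirrors the real column). The exercise's own apply-based function isn't shown in this thread, so find_lat below is a hypothetical stand-in built on re.findall. Both versions should yield the same latitude strings.

```python
import re

import pandas as pd

# Made-up sample rows shaped like the "Location 1" column.
df = pd.DataFrame({
    "Location 1": [
        "123 Example St\nBrooklyn, NY 11201\n(40.6, -73.9)",
        "456 Sample Ave\nBronx, NY 10451\n(40.8, -73.92)",
    ]
})

# Hypothetical row-by-row helper, applied to each value separately.
def find_lat(loc):
    coords = re.findall(r"\(.+\)", loc)  # e.g. "(40.6, -73.9)"
    return coords[0].split(",")[0].replace("(", "")

lat_apply = df["Location 1"].apply(find_lat)

# Chained pandas string methods over the whole column.
# expand=False makes str.extract return a Series so .str can be chained.
lat_vec = (
    df["Location 1"]
    .str.extract(r"\((.+)\)", expand=False)
    .str.split(",")
    .str.get(0)
)

print(lat_vec.tolist())                        # ['40.6', '40.8']
print(lat_apply.tolist() == lat_vec.tolist())  # True
```

The main gain is readability: one chained expression instead of a helper function. Pandas string methods on object columns still loop in Python internally, so any speedup is modest.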


Hi @pawel.wisniewicz,

data['hs_directory']['lat'] = data['hs_directory']['Location 1'].str.extract(r"\((.+)\)", expand=False).str.split(',').str.get(0)

This is great! I will suggest it to the content team.

Best,
Sahil

Hi,

I had the same idea as Pawel.

It may be even more in line with Mission 369 “Advanced RegEx” to capture only the latitude directly:
pattern = r"\((.+),.+\)"
data["hs_directory"]["lat"] = data["hs_directory"]["Location 1"].str.extract(pattern, expand=False)
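Whether expand=False matters depends on what comes next: by default (expand=True) str.extract returns a DataFrame even for a single capture group, while expand=False returns a Series when there is one group. A minimal check on a made-up value:

```python
import pandas as pd

# Made-up value shaped like "... (lat, lon)".
s = pd.Series(["10 Example Rd\nQueens, NY 11101\n(40.75, -73.94)"])

# One capture group: everything between "(" and the comma.
pattern = r"\((.+),.+\)"

as_frame = s.str.extract(pattern)                 # default expand=True
as_series = s.str.extract(pattern, expand=False)  # single group -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(as_series[0])              # 40.75
```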

With some minimal changes to the code, it is also possible to extract both “lat” and “lon”, which also solves screen 14 of this mission:

pattern = r"\((.+),(.+)\)"
data["hs_directory"][["lat","lon"]] = data["hs_directory"]["Location 1"].str.extract(pattern).astype(float)
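A variant sketch, assuming the same "(lat, lon)" layout: named groups (?P<name>...) make str.extract label the result columns itself, so the coordinates can be joined back without listing the column names separately. The sample rows are made up, and this pattern uses [^,]+ and \s* instead of .+ so the longitude has no leading space before the float conversion.

```python
import pandas as pd

# Made-up sample rows shaped like the "Location 1" column.
df = pd.DataFrame({
    "Location 1": [
        "1 Example Pl\nManhattan, NY 10001\n(40.75, -73.99)",
        "2 Sample St\nStaten Island, NY 10301\n(40.64, -74.08)",
    ]
})

# Named groups become the column names of the extracted DataFrame.
pattern = r"\((?P<lat>[^,]+),\s*(?P<lon>[^)]+)\)"
coords = df["Location 1"].str.extract(pattern).astype(float)

# Attach the numeric lat/lon columns to the original rows (same index).
df = df.join(coords)
print(df[["lat", "lon"]])
```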
