Parallel extraction

Screen Link:
data-cleaning-walkthrough-13

My Code:


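import pandas as pd

# `data` is the dictionary of DataFrames built in the earlier screens.
# Capture only the text between the parentheses, e.g. "40.67, -73.96".
pattern = r"\((.+)\)"
# expand=False returns a Series, so the .str accessor can be chained.
geodata = data["hs_directory"]["Location 1"].str.extract(pattern, expand=False).str.split(",")

# Keep the original index so the new column lines up with hs_directory.
geoframe = pd.DataFrame(geodata.tolist(), columns=["lat", "lon"], index=geodata.index)

data["hs_directory"]["lat"] = geoframe["lat"]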

data["hs_directory"].head()

This is an alternate solution I'm sharing so that we can extract the latitudes with pandas' vectorized string methods, which should be more efficient.
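For anyone who wants to try the idea without the course data, here is a minimal, self-contained sketch. The two sample rows are made up to mimic the "Location 1" format (an address followed by "(lat, lon)"), and the names sample, coords and parts are only illustrative.

```python
import pandas as pd

# Hypothetical rows imitating the "Location 1" format; not real dataset values.
sample = pd.DataFrame({
    "Location 1": [
        "883 Classon Avenue\nBrooklyn, NY 11225\n(40.670298, -73.961647)",
        "1110 Boston Road\nBronx, NY 10456\n(40.827602, -73.904475)",
    ]
})

# Capture the text between the parentheses; expand=False returns a Series.
coords = sample["Location 1"].str.extract(r"\((.+)\)", expand=False)

# Split "lat, lon" into two columns, one per coordinate.
parts = coords.str.split(",", expand=True)

# Strip stray whitespace and convert the strings to floats.
sample["lat"] = pd.to_numeric(parts[0].str.strip())
sample["lon"] = pd.to_numeric(parts[1].str.strip())

print(sample[["lat", "lon"]])
```

Using split with expand=True gives the two coordinate columns directly, so no intermediate list of lists is needed.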


I was thinking about that, and a quick Google search didn't turn up any good comparisons between str.extract and re.findall (which suggests they may not be that similar). Any wizards want to share their wisdom on str.extract vs. re.findall for the example above?
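As a rough starting point for that comparison, the sketch below runs both on the same two made-up strings (not taken from the dataset). It only illustrates the difference in what each call returns and makes no claim about speed: str.extract works on a whole Series and keeps the capture group of the first match in each row, while re.findall works on one string at a time and returns a list of every match.

```python
import re
import pandas as pd

# Hypothetical sample values; the second row has no coordinates on purpose.
s = pd.Series(["Brooklyn, NY\n(40.67, -73.96)", "no coordinates here"])
pattern = r"\((.+)\)"

# str.extract: first match per row, aligned with the index, NaN where nothing matches.
extracted = s.str.extract(pattern, expand=False)

# re.findall: applied string by string, returns a list of all matches (empty if none).
found = s.apply(lambda text: re.findall(pattern, text))

print(extracted)
print(found)
```

The closest pandas counterparts to re.findall are Series.str.findall and Series.str.extractall, which also return every match instead of only the first.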