Why should I use this particular pattern (r'(\d+)')?

combined_updated['institute_service_up'] = combined_updated['institute_service'].astype('str').str.extract(r'(\d+)')

I could not understand why I should use this particular pattern, r'(\d+)'.
Can someone explain?

Without the solutions file I would have failed to use the extract method.


Could you please add a link to the screen?


Hi @7933509

Execute the following code in your notebook and observe the results. Then let us know what you understand the purpose of r"(\d+)" to be.

import pandas as pd

s = pd.Series(["20 or more", 12, "15 01", "15 - 01", "35", "320987 or less"])
print(s, "\n")

print(s.astype('str').str.extract(r'(\d+)'), "\n")

Yes, I understand! \d+ lets me extract digits.
Thank you!
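
For anyone reading later, here is a minimal sketch of what each part of the pattern does, using Python's built-in re module instead of pandas. The sample strings are made up for illustration. The r prefix makes it a raw string so the backslash is passed to the regex engine unchanged, \d matches a single digit, + means "one or more", and the parentheses form a capture group, which str.extract needs in order to know which part to return.

```python
import re

# Raw string: the backslash in \d is not treated as a Python escape.
pattern = r'(\d+)'

# re.search finds the first place in the string where the pattern matches.
match = re.search(pattern, "20 or more")
first_number = match.group(1)   # text captured by the parentheses

# When the string has no digits at all, there is no match.
no_match = re.search(pattern, "no digits here")

print(first_number, no_match)
```

As far as I know, str.extract applies the same kind of search to every row of the Series and fills in NaN where nothing matches.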

Why do you use "\n" in

print(s, "\n")

and

print(s.astype('str').str.extract(r'(\d+)'), "\n")

?

I got the same result without "\n".

Hi @7933509

"\n" adds a newline character, so an empty line appears after the printed value.

I used it purely for presentation, to space out the multiple variables I want to print.
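
To see exactly what the extra "\n" changes, here is a small sketch that writes the printed text into a buffer so the raw characters can be inspected. The strings "first" and "second" are placeholders, not part of the exercise.

```python
import io

buf = io.StringIO()

# print joins its arguments with a space and then appends its own newline,
# so this writes "first", a space, the "\n" argument, and the final newline.
print("first", "\n", file=buf)
print("second", file=buf)

captured = buf.getvalue()
print(repr(captured))
```

The values themselves are unchanged; only a blank line is added between the two outputs, which is why the extracted results look the same with or without it.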

Also, str.extract with "\d+" returns only the first run of digits it encounters. If you look at the results for "15 - 01" and "15 01", it gives us only 15 and omits 01.

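I didn't notice this earlier. We can also skip the astype and call s.str.extract(...) directly when every value is already a string; without it, non-string values such as the integer 12 in this example come back as NaN.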

How can I extract all the digits?
For example, I want to get '15' and '01' from the string "15 01".

print(s.astype('str').str.extract(r'(\d+)'), "\n") extracts only '15' from "15 01".
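
One possible way to get every number rather than only the first, sketched on the same sample Series from earlier in the thread: pandas also has str.findall and str.extractall. The variable names as_text, all_numbers and long_form are just illustrative.

```python
import pandas as pd

s = pd.Series(["20 or more", 12, "15 01", "15 - 01", "35", "320987 or less"])
as_text = s.astype("str")   # the integer 12 becomes the string "12"

# str.findall returns, for each row, a list of every run of digits.
all_numbers = as_text.str.findall(r"\d+")
print(all_numbers, "\n")

# str.extractall returns one row per match, with a MultiIndex of
# (original row label, match number). It needs a capture group.
long_form = as_text.str.extractall(r"(\d+)")
print(long_form)
```

findall is handy when you want to keep one entry per original row, while extractall gives a long table that is easier to filter or convert to numbers afterwards.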