str.extract(r'(\d+)'): what does '\d+' mean?

Hi all! I don't understand the pattern r'(\d+)'. What are we trying to do here?

My Code:

combined_updated['institute_service_updated']=combined_updated['institute_service'].astype(str).str.extract(r'(\d+)')

Regex isn’t one of my strong points, but according to regexr, it’s a pattern that matches one or more digits (0-9).

This is a regular expression pattern. \d matches a single digit, and + means one or more of the preceding item. Because the pattern is enclosed in ( ), it forms the group that you want to capture.

Regex Explanation:

Extract/capture one or more digits (0-9)
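
To make the capture group concrete, here is a minimal sketch using Python's built-in re module rather than pandas. The sample string is hypothetical, made up only for illustration; it is not taken from the original dataset.

```python
import re

# Hypothetical sample value, invented for illustration
text = "11-20 years"

# \d+ matches one or more digits; the parentheses mark the part to capture
match = re.search(r"(\d+)", text)

# group(1) returns the text captured by the first pair of parentheses
first_digits = match.group(1)
print(first_digits)  # prints 11
```

Note that only the first run of digits is captured: the search stops at the first place the pattern matches.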

Hi @info.victoromondi
Thank you for answering!
So r'\d+' means to capture one digit? In the instructions, we learned about {}, which says how many times we want a digit or letter to repeat. Why don't we need to use it here?

\d matches exactly one digit, e.g. 1
\d+ matches one or more digits (at least one), e.g. 1 or 34 or 983434
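
On the {} question: + is shorthand for {1,} (one or more), so no braces are needed unless you want a specific count. A small sketch with Python's re module, using a made-up sample string:

```python
import re

# Hypothetical sample value, invented for illustration
sample = "983434"

one_digit = re.search(r"\d", sample).group()         # exactly one digit
one_or_more = re.search(r"\d+", sample).group()      # one or more digits
same_as_plus = re.search(r"\d{1,}", sample).group()  # {1,} behaves like +
exactly_four = re.search(r"\d{4}", sample).group()   # exactly four digits

print(one_digit)     # prints 9
print(one_or_more)   # prints 983434
print(same_as_plus)  # prints 983434
print(exactly_four)  # prints 9834
```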

You mentioned that the () enclosing \d+ means that I want to capture the group. For example, if the value is 04/2016, how does Python know to capture 2016 instead of 04?

Thank you again!

04 will be extracted, because str.extract returns only the first match in each string. Consider this example:

In [7]: df = pd.DataFrame({'dt':[str(num)+'/' + str(2000+num) for num in range(5)], 'name':list('victo')})

In [8]: df
Out[8]:
       dt name
0  0/2000    v
1  1/2001    i
2  2/2002    c
3  3/2003    t
4  4/2004    o

In [9]: df.dt.str.extract(r"(\d+)")
Out[9]:
   0
0  0
1  1
2  2
3  3
4  4

Got it, thank you!

If I want to extract the year in your example, how would I do it? Is the regular way, str.split('/').str[-1], the only option?
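
A possible answer, as a sketch rather than the only way: str.split works, but the regex can also be anchored on the slash, or restricted to four digits with {4}. The snippet below rebuilds the dt column from the example above; the variable names are made up for illustration, and the {4} variant assumes the month part never has four digits.

```python
import pandas as pd

# Same 'dt' values as the example earlier in the thread: '0/2000' ... '4/2004'
df = pd.DataFrame({"dt": [f"{num}/{2000 + num}" for num in range(5)]})

# Option 1: require a slash before the digits, capture only what follows it
year_after_slash = df["dt"].str.extract(r"/(\d+)", expand=False)

# Option 2: capture exactly four digits (assumes the month is shorter than that)
year_four_digits = df["dt"].str.extract(r"(\d{4})", expand=False)

# Option 3: the split approach from the question
year_by_split = df["dt"].str.split("/").str[-1]

print(year_after_slash.tolist())  # prints ['2000', '2001', '2002', '2003', '2004']
```

All three return the years as strings; expand=False makes str.extract return a Series instead of a one-column DataFrame. Add .astype(int) if numbers are needed.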