Hi, I am not sure I understand the concept of a named capturing group. Is it like the SQL '%' wildcard, used to capture a similar pattern from a string?
There is a Match column in the output data frame. What is it for, and what does it mean?
The reason we put (?P<col-name> at the start of the group is to set up a name for a column in the returned dataframe, correct?
pattern = r"(?P<Years>[1-2][0-9]{3})"
Thank you !!
When you have a capture group without a name, the column for that capture group is named with a number, counting from 0. If you have multiple capture groups, say 3, the columns are named 0, 1, 2.
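To see the default numbering, here is a minimal sketch. It assumes the dataframe in question comes from pandas' Series.str.extract; the date strings and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: three unnamed capture groups, one per date part.
s = pd.Series(["2021-03-15", "1999-12-31"])

# No group has a name, so pandas labels the columns 0, 1, 2.
unnamed = s.str.extract(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

print(list(unnamed.columns))
print(unnamed)
```

Each column holds the text captured by the group in that position, as strings.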
The ?P<col-name> at the start of a capture group is the syntax to give the group a name, and that name is used for the column instead of the number.
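The naming is a feature of the regex itself, not of the dataframe. A small sketch with Python's built-in re module shows the same group being read by name or by position; the input string here is invented for the example.

```python
import re

# Same named group as in the question, compiled with the standard library.
pattern = re.compile(r"(?P<Years>[1-2][0-9]{3})")

m = pattern.search("Released in 1999")

year_by_name = m.group("Years")   # look the group up by its name
year_by_number = m.group(1)       # the same group by position (re counts from 1)
names = m.groupdict()             # mapping of every named group to its text

print(year_by_name, year_by_number, names)
```

Note that re numbers groups from 1, while the dataframe columns for unnamed groups are labelled from 0.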
So this pattern (?P<Years>[1-2][0-9]{3}) says: name this column Years.
[1-2] matches a single digit that is either 1 or 2, which is the first digit of the number.
[0-9]{3} matches exactly three more digits, each between 0 and 9.
So you can have matches like 1999, 2999, 1009, 1111 ...
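Here is a rough sketch of the pattern in use. It assumes the output with the Match column came from pandas' Series.str.extractall, which adds an index level called match; the sample sentences are made up.

```python
import pandas as pd

pattern = r"(?P<Years>[1-2][0-9]{3})"

# Hypothetical input: row 0 has two years, row 1 has none, row 2 has one.
s = pd.Series(["born 1999, moved 2015", "no year here", "founded 1111"])

# extractall returns one row per match. The column is named after the group,
# and the extra "match" index level counts the matches within each input row.
result = s.str.extractall(pattern)

print(result)
print(result.index.names)
```

Under that assumption, match is just a counter: 0 for the first match found in a row, 1 for the second, and so on. Rows with no match, like row 1 here, do not appear in the result at all.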
Thank you!! it is so clear!!