2 questions about regex pattern

Screen Link:

My Code:

pattern = r"[Pp]ython ?([\d\.]+)"
version = titles.str.extract(pattern, expand=False)
version_counts = dict(version.value_counts())

Hi,

I am trying to understand the logic behind the regex pattern. I have two questions:

  1. Why do we need a \ before the .? It is not used in the example in the explanation ([Pp]ython ?[\d.]+).
  2. Why do we enclose the character classes \d and \. in brackets [], but not the +?

Looking forward to your response,
Jeroen


Hi Jeroen,

  1. The backslash matters mostly outside square brackets: there an unescaped . means any character (except a newline), while \. means a literal dot. Inside square brackets, as in that example, the . is already treated literally, so [\d.] and [\d\.] match the same thing and the backslash is optional. In general, a backslash in regex means that the symbol after it either loses its special meaning (if it had one) and is used literally, or gains a special meaning (like \d).
  2. + means that we are looking for one or more repetitions of whatever comes right before it. In the square brackets we have \d and \., meaning that we are interested in a digit or a dot. Adding + after the square brackets means that we want one or more occurrences of a digit or a dot. The + stays outside the brackets because inside them it would just be another literal character to match.

@Elena_Kosourova Thanks for your answer.

In the square brackets we have \d and \., meaning that we are interested in a digit or a dot

So in this pattern, r"[Pp]ython ([\d\.]+)", we are looking for a digit or a dot, and not a digit followed by a character?

Hi Jeroen,

Yes, in this pattern [\d\.] we’re looking for a digit or a dot, in the same way as we’re looking for P or p in [Pp]. If instead we were looking for a digit followed by a dot, we would have to omit the square brackets in that pattern.
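
A quick way to check this is with Python's built-in re module. The sketch below is only an illustration: the sample strings and the helper name first_group are made up here and are not part of the exercise.

```python
import re

# Hypothetical sample titles, made up for this illustration.
samples = ["Python 3.6 released", "python 2.7.13 tips", "Python3"]

# [\d\.]+ : one or more characters, each of which is a digit OR a dot.
class_escaped = re.compile(r"[Pp]ython ?([\d\.]+)")
# Inside square brackets an unescaped dot is literal too,
# so this pattern matches the same text as the one above.
class_plain = re.compile(r"[Pp]ython ?([\d.]+)")
# Without the square brackets: a digit FOLLOWED BY a dot.
sequence = re.compile(r"[Pp]ython ?(\d\.)")


def first_group(pattern, text):
    """Return the first capture group, or None when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


escaped_results = [first_group(class_escaped, s) for s in samples]
plain_results = [first_group(class_plain, s) for s in samples]
sequence_results = [first_group(sequence, s) for s in samples]

print(escaped_results)   # ['3.6', '2.7.13', '3']
print(plain_results)     # ['3.6', '2.7.13', '3']
print(sequence_results)  # ['3.', '2.', None]
```

The last pattern finds nothing in "Python3" because it insists on a dot right after the digit.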

To be 100% sure, by square brackets you mean these: ()?

No, these ones: []. So, for example, [Pp] will match P or p, while Pp will match P followed by p.
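
The difference between the two kinds of brackets can be sketched with the re module. This is a minimal example with a made-up input string. The round brackets () mark the part of the match to capture, which is the part that str.extract returns in the code at the top of the thread.

```python
import re

# [Pp] is a character class: it matches ONE character, either "P" or "p".
class_matches = re.findall(r"[Pp]", "Pp")
# Pp is a plain sequence: "P" immediately followed by "p".
sequence_matches = re.findall(r"Pp", "Pp")

# Round brackets () do not define a set of characters; they mark
# the part of the match that should be captured.
match = re.search(r"[Pp]ython ?([\d\.]+)", "Learning python 3.7 today")
whole_match = match.group(0)  # everything the pattern matched
captured = match.group(1)     # only the part inside ()

print(class_matches)     # ['P', 'p']
print(sequence_matches)  # ['Pp']
print(whole_match)       # python 3.7
print(captured)          # 3.7
```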
