Regular Expression Basics

Is there any difference between the two regular expressions below?

  1. pattern = r'\be-? ?mails?\b'
  2. pattern = r"\be[\-\s]?mails?\b"

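Pattern 1 has 4 possibilities from the two ? in the middle, matching 0, 1, or 2 characters between e and mail: nothing, a hyphen, a space, or a hyphen followed by a space.
Pattern 2 has only 3 possibilities, matching 0 or 1 characters: nothing, a hyphen, or one whitespace character (\s also matches tabs and newlines, not just a plain space).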
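
To see where the two patterns actually disagree, you can run both over a few candidate strings. This is only a rough sketch: the sample strings and the names pattern1, pattern2, samples and results are my own choices for illustration, not something from the question.

```python
import re

pattern1 = re.compile(r'\be-? ?mails?\b')
pattern2 = re.compile(r"\be[\-\s]?mails?\b")

# Sample strings picked for illustration (an assumption, not from the thread).
samples = ["email", "e-mail", "e mail", "e- mail", "e\tmail", "e--mail"]

# For each sample, record whether each pattern matches the whole string.
results = {
    s: (bool(pattern1.fullmatch(s)), bool(pattern2.fullmatch(s)))
    for s in samples
}

for s, (m1, m2) in results.items():
    print(f"{s!r:12} pattern1={m1!s:5} pattern2={m2}")
```

With these samples, "e- mail" (hyphen and then space) is accepted only by pattern 1, while "e\tmail" (a tab in the middle) is accepted only by pattern 2.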

In addition to what @hanqi has said.

For pattern 1, you are looking for a match that sits between the two \b word boundaries. The pattern matches e, then tries -?. Whether or not a hyphen is there, this step succeeds: it consumes the hyphen if present and nothing otherwise. It then moves on to ' '?, which works the same way for a space. Next it matches mail exactly and moves to s?, which again succeeds whether or not there is an s. So you can have combinations like this: email, emails, e mail, e mails, e-mail, e-mails, and also e- mail and e- mails (hyphen followed by a space).
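
If you want to list every form pattern 1 accepts, one way is to build the strings by taking or skipping each optional piece. This is a minimal sketch; the variable names are just for illustration.

```python
import re
from itertools import product

pattern1 = re.compile(r'\be-? ?mails?\b')

# Build every candidate by either including or leaving out each optional piece:
# the hyphen, the space, and the trailing "s".
variants = [
    "e" + hyphen + space + "mail" + plural
    for hyphen, space, plural in product(["", "-"], ["", " "], ["", "s"])
]

# Every generated variant should be accepted in full by the pattern.
assert all(pattern1.fullmatch(v) for v in variants)

print(variants)
```

This produces eight strings in total, because each of the three ? can independently be taken or skipped.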

However, the second pattern uses a set (character class). [\-\s] means either - or a whitespace character, and the ? after it means: match one of them if it is there, or match nothing if neither is there. In the first pattern, the optional hyphen and the optional space follow a fixed sequence (hyphen first, then space). In the second pattern, the set replaces that sequence with a single optional character. Everything else is the same as in pattern 1.

I think that without the ? after [\-\s], the second pattern would not match cases like emails or email, because either - or a whitespace character must then occur. You can verify this.
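
Here is one way to check it, as a small sketch. The names with_optional, without_optional and checks are made up for this example.

```python
import re

with_optional = re.compile(r"\be[\-\s]?mails?\b")
# Same pattern with the "?" after the set removed, so a separator is required.
without_optional = re.compile(r"\be[\-\s]mails?\b")

# Record, for each text, whether each version of the pattern finds a match.
checks = {
    text: (bool(with_optional.search(text)), bool(without_optional.search(text)))
    for text in ["email", "emails", "e-mail", "e mails"]
}

for text, (with_q, without_q) in checks.items():
    print(f"{text!r:10} with ?: {with_q!s:5} without ?: {without_q}")
```

The version without ? still finds e-mail and e mails, but no longer finds email or emails.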


Thank you so much for your great answer!!! I totally understand the difference now!
