Is there any difference between the two regular expression snippets below?
pattern = r'\be-? ?mails?\b'
pattern = r"\be[\-\s]?mails?\b"
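Pattern 1 has 4 permutations from the two ? in the middle: nothing, a hyphen, a space, or a hyphen followed by a space, so the middle matches 0, 1, or 2 characters.
Pattern 2 has only 3 permutations, matching 0 or 1 characters, the 1 being a choice of hyphen or whitespace character (\s is not limited to a plain space).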
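A quick way to check this count is to build every possible "middle" and test it against both patterns. This is only a sketch; the names pattern1, pattern2, middles, and results are made up for the illustration.

```python
import re
from itertools import product

pattern1 = re.compile(r'\be-? ?mails?\b')
pattern2 = re.compile(r"\be[\-\s]?mails?\b")

# Every middle that an optional hyphen followed by an optional space can
# produce: "", " ", "-", "- "
middles = [hyphen + space for hyphen, space in product(["", "-"], ["", " "])]

# Map each candidate word to (matches pattern 1, matches pattern 2)
results = {}
for middle in middles:
    word = "e" + middle + "mail"
    results[word] = (bool(pattern1.fullmatch(word)), bool(pattern2.fullmatch(word)))

for word, (m1, m2) in results.items():
    print(repr(word), m1, m2)
```

Only 'e- mail' separates the two: pattern 1 accepts it, pattern 2 does not.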
In addition to what @hanqi has said.
For pattern 1, you are looking for a match between the two \b word boundaries. The pattern matches e, then -? matches a hyphen if one is present and nothing otherwise. It then moves on to ' '?, which does the same for a single space. Next it matches mail exactly, and finally s? matches an s if one is present. So the pattern matches all of these: email, emails, e mail, e mails, e-mail, e-mails, e- mail, e- mails.
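The second pattern, however, uses a character set. [\-\s] matches exactly one character that is either - or a whitespace character (\s covers tabs and newlines as well as ' '), and the ? after it makes that one character optional. In the first pattern, the optional hyphen and the optional space follow a fixed sequence, so both can appear, but only in that order, as in e- mail. In the second pattern, the set allows at most one separator, so e- mail does not match. Everything else is the same as in pattern 1.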
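One more difference comes from \s itself, which matches any whitespace character and not only ' '. A small sketch, with made-up sample strings and variable names:

```python
import re

pattern1 = re.compile(r'\be-? ?mails?\b')
pattern2 = re.compile(r"\be[\-\s]?mails?\b")

# A plain space, a tab, and a newline as the separator
samples = ["e mail", "e\tmail", "e\nmails"]

# Map each sample to (matches pattern 1, matches pattern 2)
whitespace_results = {
    s: (bool(pattern1.search(s)), bool(pattern2.search(s))) for s in samples
}

for s, (m1, m2) in whitespace_results.items():
    print(repr(s), m1, m2)
```

Pattern 1 only accepts the literal space, while pattern 2 also accepts the tab and the newline.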
If there were no ? after the [\-\s] in the second pattern, it would not match emails or email, because either - or a whitespace character would have to occur. You can verify this.
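Here is one way to verify it, as a minimal sketch. The variant without the ? is called required here, which is just a name picked for the example.

```python
import re

# Pattern 2 with the ? after the set removed, so a separator is mandatory
required = re.compile(r"\be[\-\s]mails?\b")

checks = {
    text: bool(required.search(text))
    for text in ["email", "emails", "e-mail", "e mails"]
}

for text, matched in checks.items():
    print(text, matched)
```

email and emails no longer match, while e-mail and e mails still do.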
Cheers!
Thank you so much for your great answer!!! I totally understand the difference now!