OK… Raising the flag of surrender here. Can one of you regex experts explain to me the following…
Why do you not need backslashes for this:
pattern = r"\b[Cc]\b[^.+]"
But you do need backslashes for this:
pattern = r"(?<!Series\s)\b[Cc]\b(?![\+\.])"
Is it something to do with the lookahead functionality?
Since I am using the r prefix to make it a raw string, I can understand why the first example will exclude either the . or the +. But in the second example, inside the lookahead, the backslashes are used. Not sure why, though?
For reference, this is from the Advanced Regular Expressions training in the Data Science for Python curriculum.
You actually don’t need the backslashes in the second pattern. It works just the same without them.
The motivation for using the backslashes is that + and . are symbols with special meaning in the regex world, so in order to use the literal symbols of “plus” and “full stop” one would need to escape them.
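To make that concrete, here is a small sketch of what escaping changes outside square brackets. The sample strings and variable names are made up for illustration; they are not from the course material.

```python
import re

text = "C+ C# C."  # hypothetical sample text

# Unescaped "." is a wildcard: it matches any character after the C.
any_char = re.findall(r"C.", text)

# Escaped "\." matches only a literal full stop.
literal_dot = re.findall(r"C\.", text)

# Unescaped "+" is a quantifier ("one or more of the previous item").
one_or_more = re.findall(r"C+", "CC C+")

# Escaped "\+" matches only a literal plus sign.
literal_plus = re.findall(r"C\+", "CC C+")

print(any_char)      # ['C+', 'C#', 'C.']
print(literal_dot)   # ['C.']
print(one_or_more)   # ['CC', 'C']
print(literal_plus)  # ['C+']
```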
However, inside square brackets many symbols (like + and .) lose their special meaning, so there’s no need to escape them. Hence backslashes are unnecessary in the second pattern, just as they aren’t necessary in the first.
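If you want to check this yourself, a quick sketch like the one below compares the second pattern with and without the backslashes. The sample sentences are my own assumption, not taken from the training, so treat it as a sanity check rather than a proof.

```python
import re

# The pattern from the question, with the escapes inside the brackets...
escaped = re.compile(r"(?<!Series\s)\b[Cc]\b(?![\+\.])")
# ...and the same pattern with the backslashes removed.
unescaped = re.compile(r"(?<!Series\s)\b[Cc]\b(?![+.])")

# Hypothetical sample sentences, chosen to exercise the lookbehind and lookahead.
samples = [
    "I code in C and c.",
    "C++ is not C",
    "Series C funding",
]

for text in samples:
    with_escapes = [m.span() for m in escaped.finditer(text)]
    without_escapes = [m.span() for m in unescaped.finditer(text)]
    # Both patterns should find matches at exactly the same positions.
    print(text, with_escapes, with_escapes == without_escapes)
```

On these samples both versions agree, which is what you would expect if + and . are treated as literals inside a character class.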
You can see the details in the documentation. I leave the relevant excerpt below:
Hey, do you still need help with this? Is there anything I can clarify?
Bruno, thanks for checking in! I am good now. Thanks again for your help.