Advanced Regex Expressions - Q. 5

I’m curious about a specific line of code in the solution from Advanced Regular Expressions (screen 5).

Screen Link: https://app.dataquest.io/m/369/advanced-regular-expressions/5/using-lookarounds-to-control-matches-based-on-surrounding-text

Here is the pattern listed in the solution that I’m curious about:
pattern = r"(?<!Series\s)\b[Cc]\b((?![+.])|\.$)"

Here is the code I used: pattern = r'(?<!Series\s)\b[Cc]\b(?![+.])'

I got the correct answer with my code, so I’m curious what the final few characters add: |\.$

My best guess is that it deals with the end of the string, i.e. a title that ends in C. or c., though I can’t tell whether it is meant to match or to exclude that case. Either way it seems a bit superfluous, since the preceding lookahead should already eliminate those cases?

Hi @ryan.wetherbie,

Welcome to the community.
I’ve checked my own code: I used a regex similar to yours and the answer checker gave me a green light, so I never looked at the regex given in the solution. It looks like there is already a discussion about this.

Please have a look.

Hi @ryan.wetherbie,

I was about to post the same topic when I saw your post.

The pattern you used is similar to what I had:

pattern1 = r"(?<!Series\s)\b[Cc]\b(?![+.])"

While this pattern worked for this particular exercise, I realized that the proposed answer provided by Dataquest:

pattern2 = r"(?<!Series\s)\b[Cc]\b((?![+.])|\.$)"

is more correct in general.

This is because pattern1 will not match cases where [Cc] comes at the end of a sentence and is immediately followed by a period ".", because of the negative lookahead (?![+.]).

Hence, a string with the following value:

string1 = "I find it difficult to learn C."

will not be matched by pattern1, but it will be matched by pattern2.

This is because pattern2 tells the regex engine to match instances where [Cc] is not followed by "+" or ".", as expressed by the negative lookahead (?![+.]), OR where [Cc] is followed by a "." that is itself immediately followed by the end of the line or the string (\.$).

pattern1 works for this exercise because it just so happens that the data set we’re working with does not have cases such as the string1 example I gave where "C" is at the end of the sentence.
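
A quick way to check the difference is to run both patterns against a few strings. This is only a sketch using Python's re module: pattern1, pattern2 and string1 are taken from this thread, while the other test titles and the matches helper are made up for illustration and are not from the course data set.

```python
import re

# Patterns copied from the thread.
pattern1 = r"(?<!Series\s)\b[Cc]\b(?![+.])"
pattern2 = r"(?<!Series\s)\b[Cc]\b((?![+.])|\.$)"

# string1 comes from the post; the other titles are hypothetical examples.
string1 = "I find it difficult to learn C."
titles = [
    string1,
    "Learning C the hard way",
    "Why C++ is hard",
    "Series C funding announced",
]

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# Map each title to (matched by pattern1, matched by pattern2).
results = {t: (matches(pattern1, t), matches(pattern2, t)) for t in titles}

for title, (m1, m2) in results.items():
    print(f"{title!r}: pattern1={m1}, pattern2={m2}")
```

Only string1 should come out differently between the two patterns. Note that when the \.$ branch is the one that matches, the matched text is "C." with the trailing period included, which may matter if you extract the match rather than just test for it.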

I hope the instructions and the prompt for this screen are updated, because they are really confusing and this took more time to figure out than I would have preferred.
