Regex Components - 369-5

Screen Link:
Advanced Regular Expressions In Python | Dataquest

pattern = r"(?<!Series\s)\b[Cc]\b((?![+.])|.$)"

Above is the pattern given as the answer. I am trying to understand its different components, and I am stuck at this part: (?![+.])|.$

I understand that ‘?!’ is a negative lookahead, so the pattern before it would match only if it is NOT followed by [+.])|.$. Is this related to this part of the mission: "without removing instances where the match occurs at the end of the sentence"?

Also, I don’t think I have seen this syntax earlier in the mission: |.$. What does it indicate?

I think we have a problem in the content. It seems like we’re not teaching the meaning of | before using it. That’s the only part we haven’t taught. You can read about $ here.

Anyway, | is regex’s way of implementing the logical “or” (match this or that). Straight from the documentation:

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
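To see the left-to-right behaviour from that quote in action, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the mission.

```python
import re

# Alternatives are tried left to right; the first branch that matches wins,
# even if a later branch would give a longer match.
first_wins = re.search(r"a|ab", "ab").group()   # "a"
reordered = re.search(r"ab|a", "ab").group()    # "ab"

# A literal pipe must be escaped or placed in a character class.
escaped_pipe = re.findall(r"\|", "x|y")         # ["|"]
class_pipe = re.findall(r"[|]", "x|y")          # ["|"]
```

So the order of the branches matters whenever one alternative is a prefix of another.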

No, there’s a closing parenthesis there, so the negative lookahead ends after [+.] and does not include |.$. You should read it as “if not followed by whatever [+.] means”.
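If it helps, here is a minimal sketch of the lookahead on its own, with the lookbehind and the |.$ branch left out. Inside square brackets, + and . are literal characters, so [+.] means "a plus sign or a period". The test strings are my own.

```python
import re

# Only the core of the pattern: a standalone C or c that is
# not immediately followed by "+" or ".".
lookahead_only = r"\b[Cc]\b(?![+.])"

def matches(text):
    """Return True if the simplified pattern matches anywhere in text."""
    return re.search(lookahead_only, text) is not None

checks = {text: matches(text) for text in ["C", "C#", "C++", "C."]}
# "C" and "C#" match; "C++" and "C." are rejected by the lookahead.
```

Note that "C." is rejected here, which is the case the |.$ branch in the full pattern is meant to bring back.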

Yes, it is related to that part of the mission.


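You can understand ((?![+.])|.$) as “not followed by either + or ., except when the following character is the last one in the string”. Strictly speaking, the . in .$ is unescaped, so it matches any single character at the end, not just a literal period.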
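Here is a rough sketch that runs the full pattern from the question against a few strings. The strings are invented for illustration, not taken from the mission's dataset.

```python
import re

pattern = r"(?<!Series\s)\b[Cc]\b((?![+.])|.$)"

def first_match(text):
    """Return the matched text, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m else None

samples = [
    "I program in C",      # plain mention
    "C++ is hard",         # followed by "+", not at the end
    "Series C funding",    # blocked by the negative lookbehind
    "I like C.",           # "C." at the very end of the string
    "Written in C. Fast",  # "C." in the middle of the string
    "Learning C+",         # any final character satisfies .$
]
results = {text: first_match(text) for text in samples}
```

With these inputs, "I program in C" gives "C", "I like C." gives "C.", and "Learning C+" gives "C+" (because the unescaped . accepts any final character). The other three give None.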


Thank you, that got me thinking. I will come back if I have any further questions. I now remember that we read about | as part of creating booleans, but not as part of regex.
