Difference between (?![.+]) and using [^.+]

In slide 4, we used the negative set pattern = r"\b[Cc]\b[^.+]" to build a regular expression that excludes a following + or .

I got 82 value counts when I used this code: pattern = r"(?<!Series\s)\b[Cc]\b[^.+]"

In slide 5, we used the negative lookahead pattern = r"(?<!Series\s)\b[Cc]\b(?![.+])"
that excludes any + or . that follows the letter C/c. I got 103 value counts when I used this code.

What is the difference between the two patterns? It seems like both of them exclude any values where the C is followed by a . or a +, so I don't see why I can't use the first pattern for this.

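I think [^.+] fails to match wherever C is the last character of the title (where C ends the sentence). A negated set still has to consume one character, so at the end of the string there is nothing left for it to match. The negative lookahead (?![.+]) consumes nothing: it only checks that the next character is not . or +, and that check also passes at the end of the string.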
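To make the end-of-string case concrete, here is a minimal sketch using only Python's built-in `re` module. The sample titles are hypothetical and chosen just to cover the four situations (C at the end, C followed by a space, C followed by +, C followed by .); they are not taken from the course dataset.

```python
import re

# The two patterns from the question
negated_set = re.compile(r"(?<!Series\s)\b[Cc]\b[^.+]")
neg_lookahead = re.compile(r"(?<!Series\s)\b[Cc]\b(?![.+])")

# Hypothetical sample titles, for illustration only
samples = [
    "Why I Write Games in C",   # C is the last character
    "Learning C the hard way",  # C followed by a space
    "C++ tips",                 # C followed by +
    "Send Smith to D.C.",       # C followed by .
]

# For each title: (matched by negated set, matched by negative lookahead)
results = {
    title: (bool(negated_set.search(title)), bool(neg_lookahead.search(title)))
    for title in samples
}

for title, (set_hit, look_hit) in results.items():
    print(f"{title!r}: negated set={set_hit}, lookahead={look_hit}")
```

Only the first title should differ between the two patterns: the negated set needs one more character after the C, and there is none.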

Kindly run this code to visualize the difference

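pattern1 = r"(?<!Series\s)\b[Cc]\b(?![\.\+])"  # negative lookahead
pattern2 = r"(?<!Series\s)\b[Cc]\b[^.+]"       # negated set
c_mentions1 = titles[titles.str.contains(pattern1)]
c_mentions2 = titles[titles.str.contains(pattern2)]

# Titles matched by pattern1 but missed by pattern2
a = set(c_mentions1)
b = set(c_mentions2)

print(a - b)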
{'How I investigated Uber surge pricing in D.C', 
'Tis-interpreter  find subtle bugs in programs written in standard C',
 'A Superconductor That Works at -70 °C', 'Fixing C', 'Quotes from Jean-Paul Sartres Programming in ANSI C', 'Concurrency kit  Concurrency primitives and non-blocking data structures in C', 'Mary Jo White: Privacy Rules Shouldnt Handcuff the S.E.C',
 'The Absolutely True Story of a Real Programmer Who Never Learned C', 'Why I Write Games in C', 
'HP Unveils Premium Chromebook: 3K Display, Intel Core M, 16 GB of RAM and USB-C', 'Libui: GUI library in C', 'Pixel C', 'OS X app in plain C',
 'A Real Programmer Who Never Learned C', 'LispY C',
 'Implementing a sort of generic, sort of type-safe array in C',
 'Citizens Should Be Able to Vote on Laws Directly: Send Smith to D.C'}
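Every title in that output ends with C, which fits the end-of-string explanation. If you wanted to keep the negated set, one possible workaround (my own suggestion, not something from the slides) is to also allow the end of the string with `(?:[^.+]|$)`. A small sketch with hypothetical sample titles:

```python
import re

lookahead = re.compile(r"(?<!Series\s)\b[Cc]\b(?![.+])")
# Assumption: adding "|$" lets the negated-set version succeed at the end of the string
set_or_end = re.compile(r"(?<!Series\s)\b[Cc]\b(?:[^.+]|$)")

# Hypothetical sample titles, for illustration only
samples = ["Fixing C", "Pixel C", "C is fun", "C++ tips", "Objective-C."]

# Do both patterns agree on whether each title contains a match?
agree = all(
    bool(lookahead.search(t)) == bool(set_or_end.search(t)) for t in samples
)
print(agree)

# The matched text still differs: the set consumes the character after C
look_text = lookahead.search("C is fun").group()
set_text = set_or_end.search("C is fun").group()
print(repr(look_text), repr(set_text))
```

For a yes/no check like `str.contains` the two should behave the same, but the lookahead version matches only the C itself, which matters if you later extract or replace the match.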

Hey, I found this video extremely useful for understanding regular expressions. Kindly check it out.


This video was awesome, thank you for sharing!
