I used the wrong regex pattern, one without word boundaries, and ended up getting extra matches. Sharing the code I used to find those extra matches.
I tried using pattern = r"(e[ -]?mail[s]?)"
But the result was 151, while the expected count is 141.
For debugging I used the code below (titles is a pandas Series of titles):
import re
pattern = r"(e[ -]?mail[s]?)"
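pattern2 = r"\be[-\s]?mails?\b"  # version with word boundaries
# Boolean masks: which titles each pattern matches
email_mentions = titles.str.contains(pattern, flags=re.I)
email_mentions_2 = titles.str.contains(pattern2, flags=re.I)
# Text captured by the first pattern (NaN where there is no match)
email_capture = titles.str.extract(pattern, flags=re.I)
email_capture_bool = email_capture.fillna(False)  # NaN replaced with False
# Titles matched by the first pattern
x = titles[titles.str.contains(pattern, flags=re.I)]
# True where both patterns agree, False where they differ
y = email_mentions.eq(email_mentions_2)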
The titles below matched as extras because I did not use word boundaries:
450 Mailtrain (the open source Mailchimp clone) is…
4504 The fine art of literary hate mail endures
9006 N1 The extensible, open source mail client
11096 Ask HN: Why do dev communities still use maili…
11659 Donald Trump’s voicemails hacked by Anonymous
12619 Show HN: Undo send mail for Apple Mail
13432 The Mailbox Lights
13943 Why That Salesperson Just Wont Stop Emailing You
14161 Emailing SaaS companies to test support time
19838 Petition to Open Source Mailbox
Name: title, dtype: object
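
To see why these titles slip through, here is a minimal sketch using plain re on four of the titles from the output above. The two extra titles marked "hypothetical" are made up for illustration and are not from the dataset, the names loose and strict are mine, and the loose pattern is assumed to be the boundary-free version with an optional separator and an optional trailing s. It prints the exact substring that the boundary-free pattern latched onto in each extra title.

```python
import re

# Four titles copied from the output above, plus two made-up titles
# (hypothetical, not from the dataset) that are genuine mentions.
titles = [
    "The fine art of literary hate mail endures",
    "Donald Trump’s voicemails hacked by Anonymous",
    "The Mailbox Lights",
    "Emailing SaaS companies to test support time",
    "Ask HN: Best e-mail client?",    # hypothetical
    "Why I stopped reading emails",   # hypothetical
]

# Assumed boundary-free pattern vs. the pattern with word boundaries.
loose = re.compile(r"e[ -]?mails?", re.I)
strict = re.compile(r"\be[-\s]?mails?\b", re.I)

loose_hits = [t for t in titles if loose.search(t)]
strict_hits = [t for t in titles if strict.search(t)]

# Titles matched only by the boundary-free pattern.
extras = [t for t in loose_hits if t not in strict_hits]

for t in extras:
    # Show which substring triggered the unwanted match.
    print(repr(loose.search(t).group()), "in", t)
```

Without \b, the "e" can be the last letter of the previous word ("hate mail", "The Mailbox") or sit inside a longer word ("voicemails", "Emailing"). With \b on both sides, the match has to start and end at a word edge, so only standalone "email", "e-mail", "e mail" and their plurals count. On a pandas Series the same comparison would be something like titles[email_mentions & ~email_mentions_2].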