Why can't I use this regex pattern in this problem?

Hello everyone,
I am on this screen and I keep getting the ‘your answer doesn’t match expected’ message.

Question:

titles_clean = titles.str.replace(pattern, 'email',flags = re.I)

Their Solution: pattern = r'e[\s\-]?mail'
The pattern I used: pattern = r'e.?mails?'
Another pattern I used: pattern = r'e[\s\-]?mail[Ss]?'
Both of them worked on regexr.com and with my test iterations but the solution wouldn’t accept them. Can someone help me understand why that is please?

Thank you, Ray.

Hey, Ray.

Both your patterns will actually pass this screen if you drop the idea of capturing the plural version.

I have an issue with your first pattern, though. That . matches any character (except a newline), not just a space or a hyphen, so it will match much more than what we’re looking for. It just happens not to make a difference for this dataset.
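
To make the point about the . concrete, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the exercise's dataset:

```python
import re

loose = r"e.?mail"         # "." accepts any single character except a newline
strict = r"e[\s\-]?mail"   # only whitespace or a hyphen may sit between "e" and "mail"

# Hypothetical strings, not from the exercise data
samples = ["e-mail", "e mail", "email", "ermail", "e9mail"]

loose_hits = [s for s in samples if re.fullmatch(loose, s)]
strict_hits = [s for s in samples if re.fullmatch(strict, s)]

print(loose_hits)   # every sample, including "ermail" and "e9mail"
print(strict_hits)  # only the three genuine spellings
```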

If you’re wondering why the plural makes a difference, this is something you wouldn’t spot on regexr.com: it only highlights what a pattern matches, not what a replacement does with the match. Your patterns also consume the trailing s, so the substitution replaces it along with the rest of the word.

I’ll illustrate using the re module directly:

>>> import re
>>> re.sub(r"e[\s\-]?mail", "email", "I got me some e-mails.")
'I got me some emails.'
>>> re.sub(r"e.?mails?", "email", "I got me some e-mails.")
'I got me some email.'
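
Another way to see it is to look at exactly which text each pattern consumes. This is just a sketch using the same sentence and the three patterns from this thread:

```python
import re

text = "I got me some e-mails."
patterns = [r"e[\s\-]?mail", r"e.?mails?", r"e[\s\-]?mail[Ss]?"]

# Map each pattern to the exact substring it matches first
matched = {p: re.search(p, text).group(0) for p in patterns}

for p, m in matched.items():
    print(f"{p!r} consumes {m!r}")
```

Whatever a pattern consumes is what gets swapped for 'email', so the two plural-aware patterns take the s with them.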

I hope this helps.


Hey @Bruno,
So by adding the plural to the pattern, I’m changing "e-mails" to "email", which changes the meaning of the whole sentence and thus results in the wrong answer. I see what you mean, it makes sense. However, what if we had 'e-mailS' instead of 'e-mails'? Wouldn’t that change it to 'emailS', which is not what I’m looking for, since my goal is to clean the data?
For my first pattern, I knew the . would match any character, but I was going off the list that was given, and it matched everything there, so I decided to keep it that way for easier readability.
Thank you for your help.

That depends on what “cleaning” means. Most of the time you have a specific purpose in mind when cleaning data. If the goal is to make it look pretty, then I suppose the goal isn’t accomplished by this.

If, on the other hand, the goal is to easily be able to identify titles that concern the topic of e-mail, then emails and emailS will both work since you’ll probably be looking for the substring email anyway.
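
As a rough sketch of that second goal, using plain Python lists instead of the pandas series from the exercise (the titles below are hypothetical):

```python
import re

pattern = r"e[\s\-]?mail"

# Hypothetical titles, not from the exercise data
titles = ["Old e-mailS archive", "Sending E-Mails in bulk", "No mention here"]

# Same substitution as in the exercise, applied title by title
cleaned = [re.sub(pattern, "email", t, flags=re.I) for t in titles]

# Both 'emailS' and 'emails' contain the substring 'email'
about_email = [t for t in cleaned if "email" in t]

print(cleaned)
print(about_email)
```

The odd capitalisation survives the cleaning, but a substring search still finds both titles.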


I just spent a lot of time a few exercises ago not understanding why my pattern worked for the test series but not for the live data. The solution was that, even though the exercise did not state it, I should have used word boundaries too. So I learned my lesson, and this time I used word boundaries while replacing. Again it worked perfectly for the test series and then failed without any useful feedback on submission. So again I had to cheat and check the solution, only to find out that this time I should have replaced without word boundaries.

This is getting annoying. Please make sure that either all correct solutions are accepted for these exercises or the exact requirement is stated in the instructions.
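
For anyone hitting the same thing, here is a minimal sketch of how much word boundaries change the result. The sample strings are made up, and the bounded pattern is only my guess at a typical word-boundary version, not the one from either exercise:

```python
import re

plain = r"e[\s\-]?mail"        # the solution's pattern from this thread
bounded = r"\be[\s\-]?mail\b"  # assumed word-boundary variant

# Hypothetical strings, not from the exercise data
samples = ["Send me an e-mail", "Too many e-mails", "Free mail service"]

plain_out = [re.sub(plain, "email", s, flags=re.I) for s in samples]
bounded_out = [re.sub(bounded, "email", s, flags=re.I) for s in samples]

for s, p, b in zip(samples, plain_out, bounded_out):
    print(f"{s!r}: plain={p!r}, bounded={b!r}")
```

Without boundaries, the plural is rewritten and "Free mail" gets glued into "Freemail". With boundaries, both of those strings are left untouched, so the two versions produce different series and only one can match the expected answer.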
