
Understand the answer given for the exercise

Screen Link:
https://app.dataquest.io/m/354/regular-expression-basics/11/challenge-using-flags-to-modify-regex-patterns

My Code:

import re
import pandas as pd

email_tests = pd.Series(['email', 'Email', 'e Mail', 'e mail', 'E-mail',
              'e-mail', 'eMail', 'E-Mail', 'EMAIL', 'emails', 'Emails',
              'E-Mails'])

pattern = r'\be-? ?mails?\b'
# titles is not defined in this snippet; it comes from the exercise environment
email_mentions = titles.str.contains(pattern, flags=re.I).sum()

Hello!
I submitted the answer above for the exercise and it worked well. I followed this logic:

  • all words in the list begin with an e or E;
  • some of them have a hyphen between the first letter and the second letter;
  • some of them have an empty space between the first letter and the second letter;
  • some of them have the letter “s” at the end;
  • none of them have a hyphen and an empty space simultaneously.

But this is the answer given for the exercise:

pattern = r"\be[\-\s]?mails?\b"

I’m having a hard time understanding this piece of code: [\-\s]?.

As far as I understand (and I’m aware that I’m probably wrong), it would match cases where there is nothing between the letters “e” and “m” (in other words, the set is absent), and it would also match cases where there is a hyphen followed by “any space, tab or line break character”, which is exactly the definition given for \s during the mission.

Considering all the items in email_tests (and also my interpretation), the results would be:

email -> match, there is nothing between the “e” and the “m”
Email -> match, there is nothing between the “E” and the “m”
e Mail -> no match, there is not a hyphen followed by any space, tab or line break character
e mail -> no match, there is not a hyphen followed by any space, tab or line break character
E-mail -> no match, there is not a hyphen followed by any space, tab or line break character
e-mail -> no match, there is not a hyphen followed by any space, tab or line break character
eMail -> match, there is nothing between the “e” and the “M”
E-Mail -> no match, there is not a hyphen followed by any space, tab or line break character
EMAIL -> match, there is nothing between the “E” and the “M”
emails -> match, there is nothing between the “e” and the “m”
Emails -> match, there is nothing between the “E” and the “m”
E-Mails -> no match, there is not a hyphen followed by any space, tab or line break character

I also could not understand why the backslash escape is used before the hyphen…

By the way, the first steps with Regular Expressions have been great! It’s hard to grasp how they work at the beginning, but it’s very clear that they can be a powerful tool to use along with pandas in the next projects.

Wish you all a nice Sunday!

Paulo


Hi @phssaraiva:

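\s matches a whitespace character, as I answered here, and \- matches a literal hyphen. Because they sit inside square brackets, they form a set: [\-\s] matches one character that is either a hyphen or a whitespace character, not a hyphen followed by whitespace, and the ? makes that one character optional. This means that a space or a hyphen is allowed between e and mail. The escape character \ tells the regex engine to treat the next character literally instead of giving it a special meaning. Inside a set, an unescaped hyphen between two characters defines a range (as in [a-z]), so escaping it makes the intent explicit. A similar problem comes up outside regex: if you want to match a double quote " and write it unescaped inside a double-quoted string, the quote terminates the string early and Python raises an error.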
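To check how the set behaves in practice, here is a minimal sketch that runs both patterns from this thread over the test strings using only the standard re module (a plain list stands in for the pandas Series, and 'e- mail' is an extra test string I made up for illustration, not part of the exercise):

```python
import re

email_tests = ['email', 'Email', 'e Mail', 'e mail', 'E-mail',
               'e-mail', 'eMail', 'E-Mail', 'EMAIL', 'emails', 'Emails',
               'E-Mails']

exercise_pattern = r"\be[\-\s]?mails?\b"   # answer given for the exercise
own_pattern = r"\be-? ?mails?\b"           # pattern from the question

# [\-\s] is a set: it matches ONE character, a hyphen or a whitespace
# character, and the trailing ? makes that single character optional.
matches = [bool(re.search(exercise_pattern, s, flags=re.I)) for s in email_tests]
for text, matched in zip(email_tests, matches):
    print(f"{text!r:10} -> {matched}")

# Hypothetical extra case: a hyphen FOLLOWED by a space is two characters,
# so the set (one optional character) cannot cover it.
hyphen_then_space = bool(re.search(exercise_pattern, 'e- mail', flags=re.I))

# The pattern from the question has two separate optional characters,
# so it does accept a hyphen followed by a space.
own_accepts_both = bool(re.search(own_pattern, 'e- mail', flags=re.I))

print(all(matches), hyphen_then_space, own_accepts_both)
```

Every string in the list matches the exercise pattern. The two patterns only differ on input like 'e- mail', which the pattern from the question accepts and the set-based pattern rejects.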

# The string ends at the second double quote, so only r"\be[" is a valid string;
# everything after it falls outside the string and Python raises a SyntaxError
pattern = r"\be["\s]?mails?\b"
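As a rough illustration of why escaping matters, the sketch below compares an unescaped and an escaped hyphen inside a set, and shows one way to put a double quote in the set without breaking the string (wrapping the pattern in single quotes). The variable names and test strings are my own, chosen only for this example:

```python
import re

# Between two characters inside a set, an unescaped hyphen defines a range.
range_set = re.compile(r"[a-z]")      # any lowercase letter from a to z
literal_set = re.compile(r"[a\-z]")   # only 'a', '-' or 'z'

m_in_range = bool(range_set.fullmatch("m"))
m_in_literal = bool(literal_set.fullmatch("m"))
hyphen_in_literal = bool(literal_set.fullmatch("-"))
print(m_in_range, m_in_literal, hyphen_in_literal)

# A double quote inside the set is fine when the raw string is delimited
# with single quotes, so the quote no longer ends the string early.
quote_pattern = r'\be["\s]?mails?\b'
quote_match = bool(re.search(quote_pattern, 'e"mail', flags=re.I))
print(quote_match)
```

In [\-\s] the hyphen does not sit between two plain characters, so the backslash is mostly a safety habit there, but it keeps the pattern unambiguous if the set is edited later.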

Hope this helps!
