Advanced Regular Expressions Substituting Regular Expression Matches

Screen Link:

My Code:

import re
import pandas as pd

email_variations = pd.Series(['email', 'Email', 'e Mail',
                        'e mail', 'E-mail', 'e-mail',
                        'eMail', 'E-Mail', 'EMAIL'])

# Replace each match in email_variations with "email" and assign the result to email_uniform.
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal by default.
email_uniform = email_variations.str.replace(r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll]?\b", "email", flags = re.I, regex = True)

# titles is the Series of titles from the exercise (not defined in this snippet).
titles_clean = titles.str.replace(r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll][s]{0,1}\b", "email", flags = re.I, regex = True)

What I expected to happen:

What actually happened:

I am trying to use a regular expression to replace all mentions of email in titles with “email”. I am not sure exactly what I am doing wrong here. I feel like I have the right idea.
Thank you
-Salem

Hi @salemabdulkerim,

Yeah, you have the right idea and you’re close to the answer. I find the instructions provided are not specific enough so that’s probably why you can’t find the rest of the variations.

The answer requires that you also match something like “emailing”, “emailer”, etc. Your pattern is limited to the singular and plural forms of “email”, so it won’t capture other words built on “email”.

Try leaving the end of the pattern open so it can accommodate both the plural and other forms of the word, e.g. the gerund “emailing”.

An example modification:

titles_clean = titles.str.replace(r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll]", "email", flags = re.I, regex = True)

Thank you for the help. This is what I did:
titles_clean = titles.str.replace(r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll][A-Za-z]{0,}\b", "email", flags = re.I, regex = True)
I am still not getting the right answer, but I feel like this should be close because you said that my code has to capture other versions of email such as “emailing”, “emailer”, etc.
-Salem

Hmm…after reviewing the instructions again, I can see my mistake.

What is required is the replacement of the “email” variations, not of the whole word that contains them. Variations here are not related to the plural or any other grammatical construct; only the variations of the base word “email” need to be replaced, e.g. “e-mail”, “E-mail”, “Email”.

This means that the pattern is only concerned with replacing the “email” portion in the whole word. This also means that you shouldn’t put anything else after the [Ll] or else the whole word will be replaced with “email”. For example, with your pattern “Emailing” will be replaced with “email”; what’s required is for “Emailing” to become “emailing”. The same goes for the plurals - “Emails” for instance is replaced by “email” when it should be “emails”, or “e-mails” is replaced by “email” when it should be “emails”.
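To make the difference concrete, here is a minimal sketch using Python's re module on a few made-up titles (the sample strings are my own, not from the exercise's dataset). re.sub does to one string what Series.str.replace with regex=True does to each element of a Series.

```python
import re

# Hypothetical sample titles, invented for illustration only.
samples = ["Emailing is hard", "Check your e-mails", "E Mail tips"]

# Pattern that also consumes any letters after "mail" up to a word boundary.
whole_word = r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll][A-Za-z]{0,}\b"
# Pattern that stops right after the "l", leaving any suffix untouched.
base_only = r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll]"

whole_word_result = [re.sub(whole_word, "email", s, flags=re.I) for s in samples]
base_only_result = [re.sub(base_only, "email", s, flags=re.I) for s in samples]

print(whole_word_result)  # ['email is hard', 'Check your email', 'email tips']
print(base_only_result)   # ['emailing is hard', 'Check your emails', 'email tips']
```

The first pattern swallows “ing” and “s” because they are part of the match; the second never touches them.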

Interesting. I had thought so as well. Do you think I should use capture groups in either the pattern or the repl parameter? I’m confused why I can’t put anything after the [Ll] in the pattern parameter. I know you said part of the word needs to be replaced with email.

All good, my friend. I looked at the correct answer, but then changed my code a little so that it does not look exactly like the one in the answer key. Also, I was under the impression that any of the letters in the word email can be either uppercase or lowercase.

I’m confused why I can’t put anything after the [Ll] in the pattern parameter. I know you said part of the word needs to be replaced with email.

Unlike something like .match, which only reports whether the pattern matched, .replace substitutes everything the pattern matched with the specified string. This means that anything the pattern matches after the [Ll] will be substituted as well, and that’s not what the answer needs. You want to preserve what comes after the [Ll] so that something like “emails” or “emailing” does not get totally replaced by “email”.
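On the capture group question: a group plus a backreference in repl is one way to match a suffix and still keep it. The sketch below is only an illustration of that idea, not the exercise's expected answer. It wraps the trailing [A-Za-z]{0,} from the earlier attempt in parentheses and writes it back with \1; the sample strings are made up.

```python
import re

# Hypothetical sample words, invented for illustration only.
samples = ["Emails", "e-mailing", "E Mailer"]

# Group 1 captures whatever letters follow "mail"; \1 in the replacement
# puts them back after the normalized "email".
pattern = r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll]([A-Za-z]{0,})\b"
with_group = [re.sub(pattern, r"email\1", s, flags=re.I) for s in samples]

print(with_group)  # ['emails', 'emailing', 'emailer']
```

For this exercise, simply ending the pattern at [Ll] gives the same result with less machinery, since the suffix is never matched in the first place.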

Also, I was under the impression that any of the letters in the word email can be either uppercase or lowercase.

You’re correct. That’s covered by the ignore case flag re.I. So the pattern itself doesn’t need to specify both the upper and lowercase letters.
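As a rough check of that point, the sketch below compares the explicit character-class pattern with a shorter lowercase-only one (my own shortening, not taken from the answer key). Both run with re.I over the variations from the original question.

```python
import re

variations = ['email', 'Email', 'e Mail', 'e mail', 'E-mail',
              'e-mail', 'eMail', 'E-Mail', 'EMAIL']

# Pattern with both cases spelled out for every letter.
explicit = r"\b[Ee][-]{0,1}[\s]{0,1}[Mm][Aa][Ii][Ll]"
# With re.I, plain lowercase letters already match either case.
compact = r"\be-?\s?mail"

explicit_result = [re.sub(explicit, "email", v, flags=re.I) for v in variations]
compact_result = [re.sub(compact, "email", v, flags=re.I) for v in variations]

print(explicit_result == compact_result)  # True
print(set(compact_result))                # {'email'}
```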
