With my code, I got all TRUE on the test list, but it yields 108 “email” results instead of your 151 on the target list. Can you explain your pattern, especially the first backslash after “e”?
Thank you.
(PS. Although I am including the backslash in your pattern code above, it is not showing up)
\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
e matches the character e literally (case sensitive)
-? matches the character - literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
\s? matches any whitespace character (equal to [\r\n\t\f\v ])
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
mail matches the characters mail literally (case sensitive)
\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
’ matches the character ’ literally (case sensitive)
e matches the character e literally (case sensitive)
Match a single character present in the list below [-\s]?
- matches the character - literally (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
mail matches the characters mail literally (case sensitive)
You can use the code below to test which strings a specific pattern matches.
Simply change the regex string to one of the patterns listed above and put the text you want to search in test_str.
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"e[-\s]?mail"
test_str = ""
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
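The 1st pattern is both more restrictive and more permissive than the 2nd pattern.
It is more restrictive because of the \b on each side, which rejects matches inside longer words. It is more permissive because you broke up [-\s]? into -?\s?, which allows a hyphen followed by a space (as in "e- mail"), whereas [-\s]? allows at most one of the two.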
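To make the difference concrete, here is a small sketch that runs both patterns over a few strings. The sample strings are made up for illustration and are not taken from the course dataset, so the counts on the real list may differ.

```python
import re

# Pattern 1: word boundaries on both sides, optional hyphen THEN optional whitespace
pattern_1 = re.compile(r"\be-?\s?mail\b")
# Pattern 2: no boundaries, at most one separator (hyphen OR whitespace)
pattern_2 = re.compile(r"e[-\s]?mail")

# Hypothetical sample strings, chosen only to show where the patterns disagree
samples = ["email", "e-mail", "e mail", "e- mail", "emails", "voicemail"]

# Map each sample to (matched by pattern 1, matched by pattern 2)
results = {
    s: (bool(pattern_1.search(s)), bool(pattern_2.search(s)))
    for s in samples
}

for s, (m1, m2) in results.items():
    print(f"{s!r:12} pattern_1={m1!s:5} pattern_2={m2}")
```

Only pattern 1 accepts "e- mail" (the permissive side), while only pattern 2 accepts "emails" and "voicemail" (the restrictive side of \b).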
I think the answer to the challenge problem should be 143 instead of 151. Even if you use the exact same code that is mentioned in the answer, you will not get 151.
Hi all,
I’ll use this thread instead of making a new post.
There is a difference between my first try and the expected answer (=143) and I think many of us experienced this difference at some point.
I think there is a misunderstanding about the expected answer, since the correct answer includes words such as:
Emailing
Emails
email-leak
For me it was not clear whether “Emailing” and “Emails” should be counted as valid or not. Maybe the exercise instructions should clarify this.
My initial approach was pattern = r'\b(e[-\s]?mail[s]?)\b'
It passed the list test, matching all listed items, but on the actual dataset it found 141 matches. When I removed the last \b, the pattern also included the items listed by fedepereira.
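Here is a rough sketch of what the trailing \b changes, using the three words I mentioned above. I am assuming case-insensitive matching (re.IGNORECASE), since the words start with a capital "E"; that flag is my assumption and not something stated in the exercise.

```python
import re

# Same pattern with and without the trailing word boundary.
# re.IGNORECASE is an assumption so that "Emailing" / "Emails" can match at all.
with_boundary = re.compile(r"\b(e[-\s]?mail[s]?)\b", re.IGNORECASE)
without_boundary = re.compile(r"\b(e[-\s]?mail[s]?)", re.IGNORECASE)

# The three words from this post
words = ["Emailing", "Emails", "email-leak"]

strict = [w for w in words if with_boundary.search(w)]
loose = [w for w in words if without_boundary.search(w)]

print("with trailing \\b:   ", strict)
print("without trailing \\b:", loose)
```

With the trailing \b, "Emailing" is rejected because "mail" is followed directly by another word character. "email-leak" still matches, because the hyphen counts as a word boundary.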
BTW, great course. I have been learning regular expressions from the book Automate the Boring Stuff with Python, but your course is really comprehensive.