Regex to match repeated words

Screen Link: https://app.dataquest.io/m/369/advanced-regular-expressions/6/backreferences-using-capture-groups-in-a-regex-pattern

Your Code: pattern = r'(\b\w+\b)(\s)\1'

Answer Code: pattern = r'\b(\w+)\s\1\b'

Please help me understand the difference between my code regex and the answer code regex.

Hey, Aditya.

I’ve answered a sufficiently similar question here. Let me know if this isn’t enough. In any case, I’ll give the gist of it here.

Capture groups only capture text. After the regex engine captures a word with \b\w+\b, the group holds just the characters of the word. The word boundaries are zero-width assertions, so they aren't part of what gets stored.

When you reference the group with \1, only the captured text is matched again, so \1 succeeds even if what follows it isn't a word boundary. In other words, the text matched by \1 doesn't have to be a whole word; it can be the start of a longer word.

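That’s why your pattern matches titles like "Niantic (Pokemon Go) appears to be hosting the entire world on one server" and the solution doesn’t. Your pattern captures "on", then matches the space and the "on" at the start of "one". The answer’s trailing \b rejects that, because there is no word boundary between the "n" and the "e" in "one".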
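
If you want to check this yourself, here is a minimal sketch using Python's standard re module on that title. The variable names (title, your_pattern, answer_pattern) are just mine for illustration; they aren't from the screen.

```python
import re

# The title quoted above, used as the test string.
title = "Niantic (Pokemon Go) appears to be hosting the entire world on one server"

your_pattern = r'(\b\w+\b)(\s)\1'    # boundaries sit inside the group, none after \1
answer_pattern = r'\b(\w+)\s\1\b'    # trailing \b forces \1 to end at a word boundary

your_match = re.search(your_pattern, title)
answer_match = re.search(answer_pattern, title)

# Your pattern captures "on", then matches the space and the "on" in "one".
print(your_match.group(0))  # on on
print(your_match.group(1))  # on

# The answer pattern finds no repeated word in this title.
print(answer_match)         # None
```

Adding \b after \1 in your pattern, as in r'(\b\w+\b)(\s)\1\b', should make it behave like the answer on this title.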

Thanks for the answer link and the gist writeup, Bruno!
