Hi @moroa, it’s been a while since I’ve used regex, but I’ll try to help as best I can.
Comparing your pattern with the one provided by DQ, the main difference is the capture group: should \b be included inside the capture group or not?
I did a bit of research and discovered that \1 refers to the text the group matched, not to the regex inside the group! (resource). Therefore, your pattern will find “extra” matches like “Google's self driving...”: the s after the apostrophe matches the group (\b\w+\b), and then \1 matches the s at the start of “self”. \1 no longer cares about word boundaries; it only cares about the text that the group matched (namely s).
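If you want to see this for yourself, here’s a quick Python sketch. I’m assuming your pattern is `(\b\w+\b)\s\1`, reconstructed from the capture group discussed above, since the full pattern isn’t quoted in this thread.

```python
import re

# Assumption: your pattern, reconstructed from the capture group discussed above.
loose = re.compile(r"(\b\w+\b)\s\1")

title = "Google's self driving"
m = loose.search(title)

print(repr(m.group(1)))       # 's'   -> the text the group captured
print(repr(m.group(0)))       # 's s' -> the whole match
print(repr(title[m.end():]))  # 'elf driving' -> \1 stopped in the middle of "self"
```

The match ends partway through “self”, which is exactly what happens when \1 replays the captured text without the \b that surrounded it inside the group.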
Here are some other titles it will match but shouldn’t:
1. No end in sight as repair work on California's sinking land costs billions
2. Niantic (Pokemon Go) appears to be hosting the entire world on one server
3. Salesforce lost 3.5 hours of customer data in instance NA14
4. The Theory of Concatenative Combinators
5. Performance Improvements in C Code Using Micro-Optimizations
Reasoning:
1. This is similar to my example above: the 's is followed by a word that starts with s (“sinking”).
2. The capture group matches “on” and \1 matches the first two characters of “one”.
3. The capture group matches “in” and \1 matches the first two characters of “instance”.
4. The capture group matches “The” and \1 matches the first three characters of “Theory”.
5. The capture group matches “C” and \1 matches the first character of “Code”.
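Here’s a small Python sketch that runs the five titles above through the same pattern and shows what the group captures in each. Again, `(\b\w+\b)\s\1` is my reconstruction of your pattern, not a quote of it.

```python
import re

# The five titles listed above.
titles = [
    "No end in sight as repair work on California's sinking land costs billions",
    "Niantic (Pokemon Go) appears to be hosting the entire world on one server",
    "Salesforce lost 3.5 hours of customer data in instance NA14",
    "The Theory of Concatenative Combinators",
    "Performance Improvements in C Code Using Micro-Optimizations",
]

# Assumption: your pattern, with \b only inside the capture group,
# so nothing forces the backreference \1 to end on a word boundary.
loose = re.compile(r"(\b\w+\b)\s\1")

captured = []
for title in titles:
    m = loose.search(title)
    # Show the captured word and the exact text the whole pattern consumed.
    print(repr(m.group(1)), "->", repr(m.group(0)))
    captured.append(m.group(1))
```

The captured words come out as s, on, in, The and C, one per title, matching the reasoning list.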
The DQ solution produces the desired results because the second \b, placed after the reference to \1, ensures we only get matches on whole words and not partial ones like the ones above.
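To check the fix, here’s the same kind of sketch with a trailing \b after the backreference. I’m assuming DQ’s pattern is `\b(\w+)\s\1\b`; the exact pattern isn’t quoted in this thread, so treat that as a stand-in.

```python
import re

titles = [
    "No end in sight as repair work on California's sinking land costs billions",
    "Niantic (Pokemon Go) appears to be hosting the entire world on one server",
    "Salesforce lost 3.5 hours of customer data in instance NA14",
    "The Theory of Concatenative Combinators",
    "Performance Improvements in C Code Using Micro-Optimizations",
]

# Assumption: DQ's pattern, with a second \b right after the backreference.
strict = re.compile(r"\b(\w+)\s\1\b")

# None of the partial-word matches survive the trailing \b.
print([strict.search(t) for t in titles])  # [None, None, None, None, None]

# A genuinely repeated word (hypothetical example text) is still caught.
m = strict.search("This is is a typo")
print(m.group(0))  # is is
```

The trailing \b rejects “on one” because there is no word boundary between the n and the e, while “is is” still matches because the second “is” is followed by a space.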
Regex takes a lot of practice, and I’m sorry to say that I have only scratched the surface myself. I find that sites like regexr help a lot for visualizing and breaking down what’s happening and why I get the results that I do.
Hope this helps and that I haven’t led you astray!