Pattern for step 9 in Advanced RegEx mission doesn't seem to work in regexr.com

Hi there, I’m working on step 9 of the advanced regular expressions mission. As suggested earlier in the mission, I’m using regexr.com to build these regexes first, as a visual check on the expressions I’m building.

I came up with a pattern and tried it in the Dataquest script.py, but it didn’t work, so I checked the answer to see what the pattern was:

r"(.+)://([\w.]+)/?(.*)"

So I tried it in regexr.com and it didn’t work! It didn’t like the forward slashes so I’m wondering if there’s something I am doing wrong. I feel like I want to depend on regexr.com but if it doesn’t work for our purposes of getting through these exercises then I’ll just have to make do without it. Thanks.

Ken

As you can see, the answer uses the r"..." format, which marks a raw string. The Python docs on string literals say:

String literals may optionally be prefixed with a letter ‘r’ or ‘R’; such strings are called raw strings and use different rules for interpreting backslash escape sequences.
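To see what the raw prefix changes in practice, here is a minimal sketch (the variable names are only for illustration):

```python
# A regular string literal interprets "\n" as a single newline character.
plain = "\n"

# A raw string literal keeps the backslash, so this is two characters:
# a backslash followed by the letter n.
raw = r"\n"

print(len(plain))  # 1
print(len(raw))    # 2
```

The prefix only affects how backslashes are read, which is why regex patterns with sequences like \w are usually written as raw strings.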

On regexr.com, however, you need to prefix each forward slash with a backslash for the regex to work:

(.+):\/\/([\w.]+)\/?(.*)

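In Python code, you have two options: keep the backslashes (as above, inside a raw string) or leave the forward slashes unescaped, as in the answer. Forward slashes have no special meaning in Python’s regex syntax, so both forms match the same text.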

Hope this helps somehow.

UPDATE
Thanks @Bruno for the valuable note :slight_smile:

On regexr.com, the PCRE (Perl Compatible Regular Expressions) flavor was selected. That’s why the forward slashes (/) had to be escaped with a backslash (\) to work.

For instance, here you can select different flavors to see how each one processes a regex. If you select the Python flavor, you’ll see that the regex (.+)://([\w.]+)/?(.*) works as is (without escaping the forward slashes).
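You can also check this directly in Python. The sketch below runs both versions of the pattern against a made-up sample URL (the URL and variable names are my own, not from the mission):

```python
import re

# Hypothetical sample URL, used only for this demonstration.
url = "https://www.example.com/path/to/page.html"

# The pattern from the answer, with unescaped forward slashes.
unescaped = re.compile(r"(.+)://([\w.]+)/?(.*)")

# The same pattern with every forward slash escaped, as typed into regexr.com.
escaped = re.compile(r"(.+):\/\/([\w.]+)\/?(.*)")

groups_unescaped = unescaped.search(url).groups()
groups_escaped = escaped.search(url).groups()

print(groups_unescaped)                    # ('https', 'www.example.com', 'path/to/page.html')
print(groups_unescaped == groups_escaped)  # True
```

Both patterns capture the same protocol, domain, and path, so the escaping is only needed by the tool, not by Python.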

In Python code you can compile regexes into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

>>> import re
>>> pattern = re.compile(r"DQ")
>>> print(pattern.findall("DQ is da best DQ"))
['DQ', 'DQ']
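The same compiled pattern can be reused for the other operations mentioned above. This is a small sketch of searching and substituting; the replacement text "Dataquest" is just an example I picked:

```python
import re

pattern = re.compile(r"DQ")
text = "DQ is da best DQ"

# search() returns a match object for the first occurrence (or None).
first = pattern.search(text)
print(first.span())  # (0, 2)

# sub() returns a new string with every match replaced.
replaced = pattern.sub("Dataquest", text)
print(replaced)  # Dataquest is da best Dataquest
```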

Please, find more useful info on regexes here:

  1. Regular expression operations
  2. Regular Expression HOWTO
    2.1 Using Regular Expressions

Hope this also adds something meaningful :slight_smile:


Oh yes, now it works. Thanks, ranklord, for explaining why it behaves that way on regexr.com.

Cheers,
Ken


@kentake, you may want to mark (any) helpful answer as a solution so that it won’t be lost among other comments and future readers may easily find it. Thanks :wink: