Doubt: advanced regular expression not working

https://app.dataquest.io/m/369/advanced-regular-expressions/9/extracting-url-parts-using-multiple-capture-groups

I tried this mission in two ways but failed both times, and I couldn't come up with an explanation or a way to improve the code.

Official answer:

pattern = r"(https?)://([\w\.\-]+)/?(.*)"
test_url_parts = test_urls.str.extract(pattern, flags=re.I)
url_parts = hn['url'].str.extract(pattern, flags=re.I)
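For reference, here is a minimal sketch of what the official pattern captures. It uses Python's re module instead of pandas so it runs on its own. Series.str.extract applies the pattern with search semantics and returns one column per capture group, so `re.search(...).groups()` should give the same three values for a single string. The helper name `split_url` and the sample URLs are hypothetical, not part of the mission.

```python
import re

# Official pattern: group 1 = scheme, group 2 = domain, group 3 = everything after it.
OFFICIAL_PATTERN = r"(https?)://([\w\.\-]+)/?(.*)"


def split_url(url, pattern=OFFICIAL_PATTERN):
    """Hypothetical helper: return the capture groups for one URL, or None.

    Series.str.extract also searches for the pattern, so for a single string
    this mirrors one row of its output (None where pandas gives a row of NaN).
    """
    match = re.search(pattern, url, flags=re.I)
    return match.groups() if match else None


# Hypothetical sample URLs, not taken from the mission's dataset.
print(split_url("https://www.example.com/blog/post?id=1"))
# ('https', 'www.example.com', 'blog/post?id=1')
print(split_url("HTTP://example.org/"))
# ('HTTP', 'example.org', '')
print(split_url("ftp://example.com/file"))
# None
```

The re.I flag is what lets (https?) match an uppercase scheme such as HTTP.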

My code 1:

pattern = r'(.+)://([\w\-\.]+)/?(.*)'
test_url_parts = test_urls.str.extract(pattern)
url_parts = hn['url'].str.extract(pattern, flags = re.I)
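A minimal sketch of where code 1 can diverge from the official answer: the leading (.+) is greedy, so if a URL contains a second `://` (the hypothetical redirect URL below has one in its query string), group 1 swallows everything up to the last `://`. The official (https?) stops at the scheme instead. Plain re.search stands in for str.extract here, on the assumption that one string behaves like one row.

```python
import re

CODE1_PATTERN = r"(.+)://([\w\-\.]+)/?(.*)"
OFFICIAL_PATTERN = r"(https?)://([\w\.\-]+)/?(.*)"

# Hypothetical URL whose query string contains another "://".
url = "https://example.com/redirect?to=http://example.org/page"

# Greedy (.+) backtracks only until "://" follows, i.e. to the LAST "://".
code1_groups = re.search(CODE1_PATTERN, url, flags=re.I).groups()
# (https?) can only match the scheme, so the first "://" is used.
official_groups = re.search(OFFICIAL_PATTERN, url, flags=re.I).groups()

print(code1_groups)
# ('https://example.com/redirect?to=http', 'example.org', 'page')
print(official_groups)
# ('https', 'example.com', 'redirect?to=http://example.org/page')
```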

My code 2:

pattern = r'(.+)://(.+)/?(.*)'
test_url_parts = test_urls.str.extract(pattern)
url_parts = hn['url'].str.extract(pattern, flags = re.I)
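Code 2 has a further problem on top of the greedy first group: (.+) in group 2 is also greedy, and `.` matches `/`, so it runs to the end of the string. Because `/?` and `(.*)` can both match nothing, the engine never needs to backtrack, and group 3 comes out empty. A minimal sketch with a hypothetical URL, again using re.search as a stand-in for one row of str.extract:

```python
import re

CODE2_PATTERN = r"(.+)://(.+)/?(.*)"

# Hypothetical URL with a path after the domain.
url = "https://www.example.com/blog/post"

# Group 2 (.+) consumes the domain AND the path; group 3 is left with "".
groups = re.search(CODE2_PATTERN, url, flags=re.I).groups()
print(groups)
# ('https', 'www.example.com/blog/post', '')
```

Restricting group 2 to characters that can appear in a domain, as the official [\w\.\-]+ does, is what forces the split at the first `/`.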

Why is my code not matching the answer?


Hey, Pankaj.

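It used to be the case that our official answer for this screen was similar to your code 1. However, that answer is wrong: the greedy (.+) in the first group can run past the scheme, so if a URL contains another :// later on (in a query string, for example), group 1 captures everything up to the last ://.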

This was already explored in this topic. Let me know if this is sufficient to clarify things.

Feel free to ask follow-up questions. Use your best judgment to decide whether to ask them on the linked topic or here.

I have a follow-up question after fighting with the regex for some time. This regex seems to pass for test_url_parts but doesn't work for url_parts. What cases am I missing? I've checked the answer and agree that its variant for group 1 is more robust 🙂, but I'm still curious.

pattern = r'(\w+)://([^/?]+)/?(.*)?'
test_url_parts = test_urls.str.extract(pattern)
url_parts = hn['url'].str.extract(pattern, flags = re.I)
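One way to hunt for the missing cases is to run both patterns on URLs containing characters that the official group 2 does not allow. This is a hedged sketch: the URLs are hypothetical, and whether these shapes actually occur in hn['url'] is an assumption. A port such as `:8080` lands in group 2 with `[^/?]+` but in group 3 with `[\w\.\-]+` (a `#` fragment behaves the same way), and a non-HTTP scheme matches `(\w+)` but not `(https?)`, where str.extract would return NaN. Any such URL would make url_parts differ from the expected output even though test_url_parts passes.

```python
import re

FOLLOWUP_PATTERN = r"(\w+)://([^/?]+)/?(.*)?"
OFFICIAL_PATTERN = r"(https?)://([\w\.\-]+)/?(.*)"


def extract_row(pattern, url):
    """Hypothetical helper mirroring one row of str.extract (None on no match)."""
    match = re.search(pattern, url, flags=re.I)
    return match.groups() if match else None


# Hypothetical URLs where the two patterns disagree.
for url in ["http://example.com:8080/index.html", "ftp://example.com/file"]:
    print(url)
    print("  follow-up:", extract_row(FOLLOWUP_PATTERN, url))
    print("  official: ", extract_row(OFFICIAL_PATTERN, url))
```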

I believe this question has been incorrectly tagged with 254 (i.e. “Group Summary Statistics in SQL”) and should probably just have this tag removed.


@mathmike314: I have amended it. Thanks for notifying us!
