How this regular expression works

Screen Link: https://app.dataquest.io/m/369/advanced-regular-expressions/9/extracting-url-parts-using-multiple-capture-groups

My Code:

r"(\w+)://([\w\-\.]+)"

What I expected to happen:
If I use the above pattern on the URL https://www.valid.ly?param
then it should return the whole URL https://www.valid.ly?param
because the pattern includes ., which means any character except a newline, and that includes ?. So ([\w\-\.]+) should match all of www.valid.ly?param, including the parameters.
What actually happened:
It returns https://www.valid.ly

Why does it stop before the ? and not read any further?


Because your pattern doesn't include those characters. To capture what follows the domain, add a third group:

r"(https?)://([\w\.\-]+)/?(.*)"
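As a quick check, here is a small sketch of that pattern applied to the URL from the question, using Python's re module. The variable names are only illustrative.

```python
import re

url = "https://www.valid.ly?param"
pattern = r"(https?)://([\w\.\-]+)/?(.*)"

# search() returns a match object; groups() gives the three captures
match = re.search(pattern, url)
protocol, domain, rest = match.groups()

print(protocol)  # https
print(domain)    # www.valid.ly
print(rest)      # ?param
```

The second group still stops at the ?, and the optional slash matches nothing here, so the third group (.*) picks up everything that remains.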

But the . matches those characters, so it should cover every character in www.valid.ly?param.
Consider just this part: www.valid.ly?param
If I write r"([\w\-\.]+)", then it should match all of www.valid.ly?param, but that pattern only gives www.valid.ly and goes no further. It should not stop before the ? and should return the whole string. Why doesn't that happen?

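No @nishu123tushar, it returns the correct string. The pattern matches one or more (+) characters from the set inside the square brackets: alphanumerics or underscore (\w), a literal dash (\-), and a literal dot (\.). Inside a character class the . is not a wildcard; it only matches an actual dot, with or without the backslash. Since ? is not in the set, the match stops at www.valid.ly.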
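To see the difference for yourself, here is a minimal sketch using Python's re module on the same string. The variable names are just for illustration.

```python
import re

text = "www.valid.ly?param"

# Inside [...], the dot is a literal dot, so the match stops at "?"
inside = re.search(r"[\w\-\.]+", text).group()

# Outside [...], the dot is a wildcard and matches "?" as well
outside = re.search(r".+", text).group()

# Adding "?" to the character class lets the match continue past it
with_question_mark = re.search(r"[\w\-\.?]+", text).group()

print(inside)              # www.valid.ly
print(outside)             # www.valid.ly?param
print(with_question_mark)  # www.valid.ly?param
```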