In regular expressions, what do these 2 do? .+ and .*

I am on “Advanced Regular Expression” session 9 right now.

But I have no clue what these 2 do: .+ and .*

Also,

pattern = r'(https?)://([\w.-]+)/?(.+)'

Why does this not work?

Hi @yoon.w.sung,

Welcome to the community!

. in regex matches any single character (except a newline, by default). + matches 1 or more of the preceding element. So .+ matches one or more characters, that is, any non-empty run of characters.

* matches 0 or more of the preceding element, so the pattern still matches when that element is absent. Therefore, .* matches even if nothing exists (an empty string).
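A quick way to see the difference is to try both on an empty string and on a non-empty one. This is only a minimal Python sketch; the sample strings and variable names are made up for illustration.

```python
import re

# ".+" needs at least one character; ".*" is satisfied by zero characters.
plus_on_empty = re.fullmatch(r".+", "")     # no match, so this is None
star_on_empty = re.fullmatch(r".*", "")     # matches the empty string
plus_on_text = re.fullmatch(r".+", "abc")   # matches "abc"
star_on_text = re.fullmatch(r".*", "abc")   # matches "abc"

print(plus_on_empty)                 # None
print(repr(star_on_empty.group()))   # ''
print(plus_on_text.group())          # abc
print(star_on_text.group())          # abc
```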

.* on its own does not show much, so here is an example case:

1234//
134
23456//
2342//
134243

Suppose we want to match all the cases above. If we use the regex \d+\/+, it will only match the strings ending with //, because \/+ needs at least one /. However, if we use the regex \d+\/*, it will match all the cases. This is because \/* makes the / character optional (0 or more).

Note: There are better ways to match the cases above; the only purpose of this example is to illustrate the difference between + and *.
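The same five strings can be checked in Python. This is a small sketch, assuming we want the whole string to fit the pattern, which is why it uses re.fullmatch; the variable names are just for illustration.

```python
import re

samples = ["1234//", "134", "23456//", "2342//", "134243"]

# fullmatch succeeds only if the entire string fits the pattern.
with_plus = [s for s in samples if re.fullmatch(r"\d+\/+", s)]
with_star = [s for s in samples if re.fullmatch(r"\d+\/*", s)]

print(with_plus)  # ['1234//', '23456//', '2342//']
print(with_star)  # ['1234//', '134', '23456//', '2342//', '134243']
```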

(https?)://([\w.-]+)/?(.+) is not working as intended because (.+) expects at least one character (the page) after the domain name. However, in some cases like http://www.interactivedynamicvideo.com/, the page part does not exist, so one would expect it not to match.

However, in the result, we can see that:

Group 0 contains: http
Group 1 contains: www.interactivedynamicvideo.com
Group 2 contains: /

Based on the above discussion it should not have matched, so why does Group 2 (the third group) contain the / character? In the pattern, / sits outside the third group ().

This weird behavior is not caused by operator precedence. It is caused by backtracking. The regex engine works through the pattern from left to right, and when a later part fails, it goes back and retries an earlier part with a different choice.

Therefore, in http://www.interactivedynamicvideo.com/, first http is captured, and after that www.interactivedynamicvideo.com is captured. Next, /? takes the trailing / character. Now nothing is left for (.+), and .+ will not calm down until it gets at least one character, so the match fails at this point. The engine then backtracks and lets /? match nothing instead (it is optional), which frees the / for (.+) to capture. That is how / ends up in the last group.
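This can be checked directly in Python. The sketch below runs the original pattern on the URL from the question and on a second URL with a page after the domain; the /about path is a made-up example, not something from the course.

```python
import re

pattern = r"(https?)://([\w.-]+)/?(.+)"

# No page after the domain: "(.+)" still needs one character,
# so the optional "/?" matches nothing and "(.+)" captures the slash.
bare = re.match(pattern, "http://www.interactivedynamicvideo.com/")
print(bare.groups())   # ('http', 'www.interactivedynamicvideo.com', '/')

# A page after the domain (hypothetical path): "/?" takes the slash
# and "(.+)" captures the page name.
paged = re.match(pattern, "http://www.interactivedynamicvideo.com/about")
print(paged.groups())  # ('http', 'www.interactivedynamicvideo.com', 'about')
```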

On the other hand, if we use (https?)://([\w.-]+)/?(.*), the / at the end will not be added to the last group because .* is not picky like .+. It will match even if nothing is there to match, so /? keeps the / and the last group is simply empty.
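Here is a short Python comparison of the two patterns. It is only a sketch: http://example.com is a made-up URL added to show what happens when there is no trailing slash at all, and the variable names are illustrative.

```python
import re

plus_pattern = r"(https?)://([\w.-]+)/?(.+)"
star_pattern = r"(https?)://([\w.-]+)/?(.*)"

url = "http://www.interactivedynamicvideo.com/"
star_trailing = re.match(star_pattern, url).groups()
print(star_trailing)  # ('http', 'www.interactivedynamicvideo.com', '')

# Hypothetical URL with no trailing slash and no page.
short_url = "http://example.com"

# With ".+", the domain group has to give up its last character
# so that "(.+)" gets something to capture.
plus_short = re.match(plus_pattern, short_url).groups()
print(plus_short)     # ('http', 'example.co', 'm')

# With ".*", the domain stays whole and the last group is empty.
star_short = re.match(star_pattern, short_url).groups()
print(star_short)     # ('http', 'example.com', '')
```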

Hope this helps :slightly_smiling_face:.

Best,
Sahil
