Hi, I know my question is similar to other threads.
I tried this pattern
(?<=//)[\w+.]+
on https://regexr.com, where it seems to work fine for extracting the host names from these URLs:
'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
'http://www.interactivedynamicvideo.com/',
'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
'http://evonomics.com/advertising-cannot-maintain-internet-heres-solution/',
'HTTPS://github.com/keppel/pinn',
'Http://phys.org/news/2015-09-scale-solar-youve.html',
'https://iot.seeed.cc',
'http://www.bfilipek.com/2016/04/custom-deleters-for-c-smart-pointers.html',
'http://beta.crowdfireapp.com/?beta=agnipath',
'https://www.valid.ly?param'
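For comparison, here is a minimal sketch of the same check in plain Python, assuming only the standard `re` module (the list is a subset of the URLs above). `re.search` returns the whole match, so, as on regexr, the pattern works without any capture group:

```python
import re

pattern_url = r'(?<=//)[\w+.]+'
urls = [
    'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
    'HTTPS://github.com/keppel/pinn',
    'https://www.valid.ly?param',
]

# group(0) is the entire match; the lookbehind (?<=//) is not part of it.
domains = []
for url in urls:
    m = re.search(pattern_url, url)
    domains.append(m.group(0) if m else None)

print(domains)  # ['www.amazon.com', 'github.com', 'www.valid.ly']
```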
However, when I run it in a Jupyter notebook, I get:
ValueError Traceback (most recent call last)
in
----> 1 test_urls_clean = test_urls.str.extract(pattern_url, expand=False)
~/anaconda/lib/python3.6/site-packages/pandas/core/strings.py in extract(self, pat, flags, expand)
2765 @copy(str_extract)
2766 def extract(self, pat, flags=0, expand=True):
-> 2767 return str_extract(self, pat, flags=flags, expand=expand)
2768
2769 @copy(str_extractall)
~/anaconda/lib/python3.6/site-packages/pandas/core/strings.py in str_extract(arr, pat, flags, expand)
848 return _str_extract_frame(arr._orig, pat, flags=flags)
849 else:
-> 850 result, name = _str_extract_noexpand(arr._parent, pat, flags=flags)
851 return arr._wrap_result(result, name=name, expand=expand)
852
~/anaconda/lib/python3.6/site-packages/pandas/core/strings.py in _str_extract_noexpand(arr, pat, flags)
711
712 regex = re.compile(pat, flags=flags)
-> 713 groups_or_na = _groups_or_na_fun(regex)
714
715 if regex.groups == 1:
~/anaconda/lib/python3.6/site-packages/pandas/core/strings.py in _groups_or_na_fun(regex)
686 """Used in both extract_noexpand and extract_frame"""
687 if regex.groups == 0:
-> 688 raise ValueError("pattern contains no capture groups")
689 empty_row = [np.nan] * regex.groups
690
ValueError: pattern contains no capture groups
Can someone walk me through what is wrong with my regex pattern here?
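The last frame of the traceback shows the check that fails: pandas compiles the pattern and raises if `regex.groups == 0`. A lookbehind such as `(?<=//)` is a zero-width assertion, not a capture group, so this pattern has zero groups. The sketch below uses only the standard `re` module to show this; `pattern_grouped` is a hypothetical variant (not from the original post) that wraps the part to keep in parentheses:

```python
import re

# The pattern from the question: a lookbehind followed by a character class.
pattern_url = r'(?<=//)[\w+.]+'
# (?<=...) does not capture, so the compiled pattern reports 0 groups,
# which is the value pandas tests before raising the ValueError.
print(re.compile(pattern_url).groups)  # 0

# Hypothetical variant: parentheses around the text to keep add one capture group.
pattern_grouped = r'(?<=//)([\w+.]+)'
print(re.compile(pattern_grouped).groups)  # 1

m = re.search(pattern_grouped, 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0')
print(m.group(1))  # www.nytimes.com
```

If that reading is right, passing a pattern with one capture group, such as `pattern_grouped`, to `test_urls.str.extract(..., expand=False)` should return a Series of the captured host names. The lookbehind then becomes optional, since `//([\w+.]+)` captures the same text.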