Mission Link: https://app.dataquest.io/m/369/advanced-regular-expressions/8/extracting-domains-from-urls
Hi,
When the regex pattern I used did not bring the expected results, I had a look at the pattern provided in the answer. Now I am confused, as the provided pattern does not cater for dots and hence returns www as a domain as well, which then results in the top 5 domains looking like this:
www 7239
github 1008
medium 829
blog 661
techcrunch 245
Using my pattern (r"https?://([-\w.\?]+)") would, at least in my opinion, produce a more fitting top 5 of domains extracted from the hn dataset:
github.com 1008
medium.com 825
www.nytimes.com 525
www.theguardian.com 248
techcrunch.com 245
Additionally, my pattern correctly extracts all domains from test_urls, while the pattern provided in the answer again gives results which (in my opinion) seem rather strange:
0 www
1 www
2 www
3 evonomics
4 github
5 phys
6 iot
7 www
8 beta
9 www
10 css-cursor
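To make the difference concrete, here is a minimal sketch comparing the two behaviours. Note the assumptions: the pattern without a dot is only my stand-in for the answer's pattern (I have not copied the answer here), and the sample URLs are made up for illustration rather than taken from test_urls.

```python
import re

# Hypothetical sample URLs, NOT the mission's test_urls.
sample_urls = [
    "https://www.example.com/some/page",
    "http://github.com/user/repo",
    "https://blog.example.org",
]

# Assumed stand-in for the answer's pattern: no dot in the character class,
# so the capture stops at the first dot of the host name.
pattern_no_dot = r"https?://([-\w]+)"

# My pattern: the dot is allowed, so the capture runs up to the first "/".
pattern_with_dot = r"https?://([-\w.\?]+)"


def extract(pattern, urls):
    """Return the first capture group for each URL, or None if nothing matches."""
    results = []
    for url in urls:
        match = re.search(pattern, url)
        results.append(match.group(1) if match else None)
    return results


no_dot = extract(pattern_no_dot, sample_urls)
with_dot = extract(pattern_with_dot, sample_urls)

for url, a, b in zip(sample_urls, no_dot, with_dot):
    print(f"{url}: {a} | {b}")
```

With these sample URLs the first pattern should give www, github and blog, while mine gives www.example.com, github.com and blog.example.org, which matches what I see on the real data.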
However, as I am fairly new to Python, I might have got it all wrong.
Any ideas?
Cheers!