I'm having trouble understanding the capture group concept here. Thanks in advance for any help!
When I do
pattern = "\[\w+\]"
tag_titles = titles[titles.str.contains(pattern)]
it returns every title that contains the pattern. If I add () around the pattern, the result is the same:
pattern = "(\[\w+\])"
tag_titles = titles[titles.str.contains(pattern)]
So what does () actually do here? Which group is being captured?
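To make this reproducible, here is a minimal sketch of the comparison. The sample titles are made up for illustration (they are not my real data), and I've written the patterns as raw strings, which should behave the same as the ones above.

```python
import pandas as pd

# Hypothetical sample titles standing in for the real dataset.
titles = pd.Series([
    "Intro to regex [pdf]",
    "Pandas walkthrough [video]",
    "No tag in this one",
])

# Filter with the plain pattern (no parentheses).
without_group = titles[titles.str.contains(r"\[\w+\]")]

# Filter with the same pattern wrapped in parentheses.
# pandas may emit a UserWarning here because the pattern has a group;
# the rows selected are still the same.
with_group = titles[titles.str.contains(r"(\[\w+\])")]

print(without_group.equals(with_group))
```

Both filters keep the same rows here, which matches what I see on the full data.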
When I try to extract the pattern from the series,
pattern = "\[\w+\]"
tag = titles.str.extract(pattern)
I get this error: ValueError: pattern contains no capture groups
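A small sketch that triggers the same error, again on made-up titles rather than my real data:

```python
import pandas as pd

# Hypothetical sample titles standing in for the real dataset.
titles = pd.Series(["Intro to regex [pdf]", "No tag in this one"])

try:
    # No parentheses in the pattern, so there is nothing for extract to return.
    titles.str.extract(r"\[\w+\]")
    message = None
except ValueError as err:
    message = str(err)

print(message)
```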
When I use
pattern = "(\[\w+\])"
tag = titles.str.extract(pattern)
it returns:
tag_freq Series (<class 'pandas.core.series.Series'>)
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
…
20094 NaN
20095 NaN
20096 NaN
20097 NaN
20098 NaN
Name: title, Length: 20099, dtype: object
When I do
pattern = "(\[\w+\])"
tag_freq = titles.str.extract(pattern).value_counts()
it returns:
title
[pdf] 276
[video] 111
[audio] 3
If I move the () inside the brackets, as in
pattern = "\[(\w+)\]"
it returns the tag names without the [].
To me, this doesn't look like a capture group doing anything special. It seems that () only affects how the match is shown. Is that right?
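Here is a minimal side-by-side sketch of the two placements. The titles are hypothetical, and passing expand=False is my assumption so that the result is a Series like the output above.

```python
import pandas as pd

# Hypothetical sample titles standing in for the real dataset.
titles = pd.Series([
    "Intro to regex [pdf]",
    "Pandas walkthrough [video]",
    "No tag in this one",
])

# Parentheses around the whole pattern, brackets included.
outer = titles.str.extract(r"(\[\w+\])", expand=False)

# Parentheses around only the word between the brackets.
inner = titles.str.extract(r"\[(\w+)\]", expand=False)

# Rows without a tag come back as missing values, so drop them before comparing.
print(outer.dropna().tolist())  # ['[pdf]', '[video]']
print(inner.dropna().tolist())  # ['pdf', 'video']
```

In both cases the same rows match; the only difference I can see is which part of the match comes back.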