I am a little confused about when and why to use each of these two syntaxes and what the difference between them is. In the Regular Expressions lesson we switch between them a few times:
java_titles = titles[titles.str.contains(pattern)]
ending_count = titles.str.contains(pattern_ending).sum()
Any help / info as to where I can learn more on this would be helpful.
Welcome to DQ Community @bennettbrown
Not sure which track or course you are engaged with.
As part of the DQ course, the syntax has been explained in these two missions:
Boolean Indexing with NumPy
Exploring Data with pandas: Fundamentals
titles.str.contains(pattern) checks the pattern against each element of the Series and returns a boolean Series: True where the pattern matched and False where it did not.
Try this in the console to see the result:
print(titles.str.contains(pattern_ending))
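For instance, here is a minimal sketch of what that boolean Series looks like. The sample titles and the pattern are made up for illustration and are not the lesson's data:

```python
import pandas as pd

# Hypothetical sample data; in the lesson, `titles` comes from the course dataset.
titles = pd.Series(["Learn Java fast", "Python tips", "Why java rocks"])
pattern = r"[Jj]ava"  # assumed pattern, for illustration only

# One boolean per element: True where the regex matches anywhere in the string.
mask = titles.str.contains(pattern)
print(mask)
```

The result has the same length and index as titles, which is what lets it be used as a filter.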
titles[titles.str.contains(pattern)] keeps only the rows where the pattern matched, i.e. where
titles.str.contains(pattern) == True
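As a rough sketch of that filtering step, again with made-up titles and an assumed pattern rather than the lesson's data:

```python
import pandas as pd

# Hypothetical sample data and pattern, for illustration only.
titles = pd.Series(["Learn Java fast", "Python tips", "Why java rocks"])
pattern = r"[Jj]ava"

# Passing the boolean Series inside [] keeps the rows whose value is True.
java_titles = titles[titles.str.contains(pattern)]
print(java_titles)
```

The kept rows retain their original index labels (0 and 2 here). If the Series may contain missing values, passing na=False to str.contains treats them as non-matches.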
titles.str.contains(pattern_ending).sum() adds up the boolean values. True counts as 1 and False as 0, so sum() returns the number of rows where the pattern matched.
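A small sketch of the counting step, assuming a hypothetical pattern_ending that matches titles ending in a question mark (the lesson's actual pattern may differ):

```python
import pandas as pd

# Hypothetical sample data and pattern, for illustration only.
titles = pd.Series(["Ask HN: best email?", "Show HN: my app", "Is this the end."])
pattern_ending = r"\?$"  # assumed: title ends with a question mark

matches = titles.str.contains(pattern_ending)

# Summing booleans counts the True values.
ending_count = matches.sum()
print(ending_count)

# The count equals the number of rows the same mask would select.
print(ending_count == len(titles[matches]))
```

So the two lines in the question use the same boolean Series: one indexes with it to get the matching rows, the other sums it to get how many there are.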
You may also refer to a similar post here; it has a detailed explanation of the syntax.
Hope this helps. Please reply if you have an issue with any part of this post.
For future queries, please attach a link to the screen you are working on where you hit the issue. It helps other members help you better.