pattern = r"\b[Jj]ava\b"
java_titles = titles[titles.str.contains(pattern)]
test = titles[titles.str.contains(r"using\?")]
test1 = titles[titles.str.contains(r"using\?\b")]
test2 = titles[titles.str.contains(r"\busing\?\b")]
Ask HN: Which linux/unix C++/C IDE are you using?
Ask HN: Moving Out of Silicon Valley because of housing? Where to?
- I thought that test and test1 would have the same result. Why aren’t the two titles above included in test1?
- Why doesn’t test2 include titles? In this lesson, we were able to get the titles with the word Java at the end of the string. I tried to apply that understanding to test2, but I’m not sure where I went wrong.
Thank you in advance!
I would first recommend going through the "Accessing the Matching Text with Capture Groups" screen again and understanding what r (raw string) really does. Then compare that to your pattern - what does
A word boundary is a position between a word character ([0-9A-Za-z_]) and a non-word character (\W), or the beginning or end of the string when the adjacent character is a word character.
\? matches a literal question mark, which is a non-word character.
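To make that definition concrete, here is a minimal sketch using Python's built-in re module. It is not from the lesson; the sample string and the helper name boundary_positions are made up for illustration.

```python
import re

# Without the r prefix, "\b" is a single backspace character.
# With the r prefix, the backslash is kept, so the regex engine receives \b.
lengths = (len("\b"), len(r"\b"))

def boundary_positions(text):
    """Return every index in text where a word boundary matches."""
    return [m.start() for m in re.finditer(r"\b", text)]

# Indices:  u=0 s=1 i=2 n=3 g=4 ?=5 space=6 t=7 e=8 s=9 t=10
positions = boundary_positions("using? test")
print(lengths)    # (1, 2)
print(positions)  # [0, 5, 7, 11]
```

Index 6, the position between "?" and the space, is missing from the list: both neighbours are non-word characters, so there is no boundary there.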
So my understanding right now is that the raw string will prevent Python’s escape sequences (\b for word boundary instead of backspace), but now I’m not sure what \? does if r prevents it from using Python’s escape sequence of a question mark. My initial thought was that maybe \? becomes an optional backslash, but running code rejects that idea.
I also tried the pattern of \\busing\?\\b on a little test data that contains the string "test using? test" to see if I would get the result I expect, but I didn’t, and I’m not sure how to go about it next. I’m a little stumped.
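Adding to my previous comment:
\busing\?\b won’t match these titles, since \? matches a non-word character and the \b after it would need a word character immediately after the question mark. In both titles the question mark is followed by a space or the end of the string, so there is no boundary there.
\busing\b\? would work, since both boundaries sit next to word characters: one before the "u" and one after the "g".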
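Here is a quick way to check all of this against the two titles from the question. It is only a sketch: it uses plain re.search on a list instead of the titles Series, on the assumption that str.contains applies the pattern with the same search semantics. The names patterns, results and doubled are made up for the example.

```python
import re

titles = [
    "Ask HN: Which linux/unix C++/C IDE are you using?",
    "Ask HN: Moving Out of Silicon Valley because of housing? Where to?",
]

patterns = {
    "test": r"using\?",        # no boundaries; also matches inside "housing?"
    "test1": r"using\?\b",     # needs a word character right after "?"
    "test2": r"\busing\?\b",   # same problem, plus a boundary before "u"
    "fixed": r"\busing\b\?",   # boundaries on both sides of the word "using"
}

# For each pattern, collect the titles it matches anywhere in the string.
results = {
    name: [t for t in titles if re.search(pat, t)]
    for name, pat in patterns.items()
}

# Doubling the backslash inside a raw string asks for a literal backslash
# followed by "b", so this never matches ordinary text.
doubled = re.search(r"\\busing\?\\b", "test using? test")

for name, matched in results.items():
    print(name, len(matched))
print(doubled)
```

With these two titles, test matches both, test1 and test2 match neither, and the fixed pattern matches only the first one, because the "using" in "housing" has no boundary before its "u".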
Ohhh. I understand the definition now. Thank you!