I don't understand why the negative set does not work for a target at the end of a sentence

Screen Link: https://app.dataquest.io/m/399/regular-expression-basics/10/word-boundaries

While the negative set was effective in removing any bad matches that mention “JavaScript”, it also had the side-effect of removing any titles where Java occurs at the end of the string, like this title:

 Pippo Web framework in Java

This is because the negative set [^Ss] must match one character, so instances at the end of a string do not match.

I don’t understand this part. Does it mean that, if we want the title above to be matched, adding a space at the end of it would help?

Do you mean adding a space at the end of “Pippo Web framework in Java”? That’s not how it’s supposed to work. The sentence is part of the data, you don’t want to modify the data here.

You want to find the right regular expression pattern to match whatever you want to match.
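As a rough sketch of what such a pattern could look like: the linked screen is titled "word boundaries", so the example below assumes the `\b` anchor is the intended tool. The second title is made up for illustration; only the Pippo title comes from the lesson.

```python
import re

# Assumption: a word boundary (\b) replaces the negative set [^Ss].
# \b matches a position, not a character, so it also succeeds at the end of the string.
pattern = r"\b[Jj]ava\b"

titles = [
    "Pippo Web framework in Java",  # title from the lesson
    "Learning JavaScript quickly",  # hypothetical title, should be rejected
]

boundary_results = {title: bool(re.search(pattern, title)) for title in titles}

for title, found in boundary_results.items():
    print(f"{title!r}: {found}")
```

Because `\b` consumes no character, nothing has to follow Java for the match to succeed, while the S in JavaScript still prevents a match.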

Can you be more precise? I’ll try to explain even though I don’t know what you’re confused about.

The pattern [Jj]ava[^Ss] matches anything that:

  • Starts with either J or j
  • Is followed by exactly ava
  • Is followed by one character that is neither S nor s.

In the case of “Pippo Web framework in Java”, even though Java satisfies the first two requirements, it doesn’t satisfy the third because it isn’t followed by a character different from S and s; it isn’t followed by anything, that’s why it doesn’t match.
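The explanation above can be checked with Python's `re` module. This is only a minimal sketch: the Pippo title comes from the lesson, while the other two strings are made up for comparison.

```python
import re

pattern = r"[Jj]ava[^Ss]"

titles = [
    "Pippo Web framework in Java",  # from the lesson: nothing follows Java
    "Java is verbose",              # hypothetical: Java is followed by a space
    "JavaScript tips",              # hypothetical: Java is followed by S
]

# re.search returns a match object, or None when the pattern is not found.
results = {title: bool(re.search(pattern, title)) for title in titles}

for title, found in results.items():
    print(f"{title!r}: {found}")
```

Only the second string matches. The first fails because `[^Ss]` needs a character to consume and the string has run out; the third fails because the character after Java is S.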

In the case of “Pippo Web framework in Java”, even though Java satisfies the first two requirements, it doesn’t satisfy the third because it isn’t followed by a character different from S and s ; it isn’t followed by anything, that’s why it doesn’t match.

Thank you for the elaboration! I am wondering: if Java in “Pippo Web framework in Java” were followed by a space or a period (.), would the regular expression [Jj]ava[^Ss] be able to match Java + space or Java + period (.)?

Try it out.
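One way to try it, as a small sketch: the two test strings below are modified copies of the lesson's title, made up only for this experiment, not part of the dataset.

```python
import re

pattern = r"[Jj]ava[^Ss]"

# Hypothetical variants of the title, with a trailing space and a trailing period.
variants = [
    "Pippo Web framework in Java ",
    "Pippo Web framework in Java.",
]

matched = {}
for text in variants:
    m = re.search(pattern, text)
    # Store the matched text, or None if there was no match.
    matched[text] = m.group() if m else None
    print(repr(text), "->", repr(matched[text]))
```

Both variants match, and the matched text includes the extra character (the space or the period), because `[^Ss]` consumes whatever single character follows Java.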