
Answer Code Incorrect

Screen Link: https://app.dataquest.io/m/354/regular-expression-basics/8/negative-character-classes

My Code:

def first_10_matches(pattern):
    """
    Return the first 10 story titles that match
    the provided regular expression
    """
    all_matches = titles[titles.str.contains(pattern)]
    first_10 = all_matches.head(10)
    return first_10

first_10_matches(r'[Jj]ava\W')
java_titles = titles[titles.str.contains(r'[Jj]ava\W')]

What I expected to happen: I expected it to work and it did work.

What actually happened: The console displayed an error and I assumed my code was wrong. However, after comparing the expected output with my own, I could see that the expected output contained several instances of ‘javascript’, which the lesson was teaching us to filter out.
I’m not confident that my code is right, but the regex leading to the expected output is definitely incorrect, and it would be great if someone could check it out.

java_titles is shorter than we expected.

Hello,
I have checked out the screen and found the problem. The pattern you have written is r'[Jj]ava\W'. Here’s what’s wrong with it:

  • There is no capture group specified. You should specify your capture group in regular brackets: ().
  • The instructions state “The regex shouldn't match where 'Java' is followed by the letter 'S' or 's'”, and I understand how you tried to approach this problem. However, the “\W” part matches only non-word characters. This means you have also excluded every “Java” that is followed by a word character (a letter, digit, or underscore). Instead, you can use a negated set with the characters “S” and “s” to avoid the word “Javascript”. The negated set is written as follows: [^Ss].

Therefore, the pattern in the answer is actually right. I’d also like to recommend this website. It was really hard for me to get used to writing regex patterns as well, but after a little practice you’ll gain intuition very quickly.
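
To make the difference between the two patterns concrete, here is a minimal sketch using Python's re module. The sample titles are made up for illustration (they are not from the lesson's dataset), and the matching helper is hypothetical; it uses re.search, which is the same kind of "match anywhere in the string" check that titles.str.contains performs.

```python
import re

# Hypothetical sample titles, not taken from the real dataset
sample_titles = [
    "Java 8 released",
    "JavaOne keynote notes",
    "Why JavaScript is everywhere",
    "Learning Javascript the hard way",
    "I love Java",
]

def matching(pattern, items):
    """Return the items in which the pattern matches anywhere."""
    return [t for t in items if re.search(pattern, t)]

# \W requires a non-word character right after "Java"
with_non_word = matching(r"[Jj]ava\W", sample_titles)

# [^Ss] requires any single character other than "S" or "s"
with_negated_set = matching(r"[Jj]ava[^Ss]", sample_titles)

print(with_non_word)     # ['Java 8 released']
print(with_negated_set)  # ['Java 8 released', 'JavaOne keynote notes']
```

Note that neither pattern matches "I love Java": both \W and [^Ss] need one more character after "Java", and there is none at the end of the string.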

Hi, thanks for the quick response.
My post is not trying to argue that my answer is correct, and I do understand why it is incorrect, but thank you for the detailed response.

My issue is that both my code and the solution code did not work correctly.
Please look at the attached screenshot and you will see that my code (on the left) and the solution (on the right) both resulted in an instance of JavaScript being in the final output.

As I understand it, the intent of the lesson was to end up with a filtered dataset containing no strings with JavaScript in them, so I am reporting this as a bug.

Oh, I understand your issue better now. Sorry if I sounded like I assumed you were trying to argue that, by the way; I didn’t mean to. I have just checked out the variable inspector for my solution as well, and I agree that this should be reported as a bug.

Hello.

The solution is not incorrect. Here is the full sentence:

‘Ask HN: Should Learn/switch to JavaScript Programming (a Java Developer)’

This item is selected because of the word Java in “(a Java Developer)”, not because of the word JavaScript.
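
As a quick check, here is a small sketch with Python's re module (instead of the pandas call used in the lesson) showing which part of that title the solution pattern actually hits:

```python
import re

title = "Ask HN: Should Learn/switch to JavaScript Programming (a Java Developer)"
pattern = r"[Jj]ava[^Ss]"

# findall returns every non-overlapping piece of text the pattern matched
matches = re.findall(pattern, title)
print(matches)  # ['Java ']

# Show what follows the first match to confirm where it sits in the title
first = re.search(pattern, title)
print(title[first.end():])  # Developer)
```

The "Java" inside "JavaScript" is skipped because it is followed by "S"; the only match is "Java " (with the trailing space) in front of "Developer".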

If you run the solution code followed by java_titles.loc[8984], you can see it for yourself.

Solution code:

def first_10_matches(pattern):
    """
    Return the first 10 story titles that match
    the provided regular expression
    """
    all_matches = titles[titles.str.contains(pattern)]
    first_10 = all_matches.head(10)
    return first_10
pattern = r"[Jj]ava[^Ss]"
java_titles = titles[titles.str.contains(pattern)]

Ah okay, thank you for the reply. I will make sure to do that in future.

Thanks for the clarification!

How about the index 8135; i.e., java_titles.loc[8135]? There’s no “Java”/“java” in that title but it captured the word “javacript”. Misspellings of “javascript” can affect the accuracy of the matches. Hence, I think the pattern, r"[Jj]ava[^Ss]", is not the optimal choice if our goal is to capture the word “Java”/“java” while excluding the term “Javascript”/“javascript”. In this case, perhaps r"[Jj]ava[^Sscript]" is better?
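
One thing worth noting about that last pattern: inside square brackets each character is treated individually, so [^Sscript] means "any single character other than S, s, c, r, i, p or t", not "not followed by script". A rough sketch with Python's re module (both sample titles are made up for illustration, not taken from the dataset):

```python
import re

# Hypothetical titles: one misspelling of "javascript", one real Java tool name
misspelled = "Learning javacript in a weekend"
tool = "Profiling with javap and friends"

solution = re.compile(r"[Jj]ava[^Ss]")
proposed = re.compile(r"[Jj]ava[^Sscript]")

# The solution pattern matches both, because "c" and "p" are not "S" or "s"
print(bool(solution.search(misspelled)))  # True
print(bool(solution.search(tool)))        # True

# The proposed pattern rejects the misspelling, but also rejects "javap",
# because "p" is one of the characters listed in the set
print(bool(proposed.search(misspelled)))  # False
print(bool(proposed.search(tool)))        # False
```

So the wider set does filter out "javacript", at the cost of also dropping any Java-related word whose next letter happens to be c, r, i, p or t.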

Hello @lorelynr!

This happens because the code in this screen is not meant to consider typos. In fact, if you look at the instruction, it doesn’t even mention that you should differentiate Java and Javascript, it just says “The regex shouldn’t match where ‘Java’ is followed by the letter ‘S’ or ‘s’.”

But I get your point and I think it makes sense. In a real-life situation, we should consider this when writing the regex or at least we should be aware that some cleaning could be needed even after this step.

I kindly suggest you use the Contact Us button at the top of this page to share your thoughts with the DQ team, as they will be able to explain their ideas behind this particular screen or even accept your suggestion.

Hi @otavios.s,

Thank you for your response. It’s just that the last statement in the lecture (“Let’s use the negative set [^Ss] to exclude instances like JavaScript and Javascript:”) doesn’t make it clear whether we should capture just the word “Java”/“java” with nothing attached after it, or also accept other Java-related words (e.g., JavaOne, JavaFx, etc.) while excluding the word “[Jj]ava[Ss]cript”. I commented on this topic to raise awareness about other possible cases like misspellings, and I hadn’t studied the following screens by the time I wrote my comment. Today, I found that there’s more cleaning/correction to be done in the following screens. And, since the goal of that lecture is to teach us how to use “negative character classes”, the practice activity did help me understand the lesson. Thank you very much!
