Hello,
I have loved the mission so far. I'm just stuck on the final interpretation of the p-value of each term.
I would like to understand what this result finally means and what the p-value for each word means.

For example, the solution notebook concludes that “it seems that the term ‘indian’ is more associated with high value questions.” But here we can see that the term ‘school’ has appeared more times in high-value questions. So my questions are:

Why did the solution notebook conclude that the term ‘indian’ is more associated with high-value questions?

What does the p-value mean for each term, for example for the terms ‘indian’ and ‘school’?

I am attaching a screenshot of the final tibble here.

I’m basing my answer off of what I think your column names are referring to. n_high and n_low tell me that you’re referring to the last screen.

In general, a p-value here would be interpreted along these lines: “If there were actually no difference between the high-value and low-value questions in how often they use that term (i.e. ‘indian’), then the probability of seeing a difference in term use at least as big as the one observed (266 vs 288) is 0.05%.”

You can think of the p-value as a sort of “false positive” probability. If someone were to tell you that high-value and low-value questions truly use the word “indian” at the same rate, then your p-value is the probability that data like yours (i.e. the Jeopardy data you use) would show a difference at least this large under that (null) hypothesis.
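One way to make the “false positive” idea concrete is to simulate the null hypothesis directly. The sketch below is only an illustration, not the project's method: it assumes that, if the term had no preference, each occurrence would land in a high-value question with probability 2/5. That share is my assumption and is not given in this thread. It then counts how often pure chance produces a gap at least as large as the observed one.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_high, n_low = 266, 288   # counts quoted above for the term "indian"
total = n_high + n_low

# ASSUMPTION (not from the thread): under the null hypothesis, each
# occurrence of the term falls in a high-value question with probability 2/5.
p_high_null = 2 / 5

expected_high = total * p_high_null
observed_gap = abs(n_high - expected_high)

n_sims = 2000
as_extreme = 0
for _ in range(n_sims):
    # Draw one fake dataset in which the null hypothesis is true.
    sim_high = sum(random.random() < p_high_null for _ in range(total))
    # Count it if chance alone gives a gap at least as large as the real one.
    if abs(sim_high - expected_high) >= observed_gap:
        as_extreme += 1

p_sim = as_extreme / n_sims
print(f"{as_extreme} of {n_sims} simulated datasets were at least as extreme")
```

The fraction `p_sim` plays the role of the p-value: it is how often a world with no real difference would still hand you data this lopsided. A small fraction means chance alone is a poor explanation.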

In this case, the p-value is very low, so it suggests that there is actually a difference in the rate of usage of the word “indian” between the high- and low-value questions. On its own, it does not say that the word is associated more with one or the other.
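To see which direction the difference goes, you have to compare the observed counts with the counts you would expect under the null hypothesis. Here is a minimal sketch of a chi-squared goodness-of-fit calculation. It assumes that high-value questions make up 2/5 of all questions; that split is my assumption and is not confirmed in this thread, so treat the numbers as illustrative only.

```python
import math

n_high, n_low = 266, 288   # counts quoted above for the term "indian"
total = n_high + n_low

# ASSUMPTION (not from the thread): 2/5 of all questions are high value,
# so a term with no preference should split 2/5 high and 3/5 low.
expected_share_high = 2 / 5
expected_high = total * expected_share_high
expected_low = total - expected_high

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = ((n_high - expected_high) ** 2 / expected_high
        + (n_low - expected_low) ** 2 / expected_low)

# With two categories there is 1 degree of freedom, and the upper tail
# of a chi-squared distribution with 1 df equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# The direction comes from observed vs expected, not from n_high vs n_low.
direction = "high" if n_high > expected_high else "low"

print(f"expected high: {expected_high:.1f}, observed high: {n_high}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}, leans towards: {direction}")
```

Under that assumed split, the term would be expected in roughly 222 high-value questions but actually appears in 266, so it leans towards high-value questions even though n_low is the larger raw count. The p-value from this sketch need not match the one in your tibble, since the project's actual setup may differ from my assumption.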

Hello Christian, I really appreciate your response. I think I am close to understanding the problem. As the p-value for the term ‘indian’ is less than 0.05, we reject the null hypothesis that there is no difference in the appearance rate of the term ‘indian’ between high-value and low-value questions.

But there is one thing I am unclear about. The solution notebook says at the end, “From the 20 terms that we looked at, it seems that the term ‘indian’ is more associated with high value questions, interesting!”

Would you please explain what this interpretation means here? Does it mean that the term ‘indian’ has a higher chance of appearing in high-value questions than the other terms? If so, I would really like to know why, because n_low is greater than n_high for this term. Also, I want to understand why the author wrote “interesting” at the end of the interpretation in the solution notebook.

hi @diminishstudioz
This is a very delayed response, but in April I hadn’t started on the Stats part of my DS path with Python. If you have already resolved your doubts, great! Please ignore this post.

@alegiraldo666 has also asked about the interpretation of the results here, and I have responded to that. If this helps you, cool; if not, let me know if there’s something I can help you with.