How often the answer is deducible from the question

I am not sure I understand the code below. We remove 'the' from split_answer just because it is not an important word, but what about 'is', 'a', 'and', etc.? These words are not important either, so why do we remove only 'the'? If there are too many to clean up, why bother removing only one?

I can’t run the code below. It raises an error, but I can’t tell what the problem is. Could you help? Thank you!!
Screen Link: https://app.dataquest.io/m/210/guided-project%3A-winning-jeopardy/4/answers-in-questions

My Code:

def count_match(row):
    split_answer = row['clean_answer'].split()
    split_question = row['clean_question'].split()
    match_count=0
    if 'the' in split_answer:
        split_answer.remove('the')
    if len(split_answer)==0:
        return 0
    for item in split_answer:
        if item in split_question:
            match_count += 1
    return match_count/len(split_answer)

count_match(df)

What actually happened:

AttributeError: 'Series' object has no attribute 'split'

Hi @candiceliu93,

Given that it’s a guided project, I think 'the' is just used as an example on that screen. On the last screen, you will be asked to expand your analysis and do the same for other low-importance words :blush:
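As a rough sketch of what that extension might look like, the function below filters out a small set of common words instead of only 'the'. The STOPWORDS set and the name count_match_stopwords are illustrative assumptions, not part of the guided project, and a plain dict stands in for a dataframe row. Unlike list.remove(), which drops only the first occurrence, this version drops every occurrence of each listed word.

```python
# Hypothetical set of low-importance words; extend it as needed.
STOPWORDS = {"the", "a", "an", "and", "is", "of"}

def count_match_stopwords(row):
    # Keep only the answer words that are not in STOPWORDS.
    split_answer = [w for w in row["clean_answer"].split() if w not in STOPWORDS]
    split_question = row["clean_question"].split()
    if len(split_answer) == 0:
        return 0
    # Count the remaining answer words that also appear in the question.
    match_count = sum(1 for w in split_answer if w in split_question)
    return match_count / len(split_answer)

# A dict works here because the function only uses row["..."] lookups.
row = {
    "clean_answer": "the king of spain",
    "clean_question": "this king ruled spain and portugal",
}
print(count_match_stopwords(row))
```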

As for your function: it takes a single row as its argument, not the whole dataframe, which is why count_match(df) fails. You can run it on the whole dataframe with the apply() method, specifying that the function should receive rows (by passing axis='columns', or axis=1).
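A minimal sketch of that call, using a two-row toy dataframe in place of the Jeopardy data (the column names are taken from the question; the sample values and the answer_in_question column name are made up for illustration):

```python
import pandas as pd

# Toy data standing in for the real dataframe (values are invented).
df = pd.DataFrame({
    "clean_question": ["this king ruled spain", "capital of france"],
    "clean_answer": ["the king of spain", "paris"],
})

def count_match(row):
    split_answer = row["clean_answer"].split()
    split_question = row["clean_question"].split()
    if "the" in split_answer:
        split_answer.remove("the")
    if len(split_answer) == 0:
        return 0
    match_count = 0
    for item in split_answer:
        if item in split_question:
            match_count += 1
    return match_count / len(split_answer)

# axis=1 passes each row (a Series indexed by column name) to count_match.
df["answer_in_question"] = df.apply(count_match, axis=1)
print(df["answer_in_question"].tolist())
```

Calling count_match(df) directly fails because df['clean_answer'] is then a whole column (a Series), which has no split() method.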

Thank you! We did not specify axis before when using apply() and it worked. I’m confused about when we should specify it.

By default, the axis argument of the dataframe apply() method is 0, which means the function is applied to each column. If we want the function to be applied “horizontally” instead (i.e., to each row), as in this case, we have to specify axis=1.
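The difference can be seen on a small made-up dataframe. The last call shows apply() on a single column (a Series), which works element by element and has no axis setting; if the earlier apply() calls were made on one column, that may be why they worked without axis, though that is an assumption about code not shown in this thread.

```python
import pandas as pd

# Invented example data, not from the Jeopardy dataset.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Default axis=0: the function receives each column as a Series.
per_column = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series.
per_row = df.apply(lambda row: row.sum(), axis=1)

# Series.apply: the function receives one value at a time.
doubled = df["a"].apply(lambda x: x * 2)

print(per_column.to_dict())  # {'a': 6, 'b': 60}
print(per_row.tolist())      # [11, 22, 33]
print(doubled.tolist())      # [2, 4, 6]
```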