This can be tricky to understand. So, let’s use numbers for a sample problem.
Let’s say you are given a set of lists -
[1, 2, 3, 4]
[1, 2, 4]
[1, 2, 3]
You are trying to find the proportion of A that appears in B. This is what the count_matches function in the task computes.
For the first row, both numbers in A appear in B, so our proportion is 1.
For the second row, none of the numbers in A appear in B, so the proportion is 0.
For the third row, half of the numbers in A appear in B, so the proportion is 0.5.
Take some time to make sure you understand how the above values came to be. It’s essentially the two steps below -
- Loop through each item in split_answer, and see if it occurs in split_question. If it does, add 1 to match_count.
- Divide match_count by the length of split_answer, and return the result.
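The two steps above can be sketched roughly as follows. This is only an illustration, not the project's reference solution: the names count_matches, split_answer, split_question and match_count come from the task, while the empty-list guard and the sample A lists are assumptions made up here to reproduce the proportions 1, 0 and 0.5.

```python
def count_matches(split_answer, split_question):
    """Return the proportion of items in split_answer that also occur in split_question."""
    # Assumption: an empty answer counts as proportion 0 (avoids dividing by zero).
    if len(split_answer) == 0:
        return 0
    match_count = 0
    # Step 1: count the items of the answer that occur in the question.
    for item in split_answer:
        if item in split_question:
            match_count += 1
    # Step 2: divide by the length of the answer.
    return match_count / len(split_answer)


# Hypothetical (A, B) rows; the A lists are invented for illustration.
rows = [
    ([1, 2], [1, 2, 3, 4]),
    ([5, 6], [1, 2, 4]),
    ([3, 7], [1, 2, 3]),
]
proportions = [count_matches(a, b) for a, b in rows]
print(proportions)
```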
So, we have our proportions. For the third row, based on our 0.5 proportion we can say that 50% of A occurs in B (for just that row).
On average, how much of the values in A occur in B (considering all the rows)?
That would be taking the average of the proportions. So,
(1 + 0 + 0.5)/3 = 0.5
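As a quick check of that arithmetic, here is a minimal sketch that averages the three per-row proportions from the example. The variable names are made up for illustration.

```python
# Per-row proportions from the worked example above.
proportions = [1, 0, 0.5]

# Average of the proportions: how much of A occurs in B, on average.
mean_proportion = sum(proportions) / len(proportions)
print(mean_proportion)  # 0.5
```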
On average, we can say that 50% of a list in A occurs in B. This is essentially what they refer to with the 6% value: on average, 6% of the answer is present in the question.
Now, coming to your approach. Our proportions are -
1, 0, 0.5
Total number of values from above that are not 0 = 2
Total number of values = 3
Proportion of values that are not 0 = 2/3
Do you notice the difference?
You are calculating how often A is present in B at all. That is, the share of rows in which at least part of an answer is present in the question.
What is required is calculating the average of how much of A is in B. That is, on average, how much of an answer is in the question. And this is what’s important in the context of the project. It helps answer -
How often the answer is deducible from the question.
We can say from our numerical data that 2 out of the 3 lists in A are at least partly present in B. But that doesn’t help us answer how much of those lists in A (that is, what percentage of the values in the lists in A) are present in B.
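To see why the two measures are not interchangeable, here is a small sketch comparing them on two sets of proportions. The first set is the example from this post; the second set is invented purely for contrast, and the helper names are hypothetical.

```python
def share_nonzero(proportions):
    """Share of rows where at least part of A is present in B."""
    return sum(1 for p in proportions if p != 0) / len(proportions)


def mean_proportion(proportions):
    """Average of how much of A is present in B."""
    return sum(proportions) / len(proportions)


example = [1, 0, 0.5]    # proportions from this post
contrast = [0.1, 0, 0.1]  # made-up proportions with the same number of non-zero rows

for name, props in [("example", example), ("contrast", contrast)]:
    print(name, round(share_nonzero(props), 3), round(mean_proportion(props), 3))
```

Both sets give the same share of non-zero rows (2 out of 3), but the average proportion is very different, and the average is the number that says how much of the answer is in the question.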