Which phrase to return in case of multiple matches?

I want to compare one sentence to some other sentences using the Bag of Words model. Suppose that my comparing sentence is:

I am playing football

and there are three more sentences that I want to compare it with:

1. and I am playing Cricket

2. Why do you play Cricket

3. I love playing Cricket when I am at school

Now, if I compare my comparing sentence to the three sentences above by counting shared words, sentences 1 and 3 share the same number of words with it: 3 (I, am, playing). Sentence 2 shares none, since "play" and "playing" are different words when compared exactly.

Now the question is: which sentence is more closely related to my comparing sentence in this case? No semantic meaning is involved at all.

I have seen it suggested in some places that the least convoluted option is to return the shortest sentence in this case. What are your thoughts?
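To make the shortest-sentence idea concrete, here is a minimal sketch in Python. It assumes lowercase whitespace tokenization with no stemming, and the helper names (`tokenize`, `shared_word_count`, `best_match`) are made up for illustration, not taken from any library. It ranks candidates by the number of distinct shared words and, among ties, prefers the candidate with fewer tokens.

```python
def tokenize(sentence):
    # Assumption: lowercase and split on whitespace; no stemming,
    # so "play" and "playing" count as different words.
    return sentence.lower().split()


def shared_word_count(query, candidate):
    # Number of distinct words that appear in both sentences.
    return len(set(tokenize(query)) & set(tokenize(candidate)))


def best_match(query, candidates):
    # Highest overlap wins; among equal overlaps, the shorter candidate wins.
    # If overlap and length are both equal, max() keeps the first one seen.
    return max(
        candidates,
        key=lambda c: (shared_word_count(query, c), -len(tokenize(c))),
    )


query = "I am playing football"
candidates = [
    "and I am playing Cricket",
    "Why do you play Cricket",
    "I love playing Cricket when I am at school",
]

result = best_match(query, candidates)
print(result)  # and I am playing Cricket
```

The intuition behind preferring the shorter candidate is that a larger share of its words match, so it contains less unrelated material.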

Thank You.

Hi @hefaz2010, welcome to the community!

I'm not very familiar with bag of words myself, but this article should help.

Scoring Words

Once a vocabulary has been chosen, the occurrence of words in example documents needs to be scored.

In the worked example, we have already seen one very simple approach to scoring: a binary scoring of the presence or absence of words.

Some additional simple scoring methods include:

  • Counts. Count the number of times each word appears in a document.
  • Frequencies. Calculate the frequency that each word appears in a document out of all the words in the document.
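As a rough illustration of the three scoring methods mentioned above (binary, counts, frequencies), here is a small sketch. The function name `score_document` and the example vocabulary are assumptions made for this example, and tokenization is plain lowercase whitespace splitting.

```python
from collections import Counter


def score_document(document, vocabulary):
    # Assumption: lowercase whitespace tokenization, no punctuation handling.
    tokens = document.lower().split()
    counts = Counter(tokens)
    total = len(tokens)

    # Binary: 1 if the word is present, 0 otherwise.
    binary = [1 if counts[word] > 0 else 0 for word in vocabulary]
    # Counts: how many times each vocabulary word appears.
    count_vec = [counts[word] for word in vocabulary]
    # Frequencies: count divided by the total number of words in the document.
    freq_vec = [counts[word] / total if total else 0.0 for word in vocabulary]
    return binary, count_vec, freq_vec


vocabulary = ["i", "am", "playing", "cricket", "football"]
binary, count_vec, freq_vec = score_document(
    "I love playing Cricket when I am at school", vocabulary
)
print(binary)     # [1, 1, 1, 1, 0]
print(count_vec)  # [2, 1, 1, 1, 0]
```

Note that frequencies already take document length into account, which matters when two documents match the same number of words.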

Thank you.

I have already implemented that algorithm. In my algorithm, one sentence is compared to multiple sentences, and it returns the one with the most matched words. The problem is that when I compare one sentence to multiple sentences, two of them may match the same number of words in my comparing sentence. Which one should I return in that case? Please refer to the question for the example.
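One possible way to avoid most ties, offered only as a sketch and not as the method from the article, is to normalize the overlap by sentence length. The example below uses Jaccard similarity (shared distinct words divided by all distinct words across both sentences) in place of a raw match count. The function name `jaccard` and the whitespace tokenization are assumptions for illustration.

```python
def jaccard(query, candidate):
    # Assumption: lowercase whitespace tokenization, exact word matching.
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    union = q | c
    # Shared distinct words divided by all distinct words in either sentence.
    return len(q & c) / len(union) if union else 0.0


query = "I am playing football"
candidates = [
    "and I am playing Cricket",
    "Why do you play Cricket",
    "I love playing Cricket when I am at school",
]

scores = {c: jaccard(query, c) for c in candidates}
# Sort candidates from most to least similar.
ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)

for c in ranked:
    print(f"{scores[c]:.3f}  {c}")
```

With this scoring, two sentences that match the same three words no longer tie unless they also contain the same number of distinct words, and the shorter one ranks higher. Exact ties are still possible, in which case returning all tied candidates or the first one found is a design choice.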