NLP is not currently my forte, so I can't help extensively. But from what I understand -
Step 1
Let’s say you have the following two sentences in your input -
Hi, I am the doctor.
Hi doctor. I am monorienaghogho
So, to calculate p(w_i) for every word w_i in the data, you simply divide the word's frequency by the total number of words in the input.
So, for example, the probability of doctor
would be 2/10 (ignoring all punctuation), since doctor appears twice out of 10 words in total.
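Here is a minimal Python sketch of Step 1 (the tokenization, variable names, and punctuation stripping are my own assumptions, not taken from the paper):

```python
from collections import Counter
import string

sentences = ["Hi, I am the doctor.", "Hi doctor. I am monorienaghogho"]

# Strip punctuation and split on whitespace (a very rough tokenizer)
tokens = [w.strip(string.punctuation) for s in sentences for w in s.split()]

total = len(tokens)                              # 10 tokens for this toy input
p = {w: c / total for w, c in Counter(tokens).items()}

print(p["doctor"])                               # 2/10 = 0.2
```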
Step 2
Now, for each sentence, you calculate a weight that is equal to the average probability of the words in that sentence.
So, for our first sentence - Hi, I am the doctor.
we would have the following probabilities for each word -
2/10, 2/10, 2/10, 1/10, 2/10
So the average of the above is (9/10)/5 = 0.9/5 = 0.18
And the process continues based on the above two steps.
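Continuing the same toy example, a sketch of Step 2 might look like this (again just illustrative, not the paper's implementation; note it treats the words of a sentence as a plain list rather than a set):

```python
from collections import Counter
import string

sentences = ["Hi, I am the doctor.", "Hi doctor. I am monorienaghogho"]
tokens = [w.strip(string.punctuation) for s in sentences for w in s.split()]
p = {w: c / len(tokens) for w, c in Counter(tokens).items()}

def sentence_weight(sentence):
    """Weight of a sentence = average probability of its words."""
    words = [w.strip(string.punctuation) for w in sentence.split()]
    return sum(p[w] for w in words) / len(words)

print(sentence_weight("Hi, I am the doctor."))   # (0.2 + 0.2 + 0.2 + 0.1 + 0.2) / 5 = 0.18
```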
As per the paper, it seems the word probabilities are computed across all documents -
With regard to our system design, it must be noted that this system, similar to almost all multi-document summarization systems, produces summaries by selecting sentences from the document set, either verbatim or with some simplification.
That seems to have already been accounted for by Step 2. The denominator is the count of the set of all unique words in the sentence, if I am not mistaken, and they are summing over w_i as well, which is also every unique word in the sentence.
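If that reading is right, the weight the paper assigns to a sentence S_j would be something like the following (my own rendering, so treat it as an approximation rather than the paper's exact notation):

$$\mathrm{weight}(S_j) = \frac{1}{\lvert \{ w_i \mid w_i \in S_j \} \rvert} \sum_{w_i \in S_j} p(w_i)$$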
Beyond the above, I’m afraid any further details would have to come from a more focused reading of the paper, which I can’t go through right now. Maybe someone else can help out as well.