Guided Project: Mobile App for Lottery Addiction - Definition of probability_less_6

On step 7 of the Guided Project, we are asked to write a function that determines the probability of having exactly n winning numbers.

The approach on the GitHub solution page (https://github.com/dataquestio/solutions/blob/master/Mission382Solutions.ipynb) included the following lines of code:

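from math import comb as combinations  # assumption: stands in for the project's own combinations() helper

def probability_less_6(n_winning_numbers):

    n_combinations_ticket = combinations(6, n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining

    n_combinations_total = combinations(49, 6)
    probability = successful_outcomes / n_combinations_total

    probability_percentage = probability * 100
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage, int(combinations_simplified)))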

I don’t see how it makes sense to use 43 in this line of code:
n_combinations_remaining = combinations(43, 6 - n_winning_numbers)

For example, when we are looking at 2 winning numbers, once we have those 2 numbers, I would expect us to be left with combinations of 49-2 choose 6-2.

I would expect the solution in that line of code to be:
n_combinations_remaining = combinations(49 - n_winning_numbers, 6 - n_winning_numbers) minus all cases where we have more than n winning numbers.

Why does the GitHub solution use the fixed number 43?

Is the solution really correct? If so, what am I missing?

Thanks in advance!! :slight_smile:


Hey, Juli.

Answer

You want \displaystyle {43\choose {6-2}} because you want the remaining four numbers to be incorrect; 43 here comes from removing the six winning numbers from the pool (49-6).

The number n_combinations_ticket gives you the number of combinations with 2 winning numbers (out of six numbers). The number n_combinations_remaining gives you the number of combinations with 4 incorrect numbers (out of the pool of incorrect numbers).

Multiplying them yields the number of combinations with two correct and four incorrect numbers.
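A quick numerical check of this reasoning, as a sketch that uses the standard library's `math.comb` (Python 3.8+) in place of the project's own `combinations` function: it computes the count for exactly two winning numbers, then checks that the "exactly k" counts for k = 0 to 6 add up to all C(49, 6) tickets. Every ticket falls into exactly one of these categories, so no subtraction is needed.

```python
from math import comb

# Exactly 2 of the 6 winning numbers, and the other 4 from the 43 non-winning ones
ways_to_pick_winners = comb(6, 2)       # 15
ways_to_pick_losers = comb(43, 6 - 2)   # 123410
exactly_two = ways_to_pick_winners * ways_to_pick_losers
print(exactly_two)                      # 1851150
print(exactly_two / comb(49, 6))

# Each ticket has exactly k winning numbers for a single k in 0..6, so these
# counts split all tickets into disjoint groups and must add up to C(49, 6).
by_k = [comb(6, k) * comb(43, 6 - k) for k in range(7)]
assert sum(by_k) == comb(49, 6)         # 13983816
```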

Your proposed approach is fine as well, but you’re not providing a way of counting the cases where we have more than n_winning_numbers winning numbers, and that’s critical here.
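To see what those extra cases look like, here is a small sketch (again with `math.comb` standing in for the project's helper; `exactly` and `naive` are names chosen for this illustration). The product C(6, n) * C(49 - n, 6 - n) fixes a group of n winning numbers and fills the remaining slots from everything else, so a ticket with exactly m ≥ n winning numbers is counted once for each of its C(m, n) groups of n winning numbers. The check below confirms that this over-counting is the only difference.

```python
from math import comb

def exactly(m):
    # tickets with exactly m winning numbers (the solution's formula)
    return comb(6, m) * comb(43, 6 - m)

def naive(n):
    # fix n winning numbers, fill the other 6 - n slots from the 49 - n numbers left
    return comb(6, n) * comb(49 - n, 6 - n)

for n in range(7):
    # a ticket with exactly m >= n winning numbers is counted comb(m, n) times
    overcounted = sum(comb(m, n) * exactly(m) for m in range(n, 7))
    assert naive(n) == overcounted

print(naive(5), exactly(5))  # 264 258
```

For n = 5 the gap of 6 is the single all-winning ticket, counted once for each of its six groups of five winning numbers.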

An example

Let’s see a smaller example to better train our intuition. We are going to use a brute force approach; to assist us with this we’ll use the itertools.combinations function.

This function takes as inputs an iterable (like a list) and a positive integer k, and it returns all the combinations of size k in the form of an iterator (don’t worry if you don’t know what this is, it’s not important here).

Let’s see an example. We’re going to find all the two-element combinations from the list [0, "Bruno", 17].

>>> import itertools as itt
>>> list(itt.combinations([0, "Bruno", 17], 2))
[(0, 'Bruno'), (0, 17), ('Bruno', 17)] 

We can eye-ball it and confirm that itt.combinations is working as expected.

We’ll now create our own mini-lottery with a pool of six numbers, 1 through 6, where each ticket is a gamble on three distinct numbers. Players get prizes for getting one, two or three numbers correct.

Let’s list all possible plays:

>>> all_tickets = list(itt.combinations(range(1,7), 3))
>>> len(all_tickets)
20
>>> print(*all_tickets, sep="\n")
(1, 2, 3)
(1, 2, 4)
(1, 2, 5)
(1, 2, 6)
(1, 3, 4)
(1, 3, 5)
(1, 3, 6)
(1, 4, 5)
(1, 4, 6)
(1, 5, 6)
(2, 3, 4)
(2, 3, 5)
(2, 3, 6)
(2, 4, 5)
(2, 4, 6)
(2, 5, 6)
(3, 4, 5)
(3, 4, 6)
(3, 5, 6)
(4, 5, 6)

Now let’s consider a winning ticket. I’ll “randomly” pick the winning ticket: (2,4,6).

Now let’s figure out the probability of getting exactly two numbers right, for example. We already know the denominator is 20. Since the example is so small, we could just count by eye the tickets that satisfy this criterion to find the numerator. We’ll use a little trick as a sanity check:

>>> [t for t in all_tickets if len([k for k in t if k%2 == 0]) == 2]
[(1, 2, 4), (1, 2, 6), (1, 4, 6), (2, 3, 4), (2, 3, 6), (2, 4, 5), (2, 5, 6), (3, 4, 6), (4, 5, 6)]

This is a list of all combinations that have exactly two even numbers. Since our “randomly” picked winning ticket consists of all the even numbers, this is enough. Counting, we see that there are nine combinations with exactly two winning numbers.

Thus, the probability of getting exactly two winning numbers in our mini-lottery is \dfrac 9{20}.

>>> 9/20
0.45
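The same brute force extends to every possible number of matches. This sketch rebuilds `all_tickets`, uses the winning ticket (2, 4, 6), and compares the counts against C(3, k) * C(3, 3 - k), the mini-lottery version of the solution's formula (`math.comb` computes the binomial coefficient):

```python
import itertools as itt
from math import comb

all_tickets = list(itt.combinations(range(1, 7), 3))
winning = {2, 4, 6}

# Brute force: count tickets by how many numbers they share with the winning ticket
counts = {k: 0 for k in range(4)}
for ticket in all_tickets:
    counts[len(winning & set(ticket))] += 1

# Formula: choose k of the 3 winning numbers and 3 - k of the 3 non-winning ones
formula = {k: comb(3, k) * comb(3, 3 - k) for k in range(4)}

print(counts)   # {0: 1, 1: 9, 2: 9, 3: 1}
assert counts == formula
```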

Let’s confirm this using an adapted version of probability_less_6, which we’ll call probability_less_3:

from math import comb as combinations  # assumption: stands in for the project's own combinations() helper

def probability_less_3(n_winning_numbers):

    # Replace 6 with 3 (the numbers that are drawn)
    n_combinations_ticket = combinations(3, n_winning_numbers)
    # Replace 43 with 6-3 (the remaining incorrect numbers)
    n_combinations_remaining = combinations(3, 3 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    # Replace 49 with 6 (the pool) and 6 with 3 (the drawn numbers)
    n_combinations_total = combinations(6, 3)
    probability = successful_outcomes / n_combinations_total

    probability_percentage = probability * 100
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage, int(combinations_simplified)))
>>> probability_less_3(2)
Your chances of having 2 winning numbers with this ticket are 45.000000%.
In other words, you have a 1 in 2 chances to win.

The result matches what we got above.

Let’s now explore what n_combinations_ticket and n_combinations_remaining refer to in this example. We’ll start with the former:

>>> list(itt.combinations((2,4,6),2))
[(2, 4), (2, 6), (4, 6)]

The number n_combinations_ticket is nothing more than the number of elements in the above list. This matches what is being done in the function, as \displaystyle{3\choose 2} = \dfrac{3!}{2!1!} = 3.

For each element in the list above, we must add one more number to get a ticket. To avoid including another winning number, we can only pick from 1, 3 and 5. Therefore, we can add:

  • 1, 3 or 5 to (2,4), yielding three distinct possibilities;
  • 1, 3 or 5 to (2,6), yielding three distinct possibilities;
  • 1, 3 or 5 to (4,6), yielding three distinct possibilities.

In total we have nine ways of creating a ticket with exactly two winning numbers. What we just did was obtain n_combinations_remaining manually.

From the three remaining numbers (1, 3 and 5), we chose one. This corresponds to combinations(3, 1) in n_combinations_remaining = combinations(3, 3 - n_winning_numbers).
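The manual construction above can be replayed in a few lines. This sketch pairs every two-number group from the winning ticket (2, 4, 6) with each non-winning number and checks that the result is exactly the brute-force list of tickets with two matches:

```python
import itertools as itt

winning = (2, 4, 6)
non_winning = (1, 3, 5)

# For each pair of winning numbers, add one non-winning number to complete the ticket
built = sorted(
    tuple(sorted(pair + (extra,)))
    for pair in itt.combinations(winning, 2)
    for extra in non_winning
)

# Brute force: all 3-number tickets from 1..6 sharing exactly two numbers with (2, 4, 6)
brute = [t for t in itt.combinations(range(1, 7), 3) if len(set(t) & set(winning)) == 2]

print(len(built))   # 9
assert built == brute
```

No ticket is built twice: a ticket with exactly two winning numbers contains only one pair of them.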

I hope this helps.


Thank you so much Bruno! I’m convinced now. :smile:

Just for sanity checking, where you wrote:

“Thus, the probability of getting exactly two winning numbers in our mini-lottery is 9/45.”

you meant “9/20”, right?

Again, thank you so much for your thorough answer. It’s now totally clear to me.

:relieved:


You’re welcome. I’m glad I could help.

Yes! Thank you for catching that typo.


Thank you for the explanation. I was also quite confused by the use of 43 in the calculation of n_combinations_remaining. Maybe you could include an explanation in the course itself to help students understand this better.

hi @bruno

I’ll save learning how to answer a question in such a detailed and informative way for another day.

For now, I still don’t understand this part:

In the simplest terms, this is what I have understood:

  • out of 6 available numbers, I have already drawn 3, so now the box only has 3 numbers left.
  • similarly, 6 numbers have been drawn already, so now the box contains 49-6 = 43 numbers.

So even if the user inputs n_winning_numbers = 4, am I still looking at numbers 7 to 49 for the 5th and 6th numbers?

Basically, I am lost at 49 - 4 = 45: when I draw the first 4 numbers [1, 2, 3, 4], I then draw 2 more numbers, which turn out to be [5, 6] (but now they come from the remaining 45 numbers).

Since the precondition here is that [1, 2, 3, 4, 5, 6] is THE winning combination, the other possible outcomes should look like:
[1, 2, 3, 4, 5, 7], [1, 2, 3, 4, 5, 8],…,[1, 2, 3, 4, 5, 48], [1, 2, 3, 4, 5, 49],…,
[1, 2, 3, 4, 6, 7], [1, 2, 3, 4, 6, 8],…,[1, 2, 3, 4, 6, 48], [1, 2, 3, 4, 6, 49],…,

or am I colossally wrong here :frowning:

I am more confused now. :frowning_face:

Sorry I’m not able to pay you back immediately for all the help you’ve been giving, but I’m having a really hard time understanding most of your question.

One thing I picked up on is that you seem to be assuming that (1, 2, 3, 4, 5, 6) is the winning combination. It isn’t; it’s the gamble. We don’t know what the winning combination is.

Was this already clear to you, and does your question still stand, or does this new information resolve it?

hey @Bruno

for now, your payback gesture will do :heavy_heart_exclamation: (Thank you). Once the planet-wide lockdown is over we will discuss “Pastel de Nata” (or any other dish where meat can be avoided or substituted!) :stuck_out_tongue:

I re-read the instructions and your solution as well. Let me try to rephrase my question/understanding.

  • we need to find the probability that a minimum of 4 of the 6 numbers on a given ticket match.
  • we first calculate possible combinations of 4 numbers from the 6 numbers:
    n_combinations_ticket = combinations(6, n_winning_numbers)
  • since we are matching any 4 numbers of the ticket, winning ticket takes the form [1, 2, 3, 4, x, y]
  • one of the possible outcomes is [1, 2, 3, 4, x = 5, y = 6], and we need to identify how many other possible combinations can happen. This is where n_combinations_remaining = combinations(43, 6 - n_winning_numbers) comes into play.

Here it’s 43 (instead of 45 = 49 - 4) because we are already taking 5 and 6 out of the calculation as one of the possible outcomes, and we are trying to find how many other values x and y can take!?

  • Total number of successful outcomes comes from n_combinations_ticket * n_combinations_remaining

Is my thought process correct, or am I awfully wrong again? Apologies if I am still not making sense. :grimacing:

:relaxed:

No, not a minimum of 4 numbers. It’s exactly 4 numbers.

More precisely combinations(6, 4).

Nope. You don’t know what the winning ticket is, (1, 2, 3, 4, 5, 6) is your ticket.

Ignoring the issue above, what about [x, y, 3, 4, 5, 6], [x, 2, y, 4, 5, 6] and all other possible combinations?
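This point can be made concrete with `itertools.combinations`, as a sketch that treats (1, 2, 3, 4, 5, 6) as the ticket. It lists all combinations(6, 4) = 15 ways of choosing which four ticket numbers match, of which [1, 2, 3, 4, x, y] is only one, and multiplies by the ways to pick the two other drawn numbers:

```python
import itertools as itt
from math import comb

ticket = (1, 2, 3, 4, 5, 6)  # the gamble; the winning numbers are unknown

# Each choice of 4 ticket numbers that could be the matching ones. For exactly
# 4 matches, the 2 ticket numbers left out must NOT be drawn, and the other 2
# drawn numbers (the x and y) come from the 43 numbers that are not on the ticket.
matching_groups = list(itt.combinations(ticket, 4))
for group in matching_groups:
    left_out = tuple(n for n in ticket if n not in group)
    print(group, "match, not drawn:", left_out)

exactly_four = comb(6, 4) * comb(43, 2)
print(len(matching_groups), exactly_four)  # 15 13545
```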

Correct.


Sorry, I’m only answering your individual questions; I’m just not getting the gist of your rationale, so I’m unable to see what it is that you’re thinking incorrectly.

hey @Bruno

I re-read all the instructions and thought about it differently.

I guess I went off on a totally different tangent from what was asked.
But I think this was my misunderstanding: [1, 2, 3, 4, 5, 6] are the exact numbers on the ticket, whereas I understood them as the [1st, 2nd, 3rd ...] numbers chosen.
(I am still not sure what exactly was in my head at that time.)

Moving on, I thought of the question this way: I have a ticket with the exact number combination [3, 19, 27, 23, 41, 35]. I need to identify the probability that any 4, any 3, etc. of these 6 numbers match the winning ticket.

For example, if the winning combination is [1, 11, 19, 27, 23, 6], exactly 3 of my numbers match.
If the winning combination is [10, 21, 35, 41, 23, 3], exactly 4 of my numbers match.

So no matter what the winning combination is, I will always have 43 numbers left to choose from, and I will choose the remaining numbers from them after taking into account how many numbers I want to match exactly.

So,

  • if exactly 4 numbers should match, the remaining 6 - 4 numbers can be picked in nCk = C(43, (6 - 4)) ways
  • if exactly 3 numbers should match, the remaining 6 - 3 numbers can be picked in nCk = C(43, (6 - 3)) ways, etc.

I hope I finally made some sense this time!

(Maybe the core of my question was: why do we use n = 43 in every calculation instead of letting it change with the number of exact matches, i.e. n = 45 when exactly 4 match and n = 46 when exactly 3 match? :woman_facepalming:)
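The mini-lottery from earlier in the thread (winning ticket (2, 4, 6)) shows what happens if the pool shrinks with the number of matches instead. In this sketch, two winning numbers are fixed and the last slot is filled from the 6 - 2 = 4 numbers not yet used. That pool still contains the third winning number, so the all-winning ticket sneaks in once per pair:

```python
import itertools as itt
from collections import Counter

winning = (2, 4, 6)

# The "49 - n" idea in the mini-lottery: fix 2 winning numbers, then pick the last
# number from the 6 - 2 = 4 numbers not yet used (which includes the third winner).
naive = Counter()
for pair in itt.combinations(winning, 2):
    for extra in set(range(1, 7)) - set(pair):
        naive[tuple(sorted(pair + (extra,)))] += 1

print(sum(naive.values()))   # 12 tickets built, but...
print(len(naive))            # 10 distinct tickets
print(naive[(2, 4, 6)])      # 3: the all-winning ticket is built once per pair
```

Restricting the last slot to the three non-winning numbers (1, 3, 5) removes that ticket and leaves the nine found earlier. The size of the non-winning pool depends only on how many numbers are drawn, not on how many match, which is why it is 43 for every n in the real lottery.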

Any future employers out there, please don’t disqualify me based on my query; please take note of my persistence and perseverance in bugging @Bruno! :grin:


Hi all,

After re-reading the instructions and the forum, I finally understand the idea behind the code.
I would like to share my understanding with a visual aid.

Let’s imagine a big pool that contains two smaller pools, pool A and pool B, so big pool = A + B. The big pool contains 49 numbers. Pool A holds the 6 winning numbers and pool B holds the non-winning numbers (49 - 6 = 43).
If your ticket matches two, three, four or five numbers, it has numbers from both pool A and pool B. That’s why, when you match exactly five numbers, for example, the one remaining number can be any of the 43 numbers in pool B.
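Pool A and pool B map naturally onto Python sets. A minimal sketch, assuming a randomly drawn (hypothetical) set of winning numbers and using the example ticket from earlier in the thread: whatever the draw, pool B has 43 numbers, and every ticket splits into numbers from pool A and numbers from pool B.

```python
import random

POOL = set(range(1, 50))

# Hypothetical draw: any 6 distinct numbers from 1..49 would do
pool_a = set(random.sample(range(1, 50), 6))   # the 6 winning numbers
pool_b = POOL - pool_a                          # the non-winning numbers

ticket = {3, 19, 23, 27, 35, 41}               # example ticket from earlier in the thread
matches = ticket & pool_a                       # numbers taken from pool A
misses = ticket & pool_b                        # numbers taken from pool B

assert len(pool_b) == 43                        # true for every possible draw
assert len(matches) + len(misses) == 6
print(len(matches), "from pool A,", len(misses), "from pool B")
```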

[Image: lottery 649]

I still do not get it. Could somebody explain the code to me using n_winning_numbers=3? I understand it for 5, but I do not see how the same code applies to lower numbers. Basically, I do not understand how n_combinations_remaining = combinations(43, 6 - n_winning_numbers) represents the combinations of 3 correct numbers minus the hits of more than 3 numbers.
Thanks in advance

I tried writing a loop that calculates the probability for n decreasing from 5 to 2. This way, I can remove the possible combinations where we have more than n winning numbers. But I get different results from the solution, and I do not know where I am wrong. I included two new variables: posib_outs_per_comb (how many 6-number tickets we can have for each of the n_combinations_ticket groups) and comb_more_than_n (how many of those combinations belong to hits of more than n). I am extremely confused :frowning:

My code:
from math import comb as combinations  # assumption: stands in for the project's combinations() helper

n_combinations_total = combinations(49, 6)
n_winning_numbers = 5
comb_more_than_n = 1
probabilities = []
while n_winning_numbers > 1:
    n = n_winning_numbers
    n_combinations_ticket = combinations(6, n)  # in 6 numbers, how many groups of n numbers can there be
    posib_outs_per_comb = combinations(49 - n, 6 - n)  # for each group of n, the other numbers can be any combination of the remaining (49 - n) numbers
    # How many hits of more than n can we have, per group?
    n_combinations_remaining = posib_outs_per_comb - comb_more_than_n  # we need to remove cases where more than n values match, even if I do not agree
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    probability = (successful_outcomes / n_combinations_total) * 100
    comb_more_than_n = successful_outcomes
    probabilities.append(probability)
    n_winning_numbers -= 1
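For comparison, here is a minimal sketch of the same subtraction idea carried out per group of n winning numbers (it uses `math.comb` instead of the project's helper and a `for` loop instead of `while`). One likely source of the discrepancy in the loop above is that comb_more_than_n holds the previous step's total successful_outcomes, summed over all C(6, n + 1) groups, while posib_outs_per_comb counts tickets for a single group of n numbers; the loop also never handles tickets with n + 2 or more hits. Subtracting, for one fixed group, every ticket that contains at least one further winning number leaves exactly combinations(43, 6 - n):

```python
from math import comb

POOL, DRAWN = 49, 6
NON_WINNING = POOL - DRAWN  # 43

probabilities = []
for n in range(5, 1, -1):  # n = 5, 4, 3, 2, as in the loop above
    n_combinations_ticket = comb(DRAWN, n)
    # For ONE fixed group of n winning numbers, the other 6 - n ticket numbers come
    # from the 49 - n numbers left: 6 - n winning ones and 43 non-winning ones.
    posib_outs_per_comb = comb(POOL - n, DRAWN - n)
    # Tickets containing this group plus j >= 1 further winning numbers (more than n hits)
    comb_more_than_n = sum(
        comb(DRAWN - n, j) * comb(NON_WINNING, DRAWN - n - j)
        for j in range(1, DRAWN - n + 1)
    )
    n_combinations_remaining = posib_outs_per_comb - comb_more_than_n
    # What is left is exactly the solution's combinations(43, 6 - n)
    assert n_combinations_remaining == comb(NON_WINNING, DRAWN - n)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    probabilities.append(successful_outcomes / comb(POOL, DRAWN) * 100)

print(probabilities)
```

So the subtraction approach and the solution agree: the 43 is what remains once every "more than n" case is removed from the 49 - n numbers.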