Reviewing some basic probability and statistics, I rediscovered the geometric distribution, which tells you the likelihood of winning a game after some number of attempts. With it, we can figure out the number of times you should expect to play the game until the outcome is “success” – or, equivalently, the number of plays you’d need in order to expect one success among them.

Why is this important? They say the odds of winning the current record-setting powerball lottery are 1 in 176 million. I want to be rich. How many tickets do I need to buy to expect a winning one?

The intuition is straightforward. For instance, if there is a p=25% chance of success on any one trial, the expected ratio of success to failures is 1/4, or one out of every four. Which means that any four trials should yield one success on average. Mathematically, if we let X be the number of trials required until a success occurs, then E[X] = 1/p.

So the odds are in my favor after buying 176 million randomly generated tickets. Who wants to loan me $176,000,000?

The proof is neat.

If p is the probability of success, and q = (1-p) is the probability of failure, then the probability of succeeding on the i’th trial after (i-1) failures is:

Pr(X=i) = q^(i-1)p

So

1 2 3 4 5 6 |
E[X] = Sum over i of i*Pr(i) = p + 2pq + 3pq^2 ... = p(1 + 2q + 3q^2 + ...) = p/(1-q)^2 = p/p^2 = 1/p |

Lines three to four depend on some magic with infinite series. If you have a converging series where x < 1:

1 2 3 4 5 6 7 |
1 + x + x^2 + ... = S (1 - x)(1 + x + x^2 + ...) = (1 - x)S (1 + x + x^2 + ...) - (x + x^2 + ...) = (1 - x)S 1 = (1 - x)S 1/(1-x) = S 1/(1-x) = 1 + x + x^2 + ... 1/(1-x)^2 = 1 + 2x + 3x^2 + ... (taking the derivative of both sides) |

We use this last equation in the derivation above. So what else can we do with this? Suppose we want to know how many rolls of a die are needed to see all six sides come up at least once (given a six-sided, fair die, where each role is an independent event). On the first roll, the probability of getting a side we haven't seen is 1. The next role then has a 5/6 probability of yielding a unique side; and then, after the next unique number comes up, the probability becomes 4/6 ... and so on. So, 1 + 1/(5/6) + 1/(4/6) + ... = 14.7 Of course, I distrust math: I need evidence. And so Python to the rescue:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
import numpy as np from pandas import Series def reroll(): return np.random.randint(6) class Trial: def __init__(self): self._count = 0 self._seen = {} def run(self): while len(self._seen) < 6: self._count += 1 roll = reroll() self._seen[roll] = roll def count(self): return self._count trial_count = 10000 trials = np.empty(trial_count, dtype='i4') for i in range(trial_count): trial = Trial() trial.run() trials[i] = trial.count() s = Series(trials) print s.describe() |

=>

1 2 3 4 5 6 7 8 |
count 10000.000000 mean 14.659600 std 6.218472 min 6.000000 25% 10.000000 50% 13.000000 75% 18.000000 max 58.000000 |