Reviewing some basic probability and statistics, I rediscovered the geometric distribution, which tells you the likelihood of winning a game after some number of attempts. With it, we can figure out the number of times you should expect to play the game until the outcome is “success” – or, equivalently, the number of plays you’d need in order to expect one success among them.

Why is this important? They say the odds of winning the current record-setting powerball lottery are 1 in 176 million. I want to be rich. How many tickets do I need to buy to expect a winning one?

The intuition is straightforward. For instance, if there is a p=25% chance of success on any one trial, the expected ratio of success to failures is 1/4, or one out of every four. Which means that any four trials should yield one success on average. Mathematically, if we let X be the number of trials required until a success occurs, then E[X] = 1/p.

So the odds are in my favor after buying 176 million randomly generated tickets. Who wants to loan me $176,000,000?

The proof is neat.

If p is the probability of success, and q = (1-p) is the probability of failure, then the probability of succeeding on the i’th trial after (i-1) failures is:

Pr(X=i) = q^(i-1)p


Lines three to four depend on some magic with infinite series. If you have a converging series where x < 1:

We use this last equation in the derivation above. So what else can we do with this? Suppose we want to know how many rolls of a die are needed to see all six sides come up at least once (given a six-sided, fair die, where each role is an independent event). On the first roll, the probability of getting a side we haven't seen is 1. The next role then has a 5/6 probability of yielding a unique side; and then, after the next unique number comes up, the probability becomes 4/6 ... and so on. So, 1 + 1/(5/6) + 1/(4/6) + ... = 14.7 Of course, I distrust math: I need evidence. And so Python to the rescue:


After reading this post about equations every computer scientist should know, I got to thinking. I love clear, plain-English explanations of powerful mathematical concepts. Not the what, but the why. I’d like to share my understanding of the binomial formula:

This is usually stated “N choose K”, and provides a way to calculate the number of ways in which you may select (or draw) K items from a collection of N items in total, where you don’t care about the order in which you draw them.

Now, if we have N elements, there are N ways to choose a first element; and, if you don’t put that first element back, N-1 ways to choose a second; and so on. That means the total number of ways to draw N elements (counting the order in which we draw them) is N x (N-1) x … x 1. We call this N!, “N factorial”.

But, we don’t care about drawing N elements. We actually care about just K elements. We don’t care about the remaining (N-K) elements. For example, if we are trying to figure out 5 choose 2, we draw from five choices, and then from four, but that’s it. No need to worry about the other (5-2) = 3 draws. So, instead of calculating 5x4x3x2x1, we calculate just 5×4 by “removing” 3x2x1, by dividing. That is, we calculate (5x4x3x2x1)/(3x2x1), or, 5!/(5-2)!. That’s where (N-K)! in the denominator comes from.

Finally, when we draw K elements, we don’t care the order in which we draw them. Following our example, with 5 choose 2, if we calculate 5×4 = 20 choices, we double-count the cases where we choose the same two items in a different order. That means we have to “remove” (divide by) the ways in which we choose K elements in different orders. This is K! ways.

So, there’s the prize: N choose K = N! / [K!x(N-K)!]

The Euclidean Algorithm is one of the first algorithms ever written down. It is a darling of computer science. The algorithm cleverly reveals the greatest common divisor of two natural numbers. Despite its ancient origins, it is still well-used today (and not just for teaching). It can determine when two numbers are relatively prime – ie, when their greatest common divisor is one – which has important modern applications in cryptography. And it does so efficiently.

Here is a cute implementation:

def gcd(a,b):
    return a if b == 0 else gcd(b,a % b)

(NB. Error checking omitted for clarity.)

Look ma, recursion! To spell it out, when b is zero, then gcd(a,b) = a. When b is non-zero, then gcd(a,b) = gcd(b, r), where r is the remainder of a divided by b. Let’s do a short example: gcd(12, 8). We have:

gcd(12, 8) = gcd(8, 4) = gcd(4, 0) = 4

Ok, so what the heck is going on here? The gcd algorithm is deceptively simple looking. However, it leverages some rather deep properties of the natural numbers.

First, a little background. The titular Euclid, that crafty fellow, once proved something called the Fundamental Theorem of Arithmetic, which says that any number can be expressed (uniquely!) as the product of primes. For example,

12 = 2 x 2 x 3 = 4 x 3
8 = 2 x 2 x 2 = 4 x 2

You can see by inspecting the prime factors that the gcd is 4. But factoring numbers is computationally difficult, especially when they get larger. The gcd algorithm is so efficient (and remarkable) in part because it doesn’t have to do any factoring!

One useful fact is that when a number divides x and y, that number also divides x + y and x – y. You can see this by playing around with prime factors again. In the example above:

12 – 8 = 4 x 3 – 4 x 2 = 4 x (3 – 2)
12 + 8 = 4 x 3 + 4 x 2 = 4 x (3 + 2)

You can see that 4 is a factor not only of 12 and 8, but also 12 + 8, and 12 – 8. Still with me?

Ok, one other thing to get under your belt is the division theorem. It says we can come up with a unique whole number quotient q and remainder r for any natural number division a / b. That is, a = qb + r for some unique whole numbers q and r, where 0 <= r < b.

Finally, we can write out step by step what is happening in the Euclidean algorithm. We start with the arguments a and b, and put a row in the table which shows the current arguments along with what is computed (ie, the remainder a % b). We add a row for each recursive invocation.

step a b a % b
0 a = q[0] b + r[0]
1 b = q[1] r[0] + r[1]
2 r[0] = q[2] r[1] + r[2]
… etc …
n-1 r[n-3] = q[n-1] r[n-2] + r[n-1]
n r[n-2] = q[n] r[n-1] + r[n]
n+1 return r[n-1]

At each step, we pass in as arguments to the gcd function the values in columns a and b, and we compute the remainder a % b of their division and, if we want, the quotient, but we can ignore the quotient for now. Wash, rinse, repeat.

Let’s check out gcd(12, 8):

step a b a % b
0 12 = 1 8 + 4
1 8 = 2 4 + 0
2 return 4

In the nth iteration, the final remainder, r[n], is zero. Why is this? It’s because the remainder r of a division such as a = q b + r is strictly less than b, so the arguments shrink inexorably throughout the computation, and eventually you get down to zero.

In row n, where r[n] = 0, the dividend and divisor are r[n-2] and r[n-1], so we know that r[n-1] divides r[n-2] evenly (since there is no remainder to the division!) And clearly r[n-1] divides itself. So that means r[n-1] divides r[n-3] too, since it divides both terms of the sum which defines it (see row n-1). Likewise, we can work all the way back up the call stack in order to say that r[n-1] divides both the original arguments a and b as well.

Wait! We still don’t know whether that final non-zero remainder r[n-1] is the greatest among all possible common divisors. We do know that divisor r[n-1] is <= gcd, by definition of gcd. But we need to prove one more statement: ANY common divisor of a and b divides ALL the remainders r[i] of the algorithm.

ORLY? Well, just look at the procedure above and notice how the first remainder can be written r[0] = a – q[0] b. So anything that divides a and b must divide r[0]. Likewise, r[1] = b – q[1] r[0], so anything that divides b and r[0] must divide r[1]. Following the turtles all the way down the rabbit hole: any common divisor of a and b divides ALL remainders r[i].

One common divisor in particular is the gcd. Since it divides r[n-1] in particular, we can say that gcd <= r[n-1].

But if gcd <= r[n-1] and r[n-1] <= gcd, doesn’t that mean they are equal? Oh yes indeedy-do it does. Quod erat demonstratum, baby!

PS: for some visual intuition of the algorithm, check out the line-segment drawings in the top right hand corner of the Wikipedia entry.

Let’s talk about how (most) computers represent negative numbers. You know, the ones to the left of zero on the number line? The representation most computers use is called two’s complement.

Take the integer quantity fourteen, and think about how we represent it as a string of digits. In decimal, it’s 14. In binary, it’s 1110. Why? Remember that just as in base ten, where

14 = 1 * 10^1 + 4 * 10^0

We have, using a base of two,

14 = 1 * 2^3 + 1 * 2^2 + 1 * 2^1 + 0 * 2^0

But what is a good way to represent negative fourteen?

Wouldn’t it be great if a computer just understood -1110? The catch is, we can use only ones and zeros; using the minus sign “-” is out of the question.

We know that to represent all the non-negative integers up to fifteen, we need a string of four bits, which gives us a distinct string for each of the sixteen (2^4) distinct integers. If we want to represent the corresponding negative integers as well, we’ll have to use another bit. That doubles the number of distinct binary strings we have available, from sixteen to thirty-two.

Whatever representation we come up with for negative integers, it’s probably best if we don’t screw up normal binary arithmetic. We need to leave all integers from 0 to 15 mapped to 00000 to 01111 in binary just as they are. Remember in base ten arithmetic we “carry” if a column sums to over nine (e.g., 07 + 03 = 10). Similarly, in base two, we carry any column summing to over one.  So 00001 + 00001 = 00010; 00010 + 00001 = 00011; 00011 + 00001 = 00100; etc. This locks down all the numbers up to 01111.

What’s left to do is to assign binary representations to negative integers. The remaining binary strings at our disposal are the five digit strings ranging from 10000 to 11111. Note that all of them have one as their highest order bit. So adding two binary integers whose sum is greater than 01111 will definitely overflow into negative territory!

Now, how to do this mapping?

For a well-behaved system of (five-digit) binary addition with negative numbers, we turn to modular arithmetic, aka “clock arithmetic.” For our five-bit example in which we have a set of thirty-two binary strings, the appropriate modulus to choose is thirty-two (i.e., 2^5 or 100000). This is the midnight of our hypothetical clock on which we do our math. For example, we have that 11111 + 1 = 100000 ≡ 0. We say that 100000 is equivalent to 0. In this system of addition, which has the algebraic properties of a group (really, a ring – more on that later), every element has an inverse. This inverse (aka complement) under addition is, by definition, the negative. So with this system, we can pose the question: what is X such that 01110 + X is equivalent to 0? The answer can be nothing but the negative of 01110!

We can solve 01110 + X = 100000 (≡ 0) using only the binary arithmetic we already know. We see that:

100000 (≡ 0) = 11111 + 1 = 01110 + 10001 + 1 = 01110 + 10010

And also

100000 (≡ 0) = 11111 + 1 = 10010 + 01101 + 1 = 10010 + 01110

So it makes sense to say 10010 is the negative of 01110, and vice versa. That is, 10010 represents -14! Note, however, that this is true only in binary arithmetic with modulus thirty-two. With a word size larger or smaller than five bits, we’d wind up with different negative string representations.

The equations above demonstrate the well-known rule to form the representation of a negative binary integer in two’s complement: flip the bits and add one. This operation is known as “taking the two’s complement.”

Defining negative numbers this way allows us to define subtraction in terms of addition, so we can reuse our adder hardware circuitry for subtraction. (And lest you ask about multiplication, we’re all good there too, as our chosen system is a homomorphic image of the ring of integers!)

Note that the largest (in absolute terms) negative number we can represent using our five bits is -16, whereas the largest positive number is 15. This is because while every number from 0 to 15 has a corresponding negative, zero is its own negative. So there’s an extra negative number floating around which, strangely enough, is also its own negative.

Why is this system called Two’s Complement? Because adding a positive binary integer and its complement (negative) gives 2^w (w being the number of bits needed). You can think analogously of complementary angles in Euclidean geometry, whose measures sum up to ninety degrees. Note that for most computers today, is 32 or 64, corresponding to the native word size of the machine; it used to be 16 or maybe even 8, if you were coding for an Intel 8008 chip or something.

If you need to expand the width of a number whose word size is w to size w + k, you can do something called sign extension. This means left-padding the binary string with k copies of the sign bit. This is intuitive for positive numbers: 01110 = 001110 = 0001110, etc. For negative numbers, we can see this as follows. Imagine a binary string of length w that encodes a negative number and call it X. We know its two’s complement is X’ + 1, where X’ is just the string X with its bits flipped. As X + X’ = 2^w – 1 = 11..1 for all lengths w, whenever the two’s complement of X is left-padded with a zero (its sign bit), X must be left-padded with a one (its sign bit).

Another signed binary arithmetic system called Ones’ Complement is so named because in this system, a number and its complement together sum to 1…11; that is, one repeated w times. The rule to this latter system goes: flip the bits (don’t add one!) Unfortunately, this system suffers not least from having two distinct representations of the identity element, zero. I can only imagine the nightmare that is multiplication.

Just for fun, what might would a Ten’s Complement system look like? Suppose we have decimal strings of length two to work with, and we aren’t allowed to use a minus sign. By analogous logic, we could form negative k using the formula 10^2 – k. In which case, we’d represent negative fourteen as 86!

© 2014 Adam Klein's Blog Suffusion theme by Sayontan Sinha, modified by Adam :)