The difficulty of a perfect March Madness bracket, Part 2: what if people became smart?

In our previous post about March Madness, we assumed that people randomly selected a bracket by chance; we did not take into account that they might remember what they selected previously and adjust their predictions. Let’s say now people become smart and don’t make the same mistake twice. How does this affect our calculations?

For convenience, let


where M is the number of combinations of brackets that can be produced and the probability p is the same as we defined in part 1. As we know only one bracket will work this is a valid transformation.

We follow the same general philosophy as before, but with a few tweaks. The problem is the same: N people all select brackets on periodic time intervals. The probability that no perfect bracket are selected during the kth time interval now depends on k, because the pool of possible brackets decreases with time from M, M-1, M-2, \cdots, and so on. So the probability that none of the N people get the right bracket in the kth interval is


(recalling that the +1 comes from the first interval’s probability of a correct bracket being 1/M). And likewise, the probability that someone gets it right is

1 - \left(1-\dfrac{1}{M-k+1}\right)^{N}.

Accordingly, we can use the same idea as in part 1 to calculate the probability that the kth interval is when someone finds a bracket: for the first k-1 intervals, nobody finds it, until somebody finds it in k. Note that in the following equation we have replaced the k from above with the dummy variable i, and reserve k for our target interval.

 P(X=k) = \underbrace{\left(\prod\limits_{i=1}^{k-1}\left(1-\dfrac{1}{M-i+1}\right)^{N}\right)}_{\text{bracket won't be right before interval } k-1}*\underbrace{\left(1-\left(1-\dfrac{1}{M-k+1}\right)^{N}\right)}_{\text{bracket is right at interval }k}

However, if we expand our product, we find that

\begin{align*} \prod\limits_{i=1}^{k-1}\left(1-\dfrac{1}{M-i+1}\right)^{N} &= \prod\limits_{i=1}^{k-1}\left(\dfrac{M-i}{M-i+1}\right)^{N} \\  &= \left(\dfrac{\cancel{M-1}}{M}\dfrac{\cancel{M-2}}{\cancel{M-1}}\cdots\dfrac{M-k+1}{\cancel{M-k+2}}\right)^{N} \\ &= \left(\dfrac{M-k+1}{M}\right)^{N} \end{align*}

and, after some additional algebraic simplifications, we end up with

\begin{align*} P(X=k) &=\left(\dfrac{M-k+1}{M}\right)^{N}\left(1-\left(\dfrac{M-k}{M-k+1}\right)^{N}\right) \\[0.5em]  &=\left(\dfrac{M-k+1}{M}\right)^{N} -\left(\dfrac{M-k}{M}\right)^{N}. \end{align*}

As has been before, the expectation value can be found as follows:

\[ E[X] = \sum\limits_{k=1}^{M}kP(X=k)=\sum\limits_{k=1}^{M}k\left[\left(\dfrac{M-k+1}{M}\right)^{N} -\left(\dfrac{M-k}{M}\right)^{N}\right]. \]

(Note that the sum only goes up to M as k>M makes no sense due to pigeonholing.)

Now this sum is difficult to calculate. So let’s split it up (we’ll also split up the fractions for reasons that will become evident shortly):

\[ E[X] = \sum\limits_{k=1}^{M}k\left(1-\dfrac{k-1}{M}\right)^{N} - \sum\limits_{k=1}^{M}k\left(1-\dfrac{k}{M}\right)^{N}. \]

Still a very formidable sum. We’ll rely on an old trick, shifting the index of summation. Let’s shift k \rightarrow k-1 in our second sum.

\begin{align*} \sum\limits_{k=1}^{M}k\left(1-\dfrac{k}{M}\right)^{N} &= \sum\limits_{k=2}^{M+1}(k-1)\left(1-\dfrac{k-1}{M}\right)^{N} \\ &= \sum\limits_{k=2}^{M+1}k\left(1-\dfrac{k-1}{M}\right)^{N} - \sum\limits_{k=2}^{M+1}\left(1-\dfrac{k-1}{M}\right)^{N}. \end{align*}

The whole thing now, so we can see what’s going on:

\[ E[X] = \sum\limits_{k=1}^{M}k\left(1-\dfrac{k-1}{M}\right)^{N} -\sum\limits_{k=2}^{M+1}k\left(1-\dfrac{k-1}{M}\right)^{N} + \sum\limits_{k=2}^{M+1}\left(1-\dfrac{k-1}{M}\right)^{N}. \]

Check out the first and second summations. They annihilate each other except for the first term of the first sum and the last term of the second sum. Evaluating those specific terms, we get

\begin{align*} \sum\limits_{k=1}^{M}&k\left(1-\dfrac{k-1}{M}\right)^{N} -\sum\limits_{k=2}^{M+1}k\left(1-\dfrac{k-1}{M}\right)^{N} \\[0.5em] &= \cancelto{1}{\left(\cancelto{1}{k}\left(1-\cancelto{0}{\dfrac{k-1}{M}}\right)\bigg|_{k=1}\right)^{N}} - \cancelto{0}{\left(k\left(1-\cancelto{1}{\dfrac{k-1}{M}}\right)\bigg|_{k=M+1}\right)^{N}}. \end{align*}

And this is very convenient — this first two summations just yield 1! Hence,

\[ E[X] = 1 + \sum\limits_{k=2}^{M+1}\left(1-\dfrac{k-1}{M}\right)^{N} = 1+\underbrace{\sum\limits_{k=1}^{M}\left(1-\dfrac{k}{M}\right)^{N}}_{\text{define this as }S}. \]

where we have shifted back k-1 \rightarrow k. Now we focus on the remaining sum S.

Up until this point everything has been exact. Now we’ll have to start making approximations. First, since M \gg 1 (recall how many combinations of brackets we’re dealing with here), we can approximate the sum with an integral:

\[ S = \sum\limits_{k=1}^{M}\left(1-\dfrac{k}{M}\right)^{N} \rightarrow \int_{0}^{M}\left(1 - \dfrac{k}{M}\right)^{N} dk \]

Evaluating this integral is fairly trivial, and certainly so compared to what we’ve doing with our beastly sum. Applying the substitution

\[ u = 1 - \dfrac{k}{M} \]

we have

\begin{align*}  S &\simeq \int_{0}^{1} M u^{N}  du \\ &= \dfrac{M}{N+1} u^{N+1}\bigg|_{0}^{1} = \dfrac{M}{N+1} \end{align*}


\[  E[X] = 1 + S \simeq \boxed{1 + \dfrac{M}{N+1}}.  \]


A simple check: if N=1 (only one person does it), E[X]=1+M/2 \approx M/2 (recall M \gg 1), which is exactly what we would expect — we would expect to hit the perfect bracket about halfway through our tedious work.

One final note: how does this compare to our answer is Part 1? Well, if we consider that the population of the Earth N \gg 1, and substitute back p for M, we get back E[X] \simeq 1/Np!

Of course, the answer isn’t exactly the same because we made a bunch of approximations in this calculations, but here’s the upshot: whether one keeps track of the brackets he has already picked does not make an appreciable difference in the expectation of when the correct bracket is chosen.

If you think about it, this shouldn’t be surprising; the probability that a certain bracket will be accidentally repeated if we choose from random, is extremely small, given we have over 1 quintillion brackets to choose from!

Which calculation would you rather do, however? Needless to say, this calculation was much more involved to get approximately the same answer. This is why we started with the assumption in Part 1 that each bracket was chosen “with  replacement”, so to speak — and it turns out that got a good enough answer anyway.

(Edited at 3:41 PM 03/21: the limits of integration were wrong in our sum–> integral conversion. This fix actually made the math a little easier!)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s