…I’m going to calculate the probability that one of my readers from Berkeley has also finished their finals.
Problem statement: Consider $N$ final time slots, of which $n$ have passed. What is the probability that a student at Berkeley has finished all of his/her finals, given that he/she has $k$ finals?
Assumptions: Roughly the same number of people take finals in each time slot. There is no systematic “bias” in which final time slot any student has.
Disclaimer: I’m actually really bad at probability, so please correct me if there are errors in this computation.
Solution: This is a pretty straightforward computation, and is mathematically equivalent to the urn scenario of sampling without replacement (a special case of the Pólya urn, which gives the hypergeometric distribution).
The idea is that we’re going to pick a final time slot at random and find the probability that it’s within the first $n$ of the $N$ time slots (let’s call this a “success”). Each time slot has equal probability of being selected (remember our assumption that the same number of people are taking finals in each slot). Hence the probability that our first pick is within the first $n$ time slots (a “success”) is $n/N$.
For our second pick, recall that we cannot have conflicting finals. So we are choosing from the remaining $N-1$ of the original $N$ time slots. The space of outcomes where all finals are within the first $n$ time slots can only include outcomes where the first pick is a success; therefore, we condition this second pick on it. In other words, this is the probability of landing in one of the $n-1$ remaining early slots out of the $N-1$ remaining slots, which is $(n-1)/(N-1)$.
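As a sanity check on the first two picks, here is a minimal simulation sketch. The function name and its parameters are my own placeholders, and I borrow the 20-slot, 16-passed figures used in the worked example further down. It draws two distinct slots uniformly at random and counts how often both land among the slots that have already passed.

```python
import random

def simulate_two_finals(total_slots=20, passed_slots=16, trials=200_000, seed=0):
    """Estimate P(two distinct, uniformly chosen slots are both already passed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # rng.sample draws without replacement, so the two finals never conflict.
        first, second = rng.sample(range(total_slots), 2)
        # Slots 0 .. passed_slots-1 stand for the slots that have passed.
        if first < passed_slots and second < passed_slots:
            hits += 1
    return hits / trials

# Product of the two conditional probabilities from the argument above.
exact = (16 / 20) * (15 / 19)
estimate = simulate_two_finals()
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

The simulated frequency should sit close to the product of the two ratios, which is all this check is meant to show.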
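If you keep doing this with the 3rd, 4th, …, $k$th picks (you pick $k$ times because that’s how many finals you’re taking), you get the following expression:

$$P(X = k) = \frac{n}{N}\cdot\frac{n-1}{N-1}\cdots\frac{n-k+1}{N-k+1} = \frac{n_{(k)}}{N_{(k)}}$$

where $X$ is a random variable representing the number of finals that fall into the first $n$ of the $N$ time slots, given a sample (total number of finals) of size $k$, and $n_{(i)}$, the $i$th falling factorial of $n$, is defined as

$$n_{(i)} = n(n-1)\cdots(n-i+1) = \frac{n!}{(n-i)!}$$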
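The product of ratios can be sketched in a few lines of Python. The helper names `falling_factorial` and `prob_done` are mine, not part of any library, and the cross-check against binomial coefficients relies on the standard identity that the chance of all picks landing early equals C(passed, finals) / C(total, finals).

```python
from fractions import Fraction
from math import comb

def falling_factorial(x, i):
    """x * (x - 1) * ... * (x - i + 1); the empty product (i = 0) is 1."""
    result = 1
    for j in range(i):
        result *= x - j
    return result

def prob_done(total_slots, passed_slots, num_finals):
    """Probability that all num_finals distinct slots are among the passed ones."""
    if num_finals > passed_slots:
        return Fraction(0)  # more finals than passed slots: cannot be done yet
    return Fraction(falling_factorial(passed_slots, num_finals),
                    falling_factorial(total_slots, num_finals))

# Three finals, 16 of 20 slots passed.
p = prob_done(20, 16, 3)
print(p, float(p))  # prints 28/57 and its decimal value, roughly 0.49

# Cross-check against the binomial-coefficient form of the same probability.
assert p == Fraction(comb(16, 3), comb(20, 3))
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against a hand computation.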
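Now we can extend this to a non-homogeneous population of students who do not, in general, have the same number of final exams. Let $p_k$ be the proportion of the student population who have $k$ exams, where $k$ runs from 0 to $n$, the number of time slots that have passed out of $N$ in total (if $k > n$ it is evident that the probability is zero). The probability that any student you meet on campus will have finished their final exams is:

$$P(\text{done}) = \sum_{k=0}^{n} p_k\,\frac{n_{(k)}}{N_{(k)}} = \sum_{k=0}^{n} p_k \prod_{j=0}^{k-1}\frac{n-j}{N-j}$$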
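Let’s then calculate the probability that someone is done tonight, using real numbers. There are $N = 20$ time slots, 4 on each day, so after tonight $n = 16$ out of 20 final slots are completed. Let’s just assume for this discussion that 1/8 of people have one final ($p_1 = 1/8$), 1/4 have two ($p_2 = 1/4$), 1/4 have three ($p_3 = 1/4$), 1/4 have four ($p_4 = 1/4$), 1/8 have five ($p_5 = 1/8$), and nobody has any other number of final exams. The expression above then evaluates to

$$\frac{1}{8}\cdot\frac{16}{20} + \frac{1}{4}\cdot\frac{16\cdot 15}{20\cdot 19} + \frac{1}{4}\cdot\frac{16\cdot 15\cdot 14}{20\cdot 19\cdot 18} + \frac{1}{4}\cdot\frac{16\cdot 15\cdot 14\cdot 13}{20\cdot 19\cdot 18\cdot 17} + \frac{1}{8}\cdot\frac{16\cdot 15\cdot 14\cdot 13\cdot 12}{20\cdot 19\cdot 18\cdot 17\cdot 16} \approx 0.51$$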
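To double-check the arithmetic, here is a small sketch that evaluates the weighted sum with exact fractions. The variable names are mine, and the shares are just the made-up distribution assumed above (1/8, 1/4, 1/4, 1/4, 1/8 for one through five finals), not real enrollment data.

```python
from fractions import Fraction

def prob_done(total_slots, passed_slots, num_finals):
    """Probability that num_finals distinct random slots are all already passed."""
    p = Fraction(1)
    for j in range(num_finals):
        # j-th pick: (passed slots still unused) / (all slots still unused)
        p *= Fraction(passed_slots - j, total_slots - j)
    return p

# Assumed share of students with each number of finals (must sum to 1).
exam_count_shares = {
    1: Fraction(1, 8),
    2: Fraction(1, 4),
    3: Fraction(1, 4),
    4: Fraction(1, 4),
    5: Fraction(1, 8),
}
assert sum(exam_count_shares.values()) == 1

# Weighted average over the population, with 16 of 20 slots passed.
overall = sum(share * prob_done(20, 16, k)
              for k, share in exam_count_shares.items())
print(overall, f"{float(overall):.4f}")  # prints 6587/12920 0.5098
```

Changing the shares in `exam_count_shares` is the quickest way to see how sensitive the answer is to the assumed distribution of exam counts.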
In other words, according to this, only about 50% of UC Berkeley students are done with finals right now.
Wait, you ask, how is this possible?! All the friends I know are done with finals! I’ve had my last final earlier than Friday almost every semester!
What this suggests is one of two things.
(1) The University does not assign time slots evenly, so the number of people taking finals is not the same in each time slot. It could bias the schedule so that more people take finals earlier, or
(2) More likely, students choose in a biased way that favors earlier finals.
The fluctuations implied by point (1) are almost impossible to quantify without more data. But for (2), we can actually calculate how bias in sampling alters our probability. More on this in a later blog post.