
Probability is low-key fun, high-key useful, and you use it way more than you think
Have you ever heard of probability? Have you ever caught yourself saying "What're the odds ...?", or "it's definitely likely!", or even simply "yeah, probably"? These thoughts are actually you doing extremely fast calculations of likelihood, and you're probably (heh) not even aware of it. Have you ever driven through traffic and taken a calculated risk on changing lanes based on another driver's behaviour? Or second-guessed saying something in conversation? These are all situations involving the science of probability.
Probability theory is the branch of mathematics dedicated to explaining the ways we interpret the idea of chance and how several events can depend on one another to give us the odds of something happening. What a mouthful! Give that a re-read.
By the time we get to the end of this article, we're going to fully understand the following equation of my design:

P = Σ (from x up to n) [ n! / (x! * (n - x)!) ] * p^x * (1-p)^(n-x)
The Basics
Let's start smaller. Probability as a number value is between 0 and 1, where 0 means it will not happen at all, and 1 means it is guaranteed to happen. If you were to flip a coin, the odds of it landing on heads would be 0.5, or 50%; make sense? P, probability, of a coin being flipped and landing on heads is expressed like this:
P(heads) = 0.5
The same can be said for a roll of a generic 6-sided die: any given number has a probability of 1/6.
Optional Probability
What if I wanted the probability that the die rolled a 1 or a 2? When talking about the idea of either event A or event B transpiring, we add the probabilities.
1 or 2 would be 1/6 + 1/6, or ~0.1666... + ~0.1666..., or ~0.3333..., or a ~33% probability.
If I wanted to know the probability of rolling a 1, 2, or a 3, it would be 1/6 + 1/6 + 1/6 (Or 1/6 multiplied by 3: 3/6 = 1/2) = 0.5 or 50%.
This makes perfect sense. We are wondering if we will roll any of half the numbers. Same logic applies if we want to know the odds of us rolling any of the 6 outcomes. That would be (1/6) * 6, or 6/6, which is 1; 100% chance of happening.
What're the odds of flipping heads or tails on the same coin? 0.5 + 0.5 = 1. Makes sense.
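A quick sketch of these additions in Python (my own illustration, using exact fractions so no decimals get truncated):

```python
from fractions import Fraction

p_face = Fraction(1, 6)      # probability of any single face on a fair die

p_1_or_2 = p_face + p_face   # mutually exclusive outcomes, so we just add
p_1_2_or_3 = p_face * 3      # same idea, three faces
p_any_face = p_face * 6      # all six outcomes together

print(p_1_or_2, p_1_2_or_3, p_any_face)  # 1/3 1/2 1
```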
Required Probability
What if I want to know the odds of both of two events occurring? We multiply them.
Say you have a coin and I have a coin, what're the odds I'll flip a heads and you flip a tails? 0.5 * 0.5 = 0.25 or 25%. This makes sense, it's a quarter. There are 4 possible combinations total: H & T, T & H, T & T, and H & H. So naturally any of those outcomes occurring is 1/4 or 25%.
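Here's a sketch that enumerates those 4 combinations rather than taking my word for it:

```python
from itertools import product

# all 4 equally likely two-coin outcomes: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# my coin lands heads AND your coin lands tails
favourable = [(a, b) for a, b in outcomes if a == "H" and b == "T"]
p = len(favourable) / len(outcomes)
print(p)  # 0.25
```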
If you have a careful eye, you might notice that if we were flipping the same coin, getting both heads and tails on it should be impossible, yet the math says it's 0.5 * 0.5 = 0.25. What gives?
This is where we discover some new rules on probability theory; rules around mutual exclusivity and independence.
Conditional Probability
Because the probability notation ∩ and ∪ looks like rubbish in plain text, I'm going to refer to ∪ (union) as OR and ∩ (intersection) as AND.
In probability theory, if two events are mutually exclusive, they cannot occur at the same time. If they are independent, the occurrence of one has no impact on the occurrence of the other.
If events A and B are mutually exclusive (that is that they both cannot occur at the same time), then P(A AND B) = 0; therefore using knowledge that heads and tails together are mutually exclusive, we can deduce they will never coincide and that the probability is 0%.
Mutually Exclusive
Let's talk addition first. If two events are mutually exclusive, we can simply use P(A) + P(B). For example, say we rig a traffic light up to an rng (random number generator) machine of sorts, so it doesn't switch predictably (normally we know red comes a bit after orange). There are 3 colours, so each has a 1/3 chance of being turned on. We also know that only one can be on at a time, which makes them mutually exclusive, so we can simply use P(red light OR green light) = 1/3 + 1/3 = 2/3, or about 67%.
Using what we now know, we know that two coins flipping at the same time are not mutually exclusive (they can occur at the same time). This makes doing any logic around OR a bit different. P(coin 1 = heads or coin 2 = tails) != (does not equal) 0.5 + 0.5 = 1. Coin 1 can be tails and coin 2 can be heads just as much; so the first scenario isn't 100% guaranteed.
This is where we use the "Addition Rule". This rule states that when not mutually exclusive, optional probabilities can be expressed with:
P(A OR B) = P(A) + P(B) - P(A AND B)
Let's break this down. From what this equation says: We want to know the probability of A occurring OR B occurring, but removing a scenario where both probabilities occurred. Why would we do that last bit? The answer is because without it, we're doubling up on our estimate. Let me explain.
P(A) on its own covers every scenario where A occurs, including the ones where B also occurs; P(B) likewise covers every scenario where B occurs, including the ones where A also occurs. Add them together and the both-occur scenario, P(A AND B), has been counted twice, so we subtract it once. If the events were mutually exclusive we could skip that, because P(A AND B) would be 0 anyway. Starting to get it? Me neither.
Mathematically we then have this for our above coin-flip scenario:
P(A) + P(B) - P(A AND B) = 0.5 + 0.5 - (0.5 * 0.5) = 0.5 + 0.5 - 0.25 = 0.75 = 75%
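If you'd rather trust brute force than my algebra, here's a sketch that enumerates all four outcomes and compares against the Addition Rule:

```python
from itertools import product

outcomes = list(product("HT", repeat=2))  # (coin 1, coin 2)

# coin 1 is heads OR coin 2 is tails: HH, HT, TT qualify; TH does not
hits = [(a, b) for a, b in outcomes if a == "H" or b == "T"]
p_or = len(hits) / len(outcomes)

addition_rule = 0.5 + 0.5 - 0.5 * 0.5
print(p_or, addition_rule)  # 0.75 0.75
```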
This makes sense. Because say we failed the first flip, the second flip gives us another opportunity and thus a higher probability. Imagine I had 100 coins instead of 2. If all I had to do was get one heads (which has the same probability as tails re. the above example), it's almost certain that I'll land on heads the more I flip. To include another coin, it looks like:
P(A) + P(B) + P(C) - P(A AND B) - P(A AND C) - P(B AND C) + P(A AND B AND C)
As you can imagine this would get complicated so you're welcome to try and find what it'll look like at 100 but I sure as hell am not.
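I'm not expanding 100 terms by hand either, but a short sketch can do it for us. For n independent events that each have the same probability p, every k-event overlap term has probability p^k and appears C(n, k) times, so the whole alternating sum collapses into one line:

```python
from fractions import Fraction
from math import comb

def p_any(p, n):
    """P(at least one of n independent events, each with probability p),
    via inclusion-exclusion: add singles, subtract pairs, add triples, ..."""
    return sum((-1) ** (k + 1) * comb(n, k) * p ** k for k in range(1, n + 1))

half = Fraction(1, 2)
print(p_any(half, 2))           # 3/4, matching the two-coin result
print(float(p_any(half, 100)))  # so close to 1.0 it's near-certain
```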
Okay so now we know how to figure out the odds of A or B occurring for events that are or aren't mutually exclusive.
Dependence
We also know that P(A AND B) for mutually exclusive events is 0, so what can we use for occurrences of A AND B? This is where we get into independence vs dependence.
If event A is independent of B, we can simply multiply their probabilities. This works great for coins because in the prior example, my second flip will in no way be impacted by my first coin flip unless my first coin was giant and it created an earthquake that shot our second coin into orbit. But what if that was the case? Well this is where we'd use the following terminology:
P(B | A)
This means what is the probability of B given A. What are the chances B will occur if A occurs?
P(B | A) = P(B AND A) / P(A)
This is just the fancy probability way of saying: to discover the odds of B happening given that A has happened, take the odds of them happening together and divide A back out. If we want to figure out the odds of both of two dependent events occurring, we rearrange this into the following "Multiplication Rule":
P(A) * P(B | A)
This one is a little tricky to explain so I'm going to take an example from my educator, Deborah Rumsey, PhD.
"Suppose, for example, that a class is made up of 60 percent women, and of these women, 40 percent are married. What's the chance that a person you select at random from the class is a woman and married? To answer this, let the event W = {woman} and M = {married}.
What you want is the joint probability P(W AND M). You know that 60 percent of the class is made up of women, which means P(W) = 0.6. You also know that of the women in the class, 40 percent, 0.40, are married.
You have to use a conditional probability to solve the problem because you split up the women and look at the probability that they're married --- P(M | W) = 0.4. By the multiplication rule, to find P(W AND M), you take P(W) * P(M | W) = 0.60 * 0.40 = 0.24.
Of all the people in the class, 24 percent are women and married, which also means that the chance of you picking a married woman from the class is 24 percent."
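Deborah's example, sketched as a quick calculation (exact fractions, so the decimals behave):

```python
from fractions import Fraction

p_w = Fraction(60, 100)          # P(W): a randomly picked person is a woman
p_m_given_w = Fraction(40, 100)  # P(M | W): married, given that she's a woman

p_w_and_m = p_w * p_m_given_w    # multiplication rule: P(W) * P(M | W)
print(p_w_and_m)                 # 6/25, i.e. 0.24

# dividing back out recovers the conditional: P(M | W) = P(W AND M) / P(W)
print(p_w_and_m / p_w)           # 2/5
```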
Complements
So now we have discussed calculating the odds of two events occurring together, whether one event is independent of or dependent on the other, and whether the events are mutually exclusive or can occur simultaneously. Before we continue on to applying our base understanding of probability, let's quickly discuss complements.
"Damn, girl, I like that fit" has a probability of 1, for example.
Just kidding - about the example ;) - a complement in probability theory is a negation: 1 - X. The complement of 0.6 is 1 - 0.6 = 0.4. The complement of true is false. Complements are expressed in many ways but for the sake of this article, I'll go to my engineering roots and use an exclamation:
P(!A) = 1 - P(A)
The Birthday Problem
Complements are useful not only because they tell us the inverse information immediately; I found they also remove a lot of the work. My go-to example of precisely how this works is the famous birthday problem. Take a group of 25 people. Take an intuitive, off-the-top-of-your-head guess as to what the odds are that two of them share the same birthday.
Go on.
If you're anything like me, your estimate wasn't high; maybe between 20-30%? It's shocking to realise that the odds are actually over 50%. There's a greater chance that two people in that group share a birthday than of you getting heads in a coin flip. Let's analyse the probability to not only understand how human intuition (as much as I advocate it) can be trumped by logic, but also get a glimpse into the utility of a complement.
Firstly, we acknowledge that there are 365 days in a year, so the chance of randomly guessing any person's birthday is 1/365. Inversely, the odds of their birthday being on any other day are 364/365, which equates to about 0.99726. Those are good odds that we will guess wrong.
So now you're probably being like "Ay yo, Marty. Why're we going off track of finding out the answer? Why're we looking into the complement?" Because, clever reader, you're going to have to trust me when I say it'll make things a whole lot simpler if we find the inverse probability -- that is, the odds of us picking two people out who don't share the same birthday.
So, how many pairs of people can we make out of 25? The answer isn't 25 * 25; that would imply people could be partnered with themselves. We start with factorials. The factorial of 25 is expressed as 25!, where 25! = 25 * 24 * 23 * 22 * 21 * ... * 1. That counts every possible ordering of all 25 people: about 1.551121e+25 of them. But we don't care about arranging all 25 people, we only care about picking two. And we haven't considered order; that is, Tim and Mark and Mark and Tim should be represented to us as one option, because to our problem the order doesn't matter. So what we do is use what is known as the combination formula:

C(n, k) = n! / (k! * (n - k)!)
What we then do is divide (which culls our number further) by the factorial of our sample minus our selection, (n - k)! = 23!, which throws away all the orderings of the 23 people we didn't pick, and by k! = 2!, which throws away the two orderings of the pair we did pick, so that only one order of each pair remains. Don't think too much on it lol.
In this case the numbers work out to 1.551121e+25 / (2 * 2.5852017e+22), which is exactly 300 (the scientific notation above is just rounded):

C(25, 2) = 25! / (2! * 23!) = 300
Okay, so now we know that there are 300 pairs we can pick out of 25 people; let's tie it back into our probability. The odds of one pair not sharing a birthday is 364/365, AND (multiplication) that has to hold for all 300 pairs, so we are left with:
(364/365) ^ 300 = 0.43909223576
So roughly 0.44. Those are the odds that no two of our 25 people share a birthday. How do we find the odds of at least two sharing one? You guessed it: complements. 1 - 0.439 = 0.561, a ~56% chance that two people in a room of 25 have the same birthday. How wild is that? The craziest bit is that it's actually true (treating each pair as independent is a slight approximation, but the exact answer also comes out just under 57%).
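The whole chain, sketched in Python; `math.comb` is the standard library's version of the combination formula from earlier:

```python
from math import comb

pairs = comb(25, 2)              # 300 distinct pairs among 25 people
p_pair_differs = 364 / 365       # one pair avoiding a shared birthday
p_none_share = p_pair_differs ** pairs
p_some_share = 1 - p_none_share  # the complement: at least one shared birthday

print(pairs)                     # 300
print(round(p_none_share, 4))    # 0.4391
print(round(p_some_share, 4))    # 0.5609
```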
Now if you were to try and figure this out without using a complement it's like taking a stroll through hell. After tons of reading and research I finally was able to figure it out years ago but have since suppressed it. You're welcome to give it a crack and you'll totally come to appreciate the value of complements once you're done.
Complex Probability
Alright, we're ready to tackle the big one. Don't be intimidated by the sigma or the exponents; they're just efficient ways of expressing repeated addition and multiplication.
One night I was up at my mate's house, drunkenly watching people around the room partake in an assortment of activities. From therapy sessions to giggling Mary Janes, I was most captivated by a drinking dice game taking place in the centre. Around this point in time I was studying probability for fun, so I was easily lured towards anything involving chance to see if I could put my knowledge to the test. The game taking place was the classic: Liar's Dice (as seen in Pirates of the Caribbean). The game was simple, ...
[Just going to cut myself off here. But it has come to my attention that the version we were drunkenly playing (or rather, I was drunkenly observing) is not the full function of the game; so board game nerds be warned]
... players in the room would start the game each in possession of a cup and 5 d6's (6-sided dice). You'd shake them in your cup and splat the cup down on the table to conceal the dice, then take a peek underneath at what you had. Your goal was essentially to guess how many dice showing some number x would be on the field when everyone lifted their cups. If your number was 5, you'd try to guess how many 5's there were; 1's are also wildcards, in that they count towards your number (so a 1 counts as a 5). I was fascinated by the concurrent relational probabilities, so I wanted to know, and what our probability question becomes, is:
Of n rolled dice, what're the odds that at least x favourable numbers were rolled?
Firstly, we know that the probability of any number on a die is 1/6. We also know that a 1 is a wildcard, so the odds of a favourable number are 2/6, or 1/3. I have encapsulated this value in the character p. For simplicity's sake, we will assume we have a total of 5 dice (n), and we want the odds of at least 3 (x) turning up what we want. We are now about to apply everything we've learned so far to solving this question.
p = 1/3
n = 5
x = 3
So say we roll 5 dice. I get a favourable outcome if at least 3 land on what I wanted. If my number was 4, I'd want 3x 4's, where any of them could be 1's. That is where p^x comes from: (1/3)^3 = 1/3 * 1/3 * 1/3. It means that of our 5 dice (let's call them a b c d e), a, b, and c are what we want. Now, to figure out the probability appropriately, we need to consider the odds of the remaining dice not being what we want. How many dice are remaining? n - x = 5 - 3 = 2 dice to pair with our 3. What're the odds of an unfavourable outcome? It's a complement, as we learned. So to find the odds of 3 good rolls with the remainder bad, we take the inverse probability for the 2 leftover cases. This is where (1-p)^(n-x) comes from.
Translating to English, we now see that p^x * (1-p)^(n-x) means "we roll x favourable outcomes, with the remaining rolls being unfavourable".
This is a really good start. It tells us the probability of a single arrangement matching our goal. But what hasn't it taken into account? Order. You see, the first 3 dice being desirable isn't the only good outcome. a and b could be good and the rest suck, except e, which is also what we want, and thus we still have 3. So what do we do? Reflect back on the birthday problem. We count how many combinations of those dice there are, without using the same die twice, picking only 3 winners, and removing order as a factor. This is expressed in our equation as:

n! / (x! * (n - x)!)
Okay cool, so now we know, for every combination of 5 dice, the odds that 3 will be good. We're finished, right? Nope. We're missing one key factor: even though 3 correct rolls counts as a winning overall roll, so do 4, and 5 as well. We absolutely need to factor them in for the overall probability. How? We say that we want 3 (the minimum), OR 4, OR 5. Notice the emphasis on the OR: it's addition. When we add up a variable number of events, we sum them. The "sum of" sign is represented with a sigma (the weird looking sideways m). The sigma usually has a condition (if the bounds of the sum aren't obvious, like SUM(1, 2, 3)); in our case, x increments until it reaches n.
So finally, the equation reads in English as: starting from the minimum success requirement of x dice rolling a favourable value, all the way up to the maximum of n, add up the probability of each count of favourable rolls, across every combination of dice regardless of order.
This is how we would discover the likelihood of that event taking place.
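And here's the whole equation as a sketch in code, using the same p, n, and x as above (exact fractions again, so nothing rounds away):

```python
from fractions import Fraction
from math import comb

def p_at_least(x, n, p):
    """P(at least x of n dice land favourably), each with probability p:
    the sigma from x up to n of C(n, k) * p^k * (1-p)^(n-k)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

p = Fraction(1, 3)  # a favourable face: the number we want, or a wildcard 1
result = p_at_least(3, 5, p)
print(result, float(result))  # 17/81, roughly a 21% chance
```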
So we've learned about combining required or optional events, independent and dependent events, and mutually exclusive and simultaneous events. We've learned about complements and how to solve basic and more complex probability problems. Hopefully I've educated you on the whys and hows behind the way we passively make calculated risks all the time.
