Probability in MTG: The Hypergeometric Distribution



In the previous entry, I talked about the basic concept of probability and gave an example of how it can be useful in a game of Magic. In this entry, we take this one step further by revisiting the probability distribution of a random variable and introducing the hypergeometric distribution, which is very useful for answering questions such as:

1.)    What is the probability that a given deck will mulligan?

2.)    What is the probability that a certain card will be in a player’s opening hand on a mull to 6?

3.)    What is the probability that a combo deck will have all of its pieces by a certain turn?

Along the way, I will show you a tool in Microsoft Excel that you can play around with.

Recall that a random variable can be thought of as an activity that produces some outcomes, each with a specific probability. A formula or table from which the probability of each outcome can be computed or read off represents the probability distribution of the random variable.

For example, let the random variable X be the outcome of Deck A playing against Deck B. It can have the following probability distribution table:

Outcome         Probability
Deck A wins     0.7
Deck A loses    0.3

As a formula, the random variable X can be represented by P(X = x) = 0.7^x × 0.3^(1-x), where x is either 0 (Deck A loses) or 1 (Deck A wins).

This type of random variable is called a Bernoulli Random Variable.
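
If you would rather check the formula with a few lines of code, here is a minimal Python sketch. The function name bernoulli_pmf and the default p = 0.7 are just my illustration of the Deck A example, not part of any library.

```python
def bernoulli_pmf(x: int, p: float = 0.7) -> float:
    """P(X = x) for a Bernoulli random variable with win probability p."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (Deck A loses) or 1 (Deck A wins)")
    # p^x * (1 - p)^(1 - x): picks p when x = 1 and (1 - p) when x = 0
    return p ** x * (1 - p) ** (1 - x)


print(bernoulli_pmf(1))  # probability that Deck A wins
print(bernoulli_pmf(0))  # probability that Deck A loses
```

Plugging in x = 1 leaves only the 0.7 factor, and x = 0 leaves only the 0.3 factor, which is all the formula is really doing.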

Different random variables have their own probability distributions. In particular, the random variable at work in some of the questions posed above, where a given number of cards is drawn from a finite deck and we want the probability that a certain number of a particular kind of card appears in the opening hand, has what is called a hypergeometric distribution.

Let us say that Y is the number of lands in an opening hand drawn from a deck with 24 lands. A hypergeometric distribution requires knowledge of the following parameters:

1.)    The total number of cards in the deck

2.)    The size of the hand drawn

3.)    The number of the desired cards (lands) that are in the deck

Each of these is known in our example. The deck has 60 cards, a hand of 7 cards is drawn, and there are 24 lands in the deck. Given these parameters, the probability that Y=0, that is, that the opening hand will contain no lands, is P(Y=0) = 0.0216 or about 2%. How about the probability that Y=1? That is P(Y=1) = 0.1210 or about 12%. To make it simple, let us say that you are playing a deck where you typically mulligan only hands with one land or none. This means that the probability that you will mulligan with the deck is P(Y=0) + P(Y=1) = 0.1427 (adding the unrounded values), or about 14%.

How did I do those computations? The details about the hypergeometric distribution, including its formula, can be found in this Wikipedia article. However, you can easily use the HYPGEOMDIST function in Excel to get these values. Its arguments are, in order, the number of desired cards in the hand, the size of the hand, the number of desired cards in the deck, and the size of the deck. For P(Y=0), the input is =HYPGEOMDIST(0,7,24,60). See if you can get the value for P(Y=1).
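
For those who prefer code to spreadsheets, the same numbers can be reproduced from the standard hypergeometric formula, P(Y = k) = C(K, k) × C(N-K, n-k) / C(N, n), where N is the deck size, K is the number of desired cards in the deck, n is the hand size, and C(a, b) counts the ways to choose b cards out of a. Below is a minimal Python sketch. The function name hypergeom_pmf is my own, and I have assumed the same argument order as the Excel function so the two are easy to compare.

```python
from math import comb


def hypergeom_pmf(k, hand_size, successes_in_deck, deck_size):
    """Probability of exactly k desired cards in a hand drawn without replacement.

    Argument order mirrors Excel's HYPGEOMDIST (an assumption made for
    easy comparison, not a requirement of the formula).
    """
    if k < 0 or k > hand_size or k > successes_in_deck:
        return 0.0
    others_in_deck = deck_size - successes_in_deck
    # ways to pick k desired cards, times ways to fill the rest of the hand
    # with other cards, divided by all possible hands
    return (comb(successes_in_deck, k)
            * comb(others_in_deck, hand_size - k)
            / comb(deck_size, hand_size))


p0 = hypergeom_pmf(0, 7, 24, 60)  # no lands in the opening seven
p1 = hypergeom_pmf(1, 7, 24, 60)  # exactly one land
print(f"P(Y=0) = {p0:.4f}")
print(f"P(Y=1) = {p1:.4f}")
print(f"P(mulligan) = {p0 + p1:.4f}")
```

Changing 24 to your own land count is an easy way to see how much each extra land cuts down on mulligans.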

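Now let us try the second question. What is the probability that at least one copy of a certain card will be in a player’s opening hand on a mull to 6? Let us assume that the player has 4 copies of this card in the deck. The easiest route is through the complement: the hand has at least one copy unless it has none, so the answer is 1 - P(no copies), or =1-HYPGEOMDIST(0,6,4,60) in Excel. That gives 0.35146 or about 35%! Did you get it? Let me know in the comments.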
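
The "at least one" calculation can also be sketched in Python using the complement, that is, 1 minus the probability of seeing no copies at all. The function name prob_at_least_one and the default 60-card deck are my own choices for illustration.

```python
from math import comb


def prob_at_least_one(copies, hand_size, deck_size=60):
    """Probability that a hand contains at least one copy of a given card."""
    # P(no copies): every card in the hand comes from the non-copies
    p_none = comb(deck_size - copies, hand_size) / comb(deck_size, hand_size)
    return 1 - p_none


# 4 copies in a 60-card deck, 6-card hand after one mulligan
print(f"{prob_at_least_one(4, 6):.5f}")
```

Calling prob_at_least_one(4, 7) gives the same question for a full seven-card hand, which shows how much a mulligan costs you in finding a key card.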

How about the third question? Unfortunately, the random variable involved in the third question does not follow a simple hypergeometric distribution. Its distribution has some of the qualities of a hypergeometric distribution and can be considered a generalized form of it, but the Excel function that we used for the other two questions will not work.

Luckily, we do not really need to know what the distribution is if all we want is to answer how likely it is for the combo pieces, unimpeded by other spells, to be complete by, say, Turn 3. All we need to do is simulate the process a large enough number of times. That is, goldfish the deck over and over again and see how many of those games result in the desired outcome. This works because, in theory, the fraction of repeated trials that hit the desired outcome converges to the actual probability as the number of trials grows.

How many repetitions do you need? It depends. A few hundred may do if you notice that the running estimate no longer changes drastically with every few repetitions. It may be tedious, but it will get the job done. This is important when evaluating the viability of a new combo deck. If it turns out that you cannot reliably pull the combo off by a certain turn just goldfishing, then it may not be a good idea to take it to your next FNM.

Using a random shuffler may expedite the process. It is also possible to program the entire algorithm. However, if you play the deck in paper, I suggest doing the repetitions by hand, since the way you shuffle may affect the outcomes (it shouldn’t if you are shuffling well enough, but that’s an entry for another day).
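
To give an idea of what programming the goldfishing could look like, here is a minimal Python sketch. Everything about the deck is a made-up assumption for illustration: a 60-card deck with a two-card combo (pieces "A" and "B", 4 copies each), no mulligans, no tutors or card draw, and a player on the play who has therefore seen 9 cards by Turn 3 (7 in the opening hand plus the draws on Turns 2 and 3). The function name simulate_combo is also my own.

```python
import random


def simulate_combo(trials=100_000, deck_size=60, copies_a=4, copies_b=4,
                   cards_seen=9, seed=None):
    """Estimate the chance of seeing both combo pieces among the first
    cards_seen cards by repeated random goldfishing."""
    rng = random.Random(seed)  # pass a seed for reproducible runs
    fillers = deck_size - copies_a - copies_b
    deck = ["A"] * copies_a + ["B"] * copies_b + ["other"] * fillers
    hits = 0
    for _ in range(trials):
        # sampling without replacement is equivalent to taking the
        # top cards of a well-shuffled deck
        seen = rng.sample(deck, cards_seen)
        if "A" in seen and "B" in seen:
            hits += 1
    return hits / trials


estimate = simulate_combo(seed=2024)
print(f"Estimated P(both pieces by Turn 3) = {estimate:.3f}")
```

Under these assumptions the exact answer can also be worked out by inclusion-exclusion and comes to roughly 22%, so the estimate should land close to that. The point of the simulation is that it keeps working once you add mulligan rules, tutors, or more combo pieces, where an exact formula gets messy.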

May the shuffler be with you!
