Probability in MTG: What’s in a Win Rate? (Part 2)


In Part 1, I defined win rate as an unknown but fixed parameter of
the form:

Pr(Deck A, Pilot B wins versus Deck C, Pilot D)

Which is the probability that Deck A piloted by Player B wins versus Deck C piloted by Player D. I explained how the best estimate of this parameter can be obtained by having Player B and Player D play the matchup a large enough number of times and taking the proportion of matches won by Player B as the estimate. However, the utility of such an estimate is quite narrow: it only tells us which of two specific players has the statistical advantage with two specific decks. In Part 2 of this three-part series, I discuss how to estimate, or evaluate existing estimates of, more general win rates:

Pr(Specific Deck, Any Player versus Any Deck, Any Player in a meta)

Pr(Specific Deck, You versus Any Deck, Any Player in a meta)

These win rates are much more useful for making decisions about which deck to use. However, getting good estimates of them can be much more complicated, depending on how good one wants those estimates to be.

Case 1: Pr(Specific Deck, Any Player versus Any Deck, Any Player in a meta)

For the first case, I consider the objective of estimating the win rate of a specific deck in a meta. By “in a meta,” I mean to narrow down the matches under consideration to a specific setting. For example, the MTGA best-of-one ranked ladder can be one meta, while the MTGA play queue can be another meta. How complicated this estimation is can be demonstrated as follows. Assume that there are only two decks in the meta, say RDW and UW control, and that there are only two players, Earl and Randy, who play on the ladder. In this case, our problem simplifies to estimating the following:

Pr(RDW, Earl versus UW, Randy)

Pr(RDW, Randy versus UW, Earl)

Pr(RDW, Randy versus RDW, Earl)

Pr(UW, Randy versus UW, Earl)

Let us assume that each of these matchups was estimated by playing 100 games, with outcomes as follows:

Estimate of Pr(RDW, Earl versus UW, Randy)=0.60

Estimate of Pr(RDW, Randy versus UW, Earl)=0.60

Estimate of Pr(RDW, Randy versus RDW, Earl)=0.50

Estimate of Pr(UW, Randy versus UW, Earl)=0.50

From these results, we can infer that no matter whether Earl or Randy is piloting RDW, the deck has a better win rate versus UW. We can also infer that in the mirror match, each player has a fair (50%) chance of winning.

Now, suppose both Randy and Earl use each deck on the ladder an equal number of times. Then whenever Randy is playing RDW, he has a 50% chance of getting matched against UW and a 50% chance of getting matched against RDW. The same is true for Earl. Therefore, in general, any time RDW goes on the ladder, it has a 50:50 chance of going up against either UW or RDW. Thus, the win rate of RDW on the ladder will be 0.60(0.50)+0.50(0.50)=0.55.
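This calculation is just a weighted average of matchup win rates, weighted by how often each matchup occurs. A minimal sketch using the toy example's numbers:

```python
# Estimated win rates for RDW in each matchup (toy two-deck meta)
matchup_win_rate = {"UW": 0.60, "RDW mirror": 0.50}

# How often each matchup occurs on the ladder
matchup_share = {"UW": 0.50, "RDW mirror": 0.50}

# Overall ladder win rate = sum over matchups of (win rate * frequency)
rdw_ladder_win_rate = sum(
    matchup_win_rate[m] * matchup_share[m] for m in matchup_win_rate
)
print(round(rdw_ladder_win_rate, 4))  # 0.55
```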

In reality, there are more than two decks and more than two players in the meta. This means estimating the win rate of a deck like RDW requires estimating the probability that it wins against each of the other decks (including RDW itself) across every possible combination of players. The only way to do this is by gaining access to MTGA’s server data. If MTGA published such data, it would be the best possible source of estimates for each deck’s win rate. However, the assumption that these win rates are unknown but fixed parameters would no longer hold, as they are likely to change depending on how the player base reacts to the state of the meta. Going back to our toy example, if both Randy and Earl learn that RDW has a higher win rate than UW, both may decide to use RDW exclusively instead of splitting their time between the two decks. Obviously, this would mean that RDW’s win rate would drop from 55% to 50%, since only the mirror match remains.
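The effect of the player base shifting can be seen by re-running the same weighted average with different matchup frequencies. In the toy example, if everyone switches to RDW, the UW matchup disappears and only the 50% mirror is left:

```python
def rdw_win_rate(win_vs_uw, win_mirror, share_uw):
    """Toy-example ladder win rate for RDW, given the fraction of UW opponents."""
    return win_vs_uw * share_uw + win_mirror * (1 - share_uw)

print(round(rdw_win_rate(0.60, 0.50, 0.5), 2))  # decks split evenly: 0.55
print(round(rdw_win_rate(0.60, 0.50, 0.0), 2))  # everyone plays RDW: 0.5
```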

At the moment, MTGA does not directly release the data needed to estimate win rates. However, each player’s matches are recorded in their log files, which several tracker sites aggregate to produce win rate estimates. One such site is MTGA PRO, shown below. How good these estimates are depends on how well the aggregated data represents the entirety of the data on the MTGA servers. Each tracker is limited to the data coming from its users and the opponents those users face. Thus, while probably biased, each tracker’s dataset is a good reference for what is actually happening in the meta. This is especially true when different trackers provide similar estimates.

Now, a lot of factors come into play when comparing win rates like this, and one should keep them in mind before making any decisions. Consider the snapshot posted from MTGA PRO. Suppose you are deciding which deck to spend wildcards on and try on the ladder. At first glance, Simic Control has the highest win rate, followed by UW Control and then Gruul Aggro, and the sample sizes are large enough that these estimates would be quite stable when constructing a confidence interval. However, the assumption that these are independent and identically distributed data no longer holds. The decks are played by different people with different levels of skill, and each player plays a different number of matches. Thus, it may be the case that RDW’s slightly lower win rate is because many more people play the deck, and of those people, half may be getting 40% win rates while the other half gets 64.48% win rates, which averages out to the published number. Still, this does not mean the numbers are useless; rather, the picture they paint is blurry, and we should be aware of that when making decisions based on it.
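To get a feel for how stable a published win rate is, one can compute a confidence interval around it, assuming (perhaps naively, given the pooling issues above) that matches are independent coin flips. A sketch using the normal approximation; the win/game counts below are made-up numbers, not from any tracker:

```python
import math

def win_rate_ci(wins, games, z=1.96):
    """Approximate 95% confidence interval for a win rate (normal approximation)."""
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

# Hypothetical deck: 1100 wins in 2000 recorded matches
lo, hi = win_rate_ci(wins=1100, games=2000)
print(f"{lo:.3f} to {hi:.3f}")  # 0.528 to 0.572
```

With thousands of games, the interval is only a couple of percentage points wide, which is what "stable" means here; with only 50 games it would span more than ten points.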


As shown in another snapshot from MTGA PRO above, I can still confidently say that decks like UR Aggro, Orzhov Aggro, Grixis Control, etc., are not decks I want to spend wildcards on and try out. These decks are Tier Bad. I would go so far as to say that the data shows BG Adventure is not performing well right now at a 46.72% win rate. However, in choosing among RDW, UW, Simic, or even Gruul, I would consider much more than just the published numbers. I still know that these decks are good based on the data, but now I can weigh other things: which of them is cheapest to craft? Which is likely the easiest to learn? Which does my intuition, based on my experience in other seasons, say is the best choice? As such, these estimates can be used to narrow down one’s choices and eliminate options that our mind may be telling us are potentially strong candidates but that are just not putting up strong enough numbers *cough* Mono Black Devo *cough*.


Case 2: Pr(Specific Deck, You versus Any Deck, Any Player in a meta)

Having discussed how to evaluate win rate information that one has access to online, I next discuss how to estimate one’s own win rate with specific decks selected based on such prior information. Obviously, the best way to estimate one’s own win rate with Deck A is to play Deck A in the meta of interest. This is why testing Deck A in the play queue when one intends to use Deck A on the ranked ladder has little value. The metas in the two settings (both the quality of decks and the competence of players) are different enough that results in one do not imply similar results in the other. Also, in my experience, the constructed event queues are much easier than the ranked queues: my tracker shows I have only a 57% win rate on the ladder versus a 68% win rate in constructed events. However, it is understandable not to want to test directly on the ladder, especially with a new deck, and even more so when one’s rank is already close to the next level.

This brings me to the next issue in estimating one’s win rate with a deck, which is accounting for one’s skill as a confounder. If you were a perfect Magic player, you would not make any play mistakes, and the win rate you estimate would purely be the win rate of the deck when played perfectly. However, this is not the case for most players. I make misplays all the time, and I am more likely to make them when I am new to a deck. Thus, when testing a new deck, it is ideal not to count matches until you feel confident that you are piloting the deck competently. This burn-in period gives you some leeway to learn the ropes of the new deck before you start estimating its actual win rate. Of course, determining whether you have become competent enough to pilot the deck with minimal chance of error is itself a tricky affair.
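One simple way to apply the burn-in idea when reviewing your own match history is to discard the first several results before computing the estimate. A sketch; the cutoff of 10 matches and the example record are arbitrary:

```python
def win_rate_after_burn_in(results, burn_in=10):
    """Estimate win rate from a list of match results (True = win),
    ignoring the first `burn_in` matches spent learning the deck."""
    kept = results[burn_in:]
    if not kept:
        return None  # not enough data past the burn-in yet
    return sum(kept) / len(kept)

# Hypothetical record: 4-6 over the first 10 learning games, then 12-6 after
history = [True] * 4 + [False] * 6 + [True] * 12 + [False] * 6
print(round(win_rate_after_burn_in(history), 2))  # 0.67
```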

Finally, suppose you can assume you have become familiar enough with the deck not to make obvious mistakes. All that is left is to take the deck for a spin on the ladder and start estimating its win rate. How many matches you should play depends on a balance between scientific rigor and your own practical time limitations. Ideally, we would test each deck about 100 times in order to get a useful confidence interval. Testing this many times also lets us encounter a sufficiently representative sample of the meta; that is, it ensures we get matchups against the decks that are common on the ladder. However, the time investment needed to test every deck this way is too much. Instead, I use an adaptive testing system, which is just a fancy way of saying I drop a deck as soon as it is clear that its win rate is not stellar. For example, if after 10 test matches I have won 4 or fewer, with none of the losses due to obvious misplays on my part, I take that as sufficient evidence that the deck is Tier Bad. Sometimes, how bad a deck is becomes apparent after losing just 3 matches straight with it: you get a feel for how the deck is too slow, or how the interaction it is supposed to capitalize on is not powerful enough. While this method does not let you estimate the win rate of such decks well at all, it is enough to know that the win rate is not in the range you want, so you stop wasting time on the deck. In contrast, if a deck keeps winning while you are testing it, that is obviously a good sign to keep testing! It is possible you just got lucky, but testing more will further establish the deck’s true win rate. If it keeps winning, you may have something good. On the other hand, if the win rate eventually converges to something close to 50% or worse, drop the deck and move on.
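A quick way to sanity-check an early stopping rule like "4 or fewer wins in 10" is to compute how likely such a start would be if the deck were actually a coin flip. A sketch using the exact binomial tail:

```python
from math import comb

def prob_at_most(wins, games, p):
    """P(number of wins <= wins) for a record of `games` matches,
    each won independently with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(wins + 1)
    )

# Chance of winning 4 or fewer of 10 if the true win rate were 50%
print(round(prob_at_most(4, 10, 0.5), 3))  # 0.377
```

Note that a 4-10 start happens almost 38% of the time even for a genuinely 50% deck, which is why the stopping decision also leans on qualitative signals like the deck feeling too slow, not on the record alone.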
Once you have found something with a good win rate, you can proceed to practice more with the deck. You already know the deck has a good win rate; now your goal is to hone your play so that you do not make errors, helping ensure the deck achieves its true win rate.

May the shuffler be with you.
