Generalized rock-paper-scissors, best-of-3
Statement
A and B are playing a game of generalized Rock-Paper-Scissors, with \(n = 2k+1\) distinct moves. Which moves beat which others is given as a \(n\) oriented complete graph, where \(d_{in}(v_i) = d_{out}(v_i) = k\).
The game is played best-of-3. That is, A and B play until one of them has two victories. You are given the probability vector \(V_A\) that A plays each of his moves, and the probability vector \(V_B\) that B plays each of his moves. You should determine the probability that A wins.
For instance, suppose \(k = 2\), and you’re given the vectors \(V_A = \langle 0.1, 0.1, 0.2, 0.3, 0.4, 0\rangle\), \(V_B = \langle 0.1, 0.1, 0.1, 0.1, 0.6 \rangle\), you should answer \(\frac{2}{3}\).
Analysis
It should be clear that given \(V_A\) and \(V_B\), we can know the probability that A wins a given round. This is simple enough. Suppose that the game is as follows:
Then we know the probability \(w\) that A wins a given round, is (denoting \(X_i\) the probability that player X plays move \(i\))
\[ \begin{aligned} w = & A_{Spock} \cdot (B_{Scissors} + B_{Rock}) \\ &+ A_{Lizard} \cdot (B_{Spock} + B_{Paper}) \\ &+ A_{Rock} \cdot (B_{Scissors} + B_{Lizard}) \\ &+ A_{Paper} \cdot (B_{Spock} + B_{Rock}) \\ &+ A_{Scissors} \cdot (B_{Lizard} + B_{Paper}) \end{aligned} \]
Similarly, the probability \(t\) of a tie can be written
\[ t = \sum_{i} A_i \cdot B_i = \langle V_A, V_B \rangle \]
And obviously, we have that the probability \(l\) of A losing can be written \[ l = 1 - w - t \]
So we want to express the probability that A wins two games before B does. After unsuccessfully attacking the problem as a single sum of probabilities, I decided to think of it as four different subgames. In the first subgame, let’s call it \(G\), neither player has won any rounds yet. In subgame \(G_1\), A has won exactly one round, B hasn’t won. In subgame \(G_2\), B has won exactly one round, and A hasn’t won. Finally, in subgame \(G_3\), both players have won exactly one round each.
If we think of the game this way, we see that we have a sort of Markov chain of subgames, that looks like this:
Where \(0\) is the state where A lost the game, and \(1\) is the state where A won the game. As an example, this means that if we are playing subgame \(G_2\), with probability \(w\) we will start playing subgame \(G_3\), with probability \(l\) A will lose the game, and with probability \(l\) we will keep playing subgame \(G_2\).
What we want, then, is the probability that, starting at subgame \(G\), we get to the state \(1\). That is, we get to the state where A has won.
What we could do is reason backwards, and ask for each node, what the probability is to reach it, starting from \(G\). If we do that, then we need only ask this question about the node \(1\) to obtain our answer. So let’s try.
- For \(G_1\), remember that it is the state where A won one round, B won none. The probability that we got to that subgame is the probability that A’s victory was on the first round, or that they tied the first round and A won the second round, or that they tied twice and A won the 3rd round, or so on. The probability of the first case is what we called \(w\). The probability of the second case is \(t \cdot w\), since each round is independent. The probability of the third case is \(t \cdot t \cdot w = t^2 w\). Since all of these things are disjoint (we can’t both have tied until the \(k\)th round and won the \(k+1\)th round, and tied until the \(k'\)th round and won the \(k'+1\)th round, if \(k \ne k'\)), we must sum their probabilities. This means that the probability that we get to \(G_1\) is
\[ \begin{aligned} P(G_1) &= \sum_{i=0}^\infty t^i w\\ &= w \cdot \left(\sum_{i=0}^\infty t^i\right)\\ &= \frac{w}{1 - t} \end{aligned} \]
For \(G_2\), we can do the exact same analysis, arriving at \[ P(G_2) = \frac{l}{1 - t} \]
For \(G_3\), since we either came from \(G_2\) or came from \(G_1\), and in each of those we could have tied an arbitrarily large (but obviously finite) number of times. This means we have \[ \begin{aligned} P(G_3) &= P(G_1) \cdot \left(\sum_{i=0}^\infty t^i\right) \cdot l + P(G_2) \cdot \left(\sum_{i=0}^\infty t^i\right) \cdot w\\ &= \frac{P(G_1) \cdot l + P(G_2) \cdot w}{1 - t}\\ &= \frac{\frac{w}{1 - t} \cdot l + \frac{l}{1 - t} \cdot w}{1 - t}\\ &= \frac{2\cdot w \cdot l}{(1-t)^2} \end{aligned} \]
Finally, for state 1, it must have happened that either they played game \(G_1\) and A won after some number of ties, or they played game \(G_3\), and A won after some number of ties. That is, we must have \[ \begin{aligned} P(1) &= \frac{P(G_1) \cdot w + P(G_3) \cdot w}{1 - t}\\ &= (P(G_1) + P(G_3)) \cdot \frac{w}{1 - t}\\ &= \left(P(G_1) + \frac{2\cdot w\cdot l}{(1 - t)^2}\right) \cdot P(G_1)\\ &= P(G_1)^2 + 2 \cdot P(G_1)^2 \cdot P(G_2)\\ &= P(G_1)^2 (1 + 2 \cdot P(G_2)) \end{aligned} \]
Generalization
That was the original problem I was faced with, and that solution was indeed correct. However, we can certainly generalize this. We can consider that the winning player is the one who gets to \(k\) rounds won first. In order to consider this, we will need to expand our model to the following generalization:
Let \(n, m \in \mathbb{N}^2\). We define the subgame \(G_{n, m}\) to be the one where A has won exactly \(n\) rounds, and B has won exactly \(m\) rounds. Then, the Markov chain around \(G_{n, m}\) looks like this:
From which we can deduce, by an analysis similar to our previous ones, that \[ P(G_{n, m}) = \begin{cases} 1& \text{if } n = 0 \text{ and } m = 0\\ l \cdot P(G_{n, m-1}) \cdot (1 - t)^{-1}& \text{if } n = 0 \text{ and } m > 0 \\ w \cdot P(G_{n-1, m}) \cdot (1 - t)^{-1} & \text{ if } n > 0 \text{ and } m = 0 \\ (w \cdot P(G_{n - 1, m}) + l \cdot P(G_{n, m-1})) \cdot (1 - t)^{-1} & \text{ otherwise } \end{cases} \]
With the initial game being, of course, \(G_{0, 0}\).
To find the probability that A wins (i.e. is the first to reach \(k\) rounds won), we must find \(P(G_{k-1, i})\) for all \(1 \le i \le k-1\), and compute \[ \sum_{i=0}^{k-1} P(G_{k-1, i}) \cdot w \]
This yields the obvious dynamic programming algorithm to compute the required probability, in \(O(k^2)\) space and time.
Alternative solution
Another way of finding the desired probability, is considering that ties do not matter. That is, if I am asking about probability of A reaching \(k\) rounds won before B does, it doesn’t really matter what the probability of tying wins. All that matters is who is more likely to reach \(k\) wins, and by how much. In a sense, the problem only depends on \(w\), not on \(t\). So we can rescale \(w\) and \(l\) by multiplying them by \((1-t)^{-1}\), and now we have the constraint \(w + l = 1\).
Thus, we can consider every sequence of the form \(WLLWWWLLWLLLWLL\dots\), that is, sequence of rounds being won or lost, ignoring ties. We can say that any victory for A, is exactly one sequence of \(k-1\) wins for A and \(i\) wins for B for some \(1 \le i \le k-1\), followed by a win by A. So to compute this, we can sum \[ \begin{aligned} &\left(\sum_{i=0}^{k-1} \binom{k + i - 1}{k - 1} \cdot w^{k - 1} \cdot l^i\right) \cdot w\\ = &w^k \cdot \sum_{i=0}^{k-1} \binom{k + i - 1}{k - 1} \cdot l^i \end{aligned} \]
Where the sum has a limit of \(k-1\), since we do not allow for B to have won \(k\) times, only at most \(k-1\), before A’s final winning of the \(k\)th round.