Generalized rock-paper-scissors, best-of-3

Statement

A and B are playing a game of generalized Rock-Paper-Scissors with \(n = 2k+1\) distinct moves. Which moves beat which others is given as an oriented complete graph (a tournament) on \(n\) vertices, where \(d_{in}(v_i) = d_{out}(v_i) = k\) for every move \(v_i\).

The game is played best-of-3; that is, A and B play rounds until one of them has two victories. You are given the probability vector \(V_A\) with which A plays each of his moves, and the probability vector \(V_B\) with which B plays each of his moves. You should determine the probability that A wins the match.

For instance, suppose \(k = 2\), and you're given the vectors \(V_A = \langle 0.1, 0.2, 0.3, 0.4, 0 \rangle\) and \(V_B = \langle 0.1, 0.1, 0.1, 0.1, 0.6 \rangle\); then you should answer \(\frac{2}{3}\).

Analysis

It should be clear that given \(V_A\) and \(V_B\), we can compute the probability that A wins a given round. This is simple enough. Suppose, for instance, that the game is the \(k = 2\) variant Rock-Paper-Scissors-Lizard-Spock, where Rock beats Scissors and Lizard; Paper beats Spock and Rock; Scissors beats Lizard and Paper; Lizard beats Spock and Paper; and Spock beats Scissors and Rock.

Then we know that the probability \(w\) that A wins a given round is (denoting by \(X_i\) the probability that player X plays move \(i\))

\[ \begin{aligned} w = & A_{Spock} \cdot (B_{Scissors} + B_{Rock}) \\ &+ A_{Lizard} \cdot (B_{Spock} + B_{Paper}) \\ &+ A_{Rock} \cdot (B_{Scissors} + B_{Lizard}) \\ &+ A_{Paper} \cdot (B_{Spock} + B_{Rock}) \\ &+ A_{Scissors} \cdot (B_{Lizard} + B_{Paper}) \end{aligned} \]

Similarly, the probability \(t\) of a tie can be written

\[ t = \sum_{i} A_i \cdot B_i = \langle V_A, V_B \rangle \]

And of course, the probability \(l\) of A losing a round is \[ l = 1 - w - t \]
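To make this concrete, here is a minimal sketch of how \(w\), \(t\), and \(l\) can be computed; it assumes the tournament is given as a set of directed edges, and the names (`round_probabilities`, `BEATS`) are mine, for illustration only:

```python
def round_probabilities(beats, va, vb):
    """Compute (w, t, l): the probabilities that a single round is a
    win for A, a tie, or a loss for A.

    beats -- set of pairs (i, j) meaning move i beats move j
    va, vb -- the mixed strategies V_A and V_B
    """
    # w: A plays move i, B plays move j, and i beats j
    w = sum(va[i] * vb[j] for (i, j) in beats)
    # t = <V_A, V_B>: both players pick the same move
    t = sum(a * b for a, b in zip(va, vb))
    return w, t, 1.0 - w - t


# Rock-Paper-Scissors-Lizard-Spock (k = 2), with moves numbered
# 0 = Rock, 1 = Paper, 2 = Scissors, 3 = Lizard, 4 = Spock.
BEATS = {(0, 2), (0, 3),   # Rock beats Scissors, Lizard
         (1, 4), (1, 0),   # Paper beats Spock, Rock
         (2, 3), (2, 1),   # Scissors beats Lizard, Paper
         (3, 4), (3, 1),   # Lizard beats Spock, Paper
         (4, 2), (4, 0)}   # Spock beats Scissors, Rock

w, t, l = round_probabilities(BEATS, [0.2] * 5, [0.1, 0.1, 0.1, 0.1, 0.6])
```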

So we want to express the probability that A wins two rounds before B does. After unsuccessfully attacking the problem as a single sum of probabilities, I decided to think of it as four different subgames. In the first subgame, let's call it \(G\), neither player has won any rounds yet. In subgame \(G_1\), A has won exactly one round and B hasn't won any. In subgame \(G_2\), B has won exactly one round and A hasn't won any. Finally, in subgame \(G_3\), both players have won exactly one round each.

If we think of the game this way, we see that we have a sort of Markov chain of subgames, with the following transitions: from \(G\), probability \(w\) leads to \(G_1\), \(l\) leads to \(G_2\), and \(t\) loops back to \(G\); from \(G_1\), \(w\) leads to \(1\), \(l\) to \(G_3\), and \(t\) loops; from \(G_2\), \(w\) leads to \(G_3\), \(l\) to \(0\), and \(t\) loops; from \(G_3\), \(w\) leads to \(1\), \(l\) to \(0\), and \(t\) loops.

Here \(0\) is the state where A lost the game, and \(1\) is the state where A won the game. As an example, this means that if we are playing subgame \(G_2\), with probability \(w\) we will start playing subgame \(G_3\), with probability \(l\) A will lose the game, and with probability \(t\) we will keep playing subgame \(G_2\).

What we want, then, is the probability that, starting at subgame \(G\), we get to the state \(1\). That is, we get to the state where A has won.
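Since the chain is so small, this probability can also be obtained numerically, by solving the standard absorbing-chain linear system \((I - Q)x = b\) over the transient states; that gives a reference value to check the derivation below against. A minimal sketch, assuming NumPy (the function name is mine):

```python
import numpy as np

def best_of_three_numeric(w, t):
    """Probability that A wins the best-of-3 match: absorption into
    state 1, starting from G, transient states ordered [G, G1, G2, G3]."""
    l = 1.0 - w - t
    # One-step transition probabilities between the transient states.
    Q = np.array([
        [t,   w,   l,   0.0],  # G : tie, A wins a round, A loses a round
        [0.0, t,   0.0, l  ],  # G1: tie,                 A loses a round
        [0.0, 0.0, t,   w  ],  # G2: tie, A wins a round
        [0.0, 0.0, 0.0, t  ],  # G3: tie
    ])
    # One-step probabilities of jumping straight to the winning state 1.
    b = np.array([0.0, w, 0.0, w])  # only G1 and G3 can end with a win by A
    # x[s] is the probability of eventually reaching state 1 from state s.
    x = np.linalg.solve(np.eye(4) - Q, b)
    return x[0]
```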

What we can do is ask, for each node, what the probability is of ever reaching it, starting from \(G\). If we do that, then we need only ask this question about the node \(1\) to obtain our answer. So let's try, starting with \(G_1\): to reach it, A must win a round, possibly preceded by any number of ties, so

\[ \begin{aligned} P(G_1) &= \sum_{i=0}^\infty t^i w\\ &= w \cdot \left(\sum_{i=0}^\infty t^i\right)\\ &= \frac{w}{1 - t} \end{aligned} \]

By the same argument, \(P(G_2) = \frac{l}{1-t}\), and combining the two paths into \(G_3\), \[ P(G_3) = \frac{w \cdot P(G_2) + l \cdot P(G_1)}{1 - t} = \frac{2wl}{(1-t)^2} \]

Finally, A wins the match from \(G_1\) or \(G_3\) by winning the next non-tied round, so the answer to the best-of-3 problem is \[ P(1) = \left(P(G_1) + P(G_3)\right) \cdot \frac{w}{1-t} \]
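In code, the best-of-3 answer is then a direct transcription of these formulas (again just a sketch; the naming is mine):

```python
def best_of_three(w, t):
    """Closed-form probability that A wins a best-of-3 match,
    transcribing P(G1), P(G2), P(G3) and the final step above."""
    l = 1.0 - w - t
    p_g1 = w / (1.0 - t)
    p_g2 = l / (1.0 - t)
    p_g3 = (w * p_g2 + l * p_g1) / (1.0 - t)
    # From G1 or G3, A wins the match by winning the next non-tied round.
    return (p_g1 + p_g3) * w / (1.0 - t)
```

For any valid \(w\) and \(t\), this matches the numeric absorbing-chain computation sketched earlier.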

Generalization

That was the original problem I was faced with, and that solution was indeed correct. However, we can certainly generalize this: we can consider that the winning player is the one who first gets to \(k\) rounds won (reusing the letter \(k\), which from here on denotes the target number of round wins, not the tournament parameter). In order to consider this, we will need to expand our model to the following generalization:

Let \((n, m) \in \mathbb{N}^2\). We define the subgame \(G_{n, m}\) to be the one where A has won exactly \(n\) rounds and B has won exactly \(m\) rounds. Then, the Markov chain around \(G_{n, m}\) looks just like before: probability \(w\) leads to \(G_{n+1, m}\), \(l\) leads to \(G_{n, m+1}\), and \(t\) loops back to \(G_{n, m}\).

From which we can deduce, by an analysis similar to our previous ones, that the probability \(P(G_{n, m})\) of ever reaching subgame \(G_{n, m}\) from the start of the match satisfies \[ P(G_{n, m}) = \begin{cases} 1 & \text{if } n = 0 \text{ and } m = 0\\ l \cdot P(G_{n, m-1}) \cdot (1 - t)^{-1} & \text{if } n = 0 \text{ and } m > 0 \\ w \cdot P(G_{n-1, m}) \cdot (1 - t)^{-1} & \text{if } n > 0 \text{ and } m = 0 \\ (w \cdot P(G_{n - 1, m}) + l \cdot P(G_{n, m-1})) \cdot (1 - t)^{-1} & \text{otherwise} \end{cases} \]

With the initial game being, of course, \(G_{0, 0}\).

To find the probability that A wins (i.e. is the first to reach \(k\) rounds won), we must find \(P(G_{k-1, i})\) for all \(0 \le i \le k-1\), and compute \[ \sum_{i=0}^{k-1} P(G_{k-1, i}) \cdot \frac{w}{1-t} \] where the factor \(\frac{w}{1-t}\) is the probability that, from \(G_{k-1, i}\), A wins the deciding round after any number of ties.

This yields the obvious dynamic programming algorithm to compute the required probability, in \(O(k^2)\) space and time.
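Here is a sketch of that dynamic programming algorithm in Python, directly following the recurrence above (the function name and signature are my own choice):

```python
def first_to_k(w, t, k):
    """Probability that A is the first to win k rounds, via the
    O(k^2) DP on the reach-probabilities P(G_{n,m})."""
    l = 1.0 - w - t
    s = 1.0 - t  # every transition conditions on the round not being a tie
    # p[n][m] = probability of ever reaching subgame G_{n,m} from G_{0,0}
    p = [[0.0] * k for _ in range(k)]
    p[0][0] = 1.0
    for n in range(k):
        for m in range(k):
            if n == 0 and m == 0:
                continue
            acc = 0.0
            if n > 0:
                acc += w * p[n - 1][m]  # arrived by A winning a round
            if m > 0:
                acc += l * p[n][m - 1]  # arrived by B winning a round
            p[n][m] = acc / s
    # From each G_{k-1, i}, A wins by taking the next non-tied round.
    return sum(p[k - 1][i] for i in range(k)) * w / s
```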

Alternative solution

Another way of finding the desired probability is to observe that ties do not matter. That is, if we are asking about the probability of A reaching \(k\) rounds won before B does, the probability of a tied round is irrelevant: a tie merely postpones the next decisive round. All that matters is how likely each player is to win a decisive round. In a sense, the problem only depends on the ratio between \(w\) and \(l\), not on \(t\). So we can rescale \(w\) and \(l\) by multiplying them by \((1-t)^{-1}\), and now we have the constraint \(w + l = 1\).

Thus, we can consider every sequence of the form \(WLLWWWLLWLLLWLL\dots\), that is, every sequence of rounds won or lost, ignoring ties. Any victory for A is exactly one sequence of \(k-1\) wins for A and \(i\) wins for B, for some \(0 \le i \le k-1\), followed by a final win by A. So to compute this, we can sum \[ \begin{aligned} &\left(\sum_{i=0}^{k-1} \binom{k + i - 1}{k - 1} \cdot w^{k - 1} \cdot l^i\right) \cdot w\\ = &w^k \cdot \sum_{i=0}^{k-1} \binom{k + i - 1}{k - 1} \cdot l^i \end{aligned} \]

Where the sum stops at \(k-1\), since B may have won at most \(k-1\) rounds, and never \(k\), before A wins the \(k\)th and final round.
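In code, the rescaling plus the binomial sum give a closed form that can be checked against the \(O(k^2)\) DP above (a sketch; `math.comb` needs Python 3.8+):

```python
from math import comb

def first_to_k_closed_form(w, t, k):
    """Probability that A is the first to win k rounds, via the
    tie-free rescaling and the binomial sum above."""
    s = 1.0 - t
    w, l = w / s, (1.0 - w - t) / s  # rescaled so that w + l = 1
    return w ** k * sum(comb(k + i - 1, k - 1) * l ** i for i in range(k))
```

For any valid inputs, this agrees with `first_to_k` above, which makes for an easy sanity test of both solutions.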