A new AI called “DeepNash” has mastered Stratego, one of the few iconic board games where computers don’t regularly trounce human players, according to a paper published this week. It’s a huge and surprising result, at least to the Stratego community.
Stratego poses two distinct challenges: it requires long-term strategic thinking (like chess), and it requires players to make decisions in the face of incomplete information (like poker). The goal is to move across the board and capture the other player’s flag. Each game is played on a 10×10 grid with two 2×2 lakes in the middle. Both players have 40 pieces of differing tactical value that are deployed at the start of the game; the catch is that you can’t see what your opponent’s pieces are, and they can’t see yours. When you are planning an attack, you don’t know whether the defender is a high-ranked Marshal that will beat almost all of your pieces or a lowly Sergeant that can be taken out by a Lieutenant or Captain. Other pieces include bombs (which are immobile but powerful), scouts (which can move more than one square at a time), and miners (which can defuse bombs), all of which add to the tactical complexity. The game ends only when one player’s flag is captured or they can no longer make any legal moves.
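To make the hidden-information problem concrete, here is a minimal sketch of how a single attack resolves under the standard Stratego rules. The rank values and special cases (bombs, miners, the Spy beating the Marshal) follow the conventional rules of the game; the code itself is purely illustrative and is not taken from the DeepNash paper.

```python
# Illustrative sketch of Stratego attack resolution using conventional ranks.
# Not code from the DeepNash paper; it only shows the rules described above.

RANKS = {
    "Marshal": 10, "General": 9, "Colonel": 8, "Major": 7, "Captain": 6,
    "Lieutenant": 5, "Sergeant": 4, "Miner": 3, "Scout": 2, "Spy": 1,
}

def resolve_attack(attacker: str, defender: str) -> str:
    """Return the outcome when `attacker` moves onto `defender`."""
    if defender == "Flag":
        return "attacker wins the game"
    if defender == "Bomb":
        # Only Miners can defuse bombs; every other attacker is lost.
        return "attacker" if attacker == "Miner" else "defender"
    if attacker == "Spy" and defender == "Marshal":
        # The Spy defeats the Marshal, but only when the Spy attacks.
        return "attacker"
    if RANKS[attacker] == RANKS[defender]:
        return "both removed"
    return "attacker" if RANKS[attacker] > RANKS[defender] else "defender"

print(resolve_attack("Lieutenant", "Sergeant"))  # attacker
print(resolve_attack("Sergeant", "Marshal"))     # defender
print(resolve_attack("Miner", "Bomb"))           # attacker
```

The point is that the outcome is only settled after the attacker has committed; until that moment, neither player knows which branch will apply.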
This is all to say that Stratego presents a unique challenge to computers. Chess is relatively easy because all the information is visible to both players; in game theory, it’s called a “perfect information game.” A computer can see your defenses, simulate 10 to 20 moves ahead along several candidate lines, and then choose the best option, which gives it an advantage over human players. Chess is also a game that can be won or lost in a few decisive moments rather than through long, sustained pressure: games of Stratego are much longer, averaging around 380 moves, while chess games take about 40. This means that each move in chess carries greater importance (and is worth more of a human player’s attention), while Stratego unfolds over a longer, more fluid game.
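For perfect-information games, that “look ahead and pick the best line” idea is typically implemented as a minimax-style search. The sketch below shows the idea on a deliberately tiny toy game (take 1 to 3 stones, whoever takes the last stone wins) rather than chess, purely to keep it self-contained; real chess engines add evaluation functions, pruning, and far deeper searches.

```python
# A bare-bones minimax lookahead on a toy game: players alternately take
# 1-3 stones, and whoever takes the last stone wins. Because the whole
# position is visible, the search can expand every line and pick the best.

def minimax(stones: int, maximizing: bool) -> int:
    """Return +1 if the maximizing player wins with perfect play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone; the side to move has lost.
        return -1 if maximizing else 1
    outcomes = [minimax(stones - take, not maximizing)
                for take in (1, 2, 3) if take <= stones]
    return max(outcomes) if maximizing else min(outcomes)

def best_move(stones: int) -> int:
    # Choose the number of stones whose subtree guarantees the best outcome.
    return max((t for t in (1, 2, 3) if t <= stones),
               key=lambda t: minimax(stones - t, maximizing=False))

print(best_move(10))  # -> 2, leaving a multiple of 4 for the opponent
```

This only works because the searching player can see the entire position; in Stratego, the identities of the opponent’s pieces are hidden, so the game tree can’t be expanded this way.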
[Related: Meta’s new AI can use deceit to conquer a board game world]
Stratego, on the other hand, is an “imperfect information game.” Until an opponent’s piece attacks or is attacked, you have no way of knowing what it is. Poker is another imperfect information game, and computers have been able to play it at a high level for many years, but it is a far smaller game: there are about 10^164 possible game states, and each player has only around 10^3 possible two-card starting hands. In Stratego, there are 10^535 possible states and more than 10^66 possible deployments, which means there’s a lot more unknown information to account for. And that’s on top of the strategic challenges.
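Those orders of magnitude are easy to sanity-check. The snippet below, which assumes the standard Stratego piece counts (1 Marshal, 1 General, 2 Colonels, 3 Majors, 4 Captains, 4 Lieutenants, 4 Sergeants, 5 Miners, 8 Scouts, 1 Spy, 6 Bombs, 1 Flag), counts poker’s two-card starting hands and Stratego’s possible opening deployments; the 10^535 figure for total game states comes from the paper itself and isn’t something a short script can reproduce.

```python
from math import comb, factorial, log10

# Two-card starting hands in Texas hold'em: choose 2 cards from 52.
starting_hands = comb(52, 2)
print(starting_hands)  # 1326, i.e. on the order of 10^3

# Distinct deployments of one player's 40 pieces on their 40 starting squares:
# 40! divided by the permutations of identical pieces (standard piece counts).
piece_counts = [1, 1, 2, 3, 4, 4, 4, 5, 8, 1, 6, 1]
denominator = 1
for n in piece_counts:
    denominator *= factorial(n)
one_side = factorial(40) // denominator

both_sides = one_side ** 2            # both players deploy independently
print(f"~10^{log10(both_sides):.0f}")  # ~10^66, in line with the paper's figure
```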
Combining these two challenges makes Stratego particularly difficult for AI researchers and computers. According to the team, it’s “not possible to use state-of-the-art model-based perfect information planning techniques nor state-of-the-art imperfect information search techniques that break down the game into independent situations.” The computer has to be able to make strategic plans that incorporate the imperfect information it has available to it.
DeepNash managed to achieve this feat. The researchers developed a novel method that allowed DeepNash to learn to play Stratego on its own. It uses a model-free reinforcement learning algorithm called Regularized Nash Dynamics (R-NaD), combined with a deep neural network architecture, to seek a Nash equilibrium (“an unexploitable strategy in zero-sum two-player games” like Stratego), and by doing so it could learn the “qualitative behavior that one could expect a top player to master.” This approach has been used before in simple Prisoner’s Dilemma-style games, but never with a game as complex as this.
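R-NaD itself involves deep neural networks and a regularized learning dynamic that is well beyond a short snippet, but the underlying goal, converging on an unexploitable (Nash) strategy in a zero-sum game, can be illustrated on a toy example. The sketch below uses regret matching, a standard equilibrium-finding method and not DeepMind’s algorithm, to recover the well-known equilibrium of rock-paper-scissors: play each action a third of the time.

```python
import numpy as np

# Payoff matrix for the row player in rock-paper-scissors, a zero-sum game
# whose unique Nash equilibrium is to play each action with probability 1/3.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def strategy_from_regret(regret):
    """Play in proportion to positive regret; fall back to uniform if none."""
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regret), 1.0 / len(regret))

def regret_matching(A, iterations=20000, seed=0):
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    row_regret, col_regret = np.zeros(n_rows), np.zeros(n_cols)
    row_sum, col_sum = np.zeros(n_rows), np.zeros(n_cols)  # running strategy sums

    for _ in range(iterations):
        p = strategy_from_regret(row_regret)
        q = strategy_from_regret(col_regret)
        row_sum += p
        col_sum += q
        i = rng.choice(n_rows, p=p)   # sample actions from current strategies
        j = rng.choice(n_cols, p=q)
        # Accumulate regret: how much better each alternative would have done.
        row_regret += A[:, j] - A[i, j]
        col_regret += -A[i, :] + A[i, j]

    # The averaged strategies approach an equilibrium in zero-sum games.
    return row_sum / iterations, col_sum / iterations

p, q = regret_matching(A)
print(p.round(2), q.round(2))  # both approach [0.33, 0.33, 0.33]
```

Averaging the strategies over many iterations is what drives them toward unexploitability; DeepNash’s training pursues the same kind of equilibrium, but over the vastly larger, partially observed state space of Stratego.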
DeepNash was tested against both the most skilled existing Stratego bots and expert human players. It defeated all of the other bots and was highly competitive against the experts on Gravon, an online board game platform. It also played well in qualitative terms: it could trade off between concealing the identities of its pieces and capturing material, and it could bluff and even gamble. (Though the researchers acknowledge that terms like “deception” and “bluff” may refer to mental states that DeepNash is incapable of having.)
All told, it’s an exciting demonstration of a new way of training AI models to play games (and maybe perform other similar tasks in the future)—and it doesn’t rely on computationally heavy deep search strategies which have previously been used to play other games like chess, Go, and poker.