Google's DeepMind announced on Thursday that AlphaZero — the latest version of the famous AlphaGo AI that beat Go world champion Lee Sedol — has also mastered chess and shogi (Japanese chess) to superhuman levels.
The AlphaZero was introduced by DepMind in late 2017. It is a single system that taught itself from scratch how to master the games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case.
The company described the breakthrough in the journal Science. The paper describes how AlphaZero quickly learns each game to become the strongest player in history for each, despite starting its training from random play, with no in-built domain knowledge but the basic rules of the game.
This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive, unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess engine.” It’s like discovering the secret notebooks of some great player from the past,” says Matthew.
Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try to account for every eventuality in a game. Shogi programs are also game specific, using similar search engines and algorithms to chess programs.
AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network and general purpose algorithms that know nothing about the game beyond the basic rules.
To learn each game, an untrained neural network plays millions of games against itself via a process of trial and error called reinforcement learning. At first, it plays completely randomly, but over time the system learns from wins, losses, and draws to adjust the parameters of the neural network, making it more likely to choose advantageous moves in the future. The amount of training the network needs depends on the style and complexity of the game, taking approximately 9 hours for chess, 12 hours for shogi, and 13 days for Go.
Some of its moves, such as moving the King to the centre of the board, go against shogi theory and - from a human perspective - seem to put AlphaZero in a perilous position. But it remains in control of the board. Its unique playing style shows us that there are new possibilities for the game."
The trained network is used to guide a search algorithm – known as Monte-Carlo Tree Search (MCTS) – to select the most promising moves in games. For each move, AlphaZero searches only a small fraction of the positions considered by traditional chess engines. In Chess, for example, it searches only 60 thousand positions per second in chess, compared to roughly 60 million for Stockfish.
Each program ran on the hardware for which they were designed. Stockfish and Elmo used 44 CPU cores (as in the TCEC world championship), whereas AlphaZero and AlphaGo Zero used a single machine with 4 first-generation TPUs and 44 CPU cores. A first generation TPU is roughly similar in inference speed to commodity hardware such as an NVIDIA Titan V GPU, although the architectures are not directly comparable. All matches were played using time controls of three hours per game, plus an additional 15 seconds for each move.
In each evaluation, AlphaZero beat its opponent:
- In chess, AlphaZero defeated the 2016 TCEC (Season 9) world champion Stockfish, winning 155 games and losing just six games out of 1,000. To verify the robustness of AlphaZero, the system also played a series of matches that started from common human openings. In each opening, AlphaZero defeated Stockfish.
- Deepmind also put teh system to play a match that started from the set of opening positions used in the 2016 TCEC world championship, along with a series of additional matches against the most recent development version of Stockfish, and a variant of Stockfish that uses a strong opening book. In all matches, AlphaZero won.
- In shogi, AlphaZero defeated the 2017 CSA world champion version of Elmo, winning 91.2% of games.
- In Go, AlphaZero defeated AlphaGo Zero, winning 61% of games.
However, it was the style in which AlphaZero plays these games that players may find most fascinating. In Chess, for example, AlphaZero independently discovered and played common human motifs during its self-play training such as openings, king safety and pawn structure. But, being self-taught and therefore unconstrained by conventional wisdom about the game, it also developed its own intuitions and strategies adding a new and expansive set of exciting and novel ideas that augment centuries of thinking about chess strategy.
"Chess has been used as a Rosetta Stone of both human and machine cognition for over a century. AlphaZero renews the remarkable connection between an ancient board game and cutting-edge science by doing something extraordinary." said Garry Kasparov, former World Chess Champion.
The first thing that players will notice is AlphaZero's style, says Matthew Sadler – “the way its pieces swarm around the opponent’s king with purpose and power”. Underpinning that, he says, is AlphaZero’s highly dynamic game play that maximises the activity and mobility of its own pieces while minimising the activity and mobility of its opponent’s pieces. Counterintuitively, AlphaZero also seems to place less value on “material”, an idea that underpins the modern game where each piece has a value and if one player has a greater value of pieces on the board than the other, then they have a material advantage. Instead, AlphaZero is willing to sacrifice material early in a game for gains that will only be recouped in the long-term.
“Impressively, it manages to impose its style of play across a very wide range of positions and openings,” says Matthew, who also observes that it plays in a very deliberate style from its first move with a “very human sense of consistent purpose”.
“Traditional engines are exceptionally strong and make few obvious mistakes, but can drift when faced with positions with no concrete and calculable solution,” he says. “It's precisely in such positions where ‘feeling’, ‘insight’ or ‘intuition’ is required that AlphaZero comes into its own."
This unique ability, not seen in other traditional chess engines, has already been harnessed to give chess fans fresh insight and commentary on the recent World Chess Championship match between Magnus Carlsen and Fabiano Caruana and will be explored further in Game Changer. “It was fascinating to see how AlphaZero's analysis differed from that of top chess engines and even top grandmaster play,” says Natasha Regan. "AlphaZero could be a powerful teaching tool for the whole community."
But AlphaZero is about more than chess, shogi or Go. To create intelligent systems capable of solving a wide range of real-world problems we need them to be flexible and generalise to new situations. While there has been some progress towards this goal, it remains a major challenge in AI research with systems capable of mastering specific skills to a very high standard, but often failing when presented with even slightly modified tasks.