Alphabet's DeepMind has created the first artificial intelligence to reach the top league of one of the most popular esports video games, StarCraft II.
The team used general-purpose machine learning techniques – including neural networks, self-play via reinforcement learning, multi-agent learning, and imitation learning – to learn directly from game data. Using the advances described in this paper, AlphaStar was ranked above 99.8% of active players on Battle.net, and achieved Grandmaster level for all three StarCraft II races: Protoss, Terran, and Zerg.
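The idea behind self-play via reinforcement learning can be illustrated with a toy example. The sketch below is purely illustrative and is not AlphaStar's training code: an agent repeatedly plays rock-paper-scissors against a frozen snapshot of itself and nudges its action probabilities toward whatever beats that snapshot, so the opponent it trains against improves as the agent does. The game, learning rate, and update rule are all assumptions chosen for brevity.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def best_response(probs):
    """Return the action that beats the opponent's most likely action."""
    likely = max(probs, key=probs.get)
    return next(a for a in ACTIONS if BEATS[a] == likely)

def self_play(steps=1000, lr=0.05, seed=0):
    """Train against frozen copies of the current policy (toy sketch)."""
    random.seed(seed)
    probs = {"rock": 0.8, "paper": 0.1, "scissors": 0.1}  # biased start
    for _ in range(steps):
        frozen = dict(probs)            # snapshot: the "past self" opponent
        target = best_response(frozen)  # learn to exploit that snapshot
        for a in ACTIONS:
            goal = 1.0 if a == target else 0.0
            probs[a] += lr * (goal - probs[a])  # move toward the exploit
    return probs

policy = self_play()
```

Because each update moves probability mass toward countering the previous version of the policy, no single pure strategy stays dominant for long, which is the basic dynamic self-play exploits at much larger scale.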
In StarCraft II's one-on-one games, two players compete against each other after choosing which alien race to be. Each of the three options - Zerg, Protoss and Terran - has different abilities.
Players start with only a few units and must gather resources - minerals and gas - which can be used to construct new buildings and research technologies. They can also invest time in increasing their number of worker units.
Gamers can only see a small section of the map at a time, and they can only point the in-game "camera" to an area if some of their units are based there or have travelled to it.
When ready, players can send out scouting parties to reveal their enemy's preparations, or alternatively go straight ahead and launch attacks.
All of this happens in real-time, and players do not take turns to make moves.
As the action picks up pace, gamers typically have to juggle hundreds of units and structures, and make choices that might only pay off minutes later.
Part of the challenge is the huge amount of choice on offer.
At any time, there are up to 100 trillion trillion possible moves, and thousands of such choices must be made before it becomes apparent who has destroyed the other's base and won.
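The scale of that figure is easier to grasp with a rough back-of-the-envelope calculation: 100 trillion trillion is 10^26 choices at every step, and a game involves thousands of decisions. The step count used below is an assumed order of magnitude for illustration, not a measured value.

```python
import math

actions_per_step = 10**26   # ~100 trillion trillion choices at each moment
decisions_per_game = 1000   # assumed order of magnitude, for illustration

# Work in log10 to avoid constructing a 26,000-digit integer.
log10_game_space = decisions_per_game * math.log10(actions_per_step)
# The space of possible games is therefore on the order of 10^26000,
# vastly larger than the game trees of chess or Go.
```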
Researchers at DeepMind are trying to understand the potential – and limitations – of open-ended learning, which enables them to develop robust and flexible agents that can cope with complex, real-world domains. Games like StarCraft are an excellent training ground for advancing these approaches, as players must use limited information to make dynamic and difficult decisions that have ramifications on multiple levels and timescales.
Thanks to advances in imitation learning, reinforcement learning, and the League, the researchers were able to train AlphaStar Final, an agent that reached Grandmaster level at the full game of StarCraft II without any game modifications. This agent played online anonymously on the gaming platform Battle.net and achieved Grandmaster level with all three StarCraft II races. AlphaStar played using a camera interface, with similar information to what human players would have, and with restrictions on its action rate to make it comparable with human players; the interface and restrictions were approved by a professional player. Ultimately, these results provide strong evidence that general-purpose learning techniques can scale AI systems to work in complex, dynamic environments involving multiple actors. The techniques used to develop AlphaStar will help further the safety and robustness of AI systems in general, and the researchers hope they may serve to advance research in real-world domains.
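The League mentioned above pits a learning agent against a growing population of past agents and specialised exploiters. One ingredient of that scheme, prioritised matchmaking, can be sketched as follows; the opponent names and the exact weighting are illustrative assumptions, not AlphaStar's precise formula, but the principle - play more often against opponents you still lose to - matches the idea of prioritised fictitious self-play.

```python
import random

def pick_opponent(win_rates, rng):
    """Pick an opponent, weighting hard opponents (low win rate) higher."""
    names = list(win_rates)
    weights = [(1.0 - win_rates[n]) ** 2 for n in names]  # favour hard foes
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
# Hypothetical win rates of the learning agent against League members.
win_rates = {"past_self_v1": 0.9, "past_self_v2": 0.6, "exploiter": 0.2}

picks = [pick_opponent(win_rates, rng) for _ in range(1000)]
counts = {n: picks.count(n) for n in win_rates}
# The opponent the agent loses to most ("exploiter") is matched most often.
```

Squaring the loss rate is one simple way to concentrate training time on the weakest matchups; the published League uses a more elaborate mixture of main agents, main exploiters, and league exploiters.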