Researchers at Alphabet's DeepMind have developed Agent57, a deep reinforcement learning (RL) agent that outperforms the standard human benchmark on all 57 games of the Atari57 suite.
The Atari57 suite is a long-standing benchmark for gauging agent performance across a wide range of tasks; it was proposed to test the general competency of RL algorithms. Previous agents achieved high average scores by doing outstandingly well on many games in the set while performing very poorly on several of the most challenging ones.
Agent57 is the first deep reinforcement learning agent to score above the human baseline on all 57 Atari 2600 games. It combines an algorithm for efficient exploration with a meta-controller that adapts both how much the agent explores and whether it favours long-term or short-term behaviour.
To achieve this result, the researchers train a neural network that parameterizes a family of policies ranging from very exploratory to purely exploitative, and they propose an adaptive mechanism for choosing which policy to prioritize throughout training.
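A minimal sketch of such a policy family, assuming the Never Give Up-style setup in which policy j maximizes the extrinsic (game) reward plus β_j times an intrinsic exploration reward, under its own discount factor γ_j. The function names `policy_family` and `augmented_reward`, the default values, and the log-linear interpolation schedule are hypothetical illustrations, not the published schedule.

```python
import math

def policy_family(num_policies, beta_max=0.3, gamma_min=0.99, gamma_max=0.997):
    """Hypothetical (beta_j, gamma_j) pairs for j = 0..num_policies-1.

    j = 0 is purely exploitative (beta = 0, longest horizon); higher j puts more
    weight on the intrinsic reward and uses a shorter horizon. The interpolation
    below is illustrative, not the published schedule.
    """
    pairs = []
    for j in range(num_policies):
        frac = j / (num_policies - 1) if num_policies > 1 else 0.0
        beta = beta_max * frac
        # Interpolate (1 - gamma) log-linearly, so the effective horizon
        # ~1/(1 - gamma) shrinks smoothly from ~1/(1 - gamma_max) to ~1/(1 - gamma_min).
        log_one_minus_gamma = (1 - frac) * math.log(1 - gamma_max) + frac * math.log(1 - gamma_min)
        pairs.append((beta, 1 - math.exp(log_one_minus_gamma)))
    return pairs

def augmented_reward(r_extrinsic, r_intrinsic, beta):
    """Reward optimized by policy j: extrinsic reward plus a weighted exploration bonus."""
    return r_extrinsic + beta * r_intrinsic

for j, (beta, gamma) in enumerate(policy_family(4)):
    print(f"policy {j}: beta={beta:.2f}, gamma={gamma:.4f}, horizon~{1 / (1 - gamma):.0f} steps")
```

Because every member of the family shares one network, a single training run covers the whole spectrum from exploratory to exploitative behaviour; what remains is deciding which member to act with.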
The researchers also use a new parameterization of the architecture that makes learning more consistent and stable.
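One way to picture this parameterization, based on our reading of the Agent57 paper: the action value is split into an extrinsic component and an intrinsic component, each learned from its own reward stream, and the two are recombined as Q = Q_e + β_j · Q_i when acting. The sketch below uses plain lists in place of network heads; the helper names are hypothetical, and the one-step targets are a simplification of the agent's actual learning targets.

```python
def combined_q(q_extrinsic, q_intrinsic, beta):
    """Per-action value used for acting: Q = Q_e + beta * Q_i."""
    return [qe + beta * qi for qe, qi in zip(q_extrinsic, q_intrinsic)]

def greedy_action(q_values):
    """Index of the highest-valued action."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def one_step_targets(r_e, r_i, next_q_e, next_q_i, beta, gamma):
    """Separate learning targets for the two components (simplified one-step form).

    Assumption: the bootstrap action is picked greedily from the combined value,
    and each component is then bootstrapped on its own reward stream.
    """
    a_star = greedy_action(combined_q(next_q_e, next_q_i, beta))
    return r_e + gamma * next_q_e[a_star], r_i + gamma * next_q_i[a_star]

q_e, q_i = [1.0, 0.5], [0.0, 2.0]
print(greedy_action(combined_q(q_e, q_i, beta=0.0)))  # → 0 (exploitative policy)
print(greedy_action(combined_q(q_e, q_i, beta=0.5)))  # → 1 (exploratory policy)
```

A plausible reason this helps stability is that extrinsic and intrinsic rewards can differ greatly in scale and density, and fitting each with its own component avoids forcing one estimator to track their weighted mixture for every β_j.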
Agent57 builds on DeepMind's earlier agent, Never Give Up (NGU), and adds an adaptive meta-controller that helps the agent decide when to explore and when to exploit, as well as which time horizon to learn with. Different tasks naturally call for different choices in both of these trade-offs, so the meta-controller provides a way to adapt these choices dynamically.
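The meta-controller can be viewed as a multi-armed bandit whose arms are the policy indices j. The sketch below assumes a sliding-window UCB bandit rewarded with each episode's extrinsic return; the class name `SlidingWindowUCB` is hypothetical, and the window size, bonus weight, and ε defaults are placeholder values rather than the paper's settings. Because only recent episodes count, the preferred policy can shift as training progresses.

```python
import math
import random
from collections import deque

class SlidingWindowUCB:
    """Bandit meta-controller over policy indices j = 0..num_arms-1 (sketch).

    Only the last `window` (arm, episode_return) outcomes are used, so the
    preferred policy can change over the course of training.
    """

    def __init__(self, num_arms, window=90, bonus=1.0, epsilon=0.5, seed=0):
        self.num_arms = num_arms
        self.history = deque(maxlen=window)  # oldest outcomes drop out automatically
        self.bonus = bonus                   # weight of the UCB exploration term
        self.epsilon = epsilon               # chance of picking a uniformly random arm
        self.rng = random.Random(seed)

    def select(self):
        counts = [0] * self.num_arms
        sums = [0.0] * self.num_arms
        for arm, ret in self.history:
            counts[arm] += 1
            sums[arm] += ret
        # Any arm with no outcome inside the window is tried first.
        for arm in range(self.num_arms):
            if counts[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_arms)
        total = len(self.history)

        def ucb(arm):
            mean = sums[arm] / counts[arm]
            return mean + self.bonus * math.sqrt(math.log(total) / counts[arm])

        return max(range(self.num_arms), key=ucb)

    def update(self, arm, episode_return):
        self.history.append((arm, episode_return))
```

In a hypothetical training loop, an actor would call `select()` at the start of each episode, act with policy j for that episode, then call `update(j, extrinsic_return)`.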
Although this enables Agent57 to achieve strong general performance, it requires a great deal of computation and time, and its data efficiency leaves clear room for improvement.
Agent57 also achieves better 5th-percentile performance on the Atari57 games than previous agents. Even so, this by no means marks the end of Atari research, either in terms of data efficiency or in terms of general performance.
The researchers offer two views on this. First, analyzing performance across percentiles gives new insight into how general an algorithm is. Although Agent57 achieves strong results at the lowest percentiles of the 57 games and has better mean and median performance than its predecessors NGU and R2D2, there is still room to raise its average score further.
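Percentile comparisons like these are typically computed from human-normalized scores, (agent − random) / (human − random), taken per game, so that 0 means random play and 1 means the human baseline. The sketch below uses three made-up games, not real Atari results, to show how a single outlier can inflate the mean while a low percentile exposes the weakest games. The percentile uses linear interpolation (NumPy's default convention), which may differ from the convention used in the paper.

```python
def human_normalized_score(agent, random_score, human):
    """0.0 = random play, 1.0 = human baseline."""
    return (agent - random_score) / (human - random_score)

def percentile(values, q):
    """q-th percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Hypothetical (agent, random, human) scores, NOT real Atari results.
scores = {
    "game_a": (150.0, 0.0, 100.0),
    "game_b": (50.0, 10.0, 90.0),
    "game_c": (1000.0, 0.0, 100.0),
}
hns = [human_normalized_score(*s) for s in scores.values()]
mean = sum(hns) / len(hns)
median = percentile(hns, 50)
p5 = percentile(hns, 5)
above_human = sum(h >= 1.0 for h in hns)
print(f"mean={mean:.2f} median={median:.2f} p5={p5:.2f} above_human={above_human}/{len(hns)}")
# → mean=4.00 median=1.50 p5=0.60 above_human=2/3
```

Here the mean of 4.00 is driven almost entirely by game_c, while the 5th percentile of 0.60 reveals that one game is still below the human baseline. Clearing the human baseline on every game is equivalent to the minimum normalized score being at least 1.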
Second, all current algorithms remain far from optimal performance on some games. Promising directions include improving the representations that Agent57 uses for exploration, planning, and credit assignment.