First-person and top-down views of a BYOL-Explore agent solving DM-HARD-8's Throw-Across level, while pure RL and other standard exploration methods fail to make progress on Throw-Across.
Curiosity-driven exploration is the active process of seeking out new information to improve an agent's understanding of its environment. Suppose the agent has learned a world model that predicts future events from the history of past events. A curiosity-driven agent can then use the world model's prediction errors as an intrinsic reward, directing its exploration policy towards the search for new information. In turn, the new information it gathers is used to improve the world model itself, allowing it to make better predictions. This iterative process can eventually let the agent explore every novelty in the world, and use that information to build an accurate world model.
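The loop described above can be sketched in a few lines of Python. This is a minimal illustration under toy assumptions (a linear world model, random two-dimensional observations), not the agent's actual architecture: the model's squared prediction error serves as the intrinsic reward, and each observed transition is used to improve the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear world model: predicts the next observation from the current one.
class WorldModel:
    def __init__(self, obs_dim, lr=0.05):
        self.W = np.zeros((obs_dim, obs_dim))
        self.lr = lr

    def intrinsic_reward(self, obs, next_obs):
        # Prediction error as the curiosity signal: large where the model is wrong.
        return float(np.mean((self.W @ obs - next_obs) ** 2))

    def update(self, obs, next_obs):
        # One gradient step on the squared prediction error: the new information
        # gathered by the agent improves the world model itself.
        err = self.W @ obs - next_obs
        self.W -= self.lr * np.outer(err, obs)

# Environment dynamics unknown to the agent: next_obs = A @ obs + noise.
A = np.array([[0.9, 0.1], [-0.1, 0.9]])
model = WorldModel(obs_dim=2)

rewards = []
for step in range(2000):
    obs = rng.normal(size=2)                        # visit a state
    next_obs = A @ obs + 0.01 * rng.normal(size=2)  # observe what follows
    rewards.append(model.intrinsic_reward(obs, next_obs))
    model.update(obs, next_obs)

# As the model improves, the curiosity reward for familiar transitions decays.
print(np.mean(rewards[:100]) > np.mean(rewards[-100:]))
```

The decaying reward is the key property: transitions the model already predicts well stop being rewarding, so the exploration policy is pushed towards whatever remains surprising.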
Inspired by the success of bootstrap your own latent (BYOL) – which has been applied in computer vision, graph representation learning, and representation learning in RL – we propose BYOL-Explore: a conceptually simple yet general curiosity-driven agent for solving hard-exploration tasks. BYOL-Explore learns a representation of the world by predicting its own future representation, and uses the representation-level prediction error as an intrinsic reward for a curiosity-driven policy. BYOL-Explore therefore learns a world representation, the world dynamics, and a curiosity-driven exploration policy all at once, simply by optimizing the prediction error at the representation level.
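A minimal numerical sketch of this idea follows. It is not the published architecture (which uses deep networks over histories of observations); all names and dimensions here are illustrative. An online encoder and a latent predictor are trained to match a slowly moving target encoder's representation of the next observation, and the representation-level error is read off as the intrinsic reward.

```python
import numpy as np

rng = np.random.default_rng(1)
obs_dim, latent_dim = 4, 3

def encode(W, x):
    return np.tanh(W @ x)  # toy one-layer encoder

# Online encoder (trained), target encoder (slow EMA copy), latent predictor.
W_online = rng.normal(scale=1.0, size=(latent_dim, obs_dim))
W_target = W_online.copy()
P = rng.normal(scale=0.1, size=(latent_dim, latent_dim))

B = rng.normal(scale=0.4, size=(obs_dim, obs_dim))  # unknown environment dynamics
lr, ema = 0.02, 0.99

rewards = []
for step in range(5000):
    obs = rng.normal(size=obs_dim)
    next_obs = B @ obs + 0.05 * rng.normal(size=obs_dim)

    z = encode(W_online, obs)
    pred = P @ z                         # predicted future representation
    target = encode(W_target, next_obs)  # target's actual next representation
    err = pred - target
    rewards.append(float(np.sum(err ** 2)))  # representation-level intrinsic reward

    # Train the predictor and the online encoder on the same prediction error...
    dz = P.T @ err
    P -= lr * np.outer(err, z)
    W_online -= lr * np.outer(dz * (1 - z ** 2), obs)
    # ...and let the target encoder slowly track the online one (EMA).
    W_target = ema * W_target + (1 - ema) * W_online

# The prediction error (hence the intrinsic reward) shrinks as learning proceeds.
print(np.mean(rewards[:100]) > np.mean(rewards[-100:]))
```

Note that a single quantity, the representation-level prediction error, is doing all the work: it is the training loss for the representation and dynamics, and the reward signal for exploration.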
Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and pure RL (no intrinsic reward), in terms of mean capped human normalized score (CHNS).
Despite the simplicity of its design, when applied to the DM-HARD-8 suite of challenging, visually complex 3D hard-exploration tasks, BYOL-Explore outperforms standard curiosity-driven exploration methods such as Random Network Distillation (RND) and the Intrinsic Curiosity Module (ICM), in terms of mean capped human normalized score (CHNS) measured across all tasks. Remarkably, BYOL-Explore achieved this performance with a single network trained simultaneously on all tasks, whereas prior work was limited to the single-task setting and could only make significant progress on these tasks when given human expert demonstrations.
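The CHNS metric is, as commonly defined in the Atari benchmark literature, the per-task human normalized score clipped to [0, 1] and then averaged across tasks, so that a single superhuman score cannot dominate the mean. A short sketch, with made-up per-task scores purely for illustration:

```python
def capped_hns(agent_score, random_score, human_score):
    # Human normalized score: 0 = random policy, 1 = average human.
    hns = (agent_score - random_score) / (human_score - random_score)
    return max(0.0, min(1.0, hns))  # cap so superhuman tasks don't dominate

# Hypothetical (agent, random, human) scores on three tasks:
tasks = [(120.0, 10.0, 100.0), (4.0, 0.0, 50.0), (30.0, 5.0, 30.0)]
chns = sum(capped_hns(*t) for t in tasks) / len(tasks)
print(round(chns, 3))  # mean of the capped per-task scores 1.0, 0.08, 1.0
```

Because of the cap, a CHNS of 1 means matching or exceeding average human performance on every task, which is why it is a natural yardstick for a suite of hard-exploration levels.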
As further evidence of its generality, BYOL-Explore achieves superhuman performance on the ten hardest-exploration Atari games, while having a simpler design than other competitive agents such as Agent57 and Go-Explore.
Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and pure RL (no intrinsic reward), in terms of mean capped human normalized score (CHNS).
In future work, we aim to generalize BYOL-Explore to highly stochastic environments by learning a probabilistic world model that can generate trajectories of future events. This would allow the agent to model the stochasticity of the environment, avoid stochastic traps, and plan its exploration.