Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, much as one might train a pet with treats. But practical applications of reinforcement learning are often far from natural: instead of learning through trial and error by actually attempting the desired task, typical training methods use a separate (usually simulated) training phase. For example, AlphaGo did not learn to play Go by playing against thousands of humans, but rather by playing against itself in simulation. While this kind of simulated training is attractive for games whose rules are well known, applying it to real-world domains such as robotics can require a range of complex approaches, such as using simulated data or instrumenting real-world environments in various ways to make training feasible under laboratory conditions. Can we instead design reinforcement learning systems for robots that allow them to learn directly “on the job”, while carrying out the task they are asked to do? In this blog post, we discuss ReLMM, a system we developed that learns to clean a room directly with a real robot via continual learning.
We evaluate our method on tasks of varying difficulty. The top-left task features uniform white objects to collect with no obstacles, while the other rooms contain objects of various shapes and colors, obstacles that make navigation harder and occlude objects, and patterned rugs that make it difficult to see objects on the ground.
The main obstacle to “on-the-job” training in the real world is the difficulty of collecting more experience. If we can make real-world training easier, by making the data collection process more autonomous and free of human monitoring or intervention, we can benefit further from the simplicity of agents that learn from experience. In this work, we design an “on-the-job” mobile robot training system for cleaning, in which the robot learns to grasp objects across different rooms.
People are not born one day and interviewing for a job the next. There are many levels of tasks people learn before they apply for a job, because we start with the easier ones and build on them. In ReLMM, we make use of this idea by having the robot learn common, reusable skills, such as grasping, and by encouraging the robot to prioritize training these skills before learning later skills, such as navigation. Learning in this way has two advantages for robotics. The first advantage is that when an agent focuses on learning a single skill, it is more efficient at collecting data in the local state distribution for that skill.
This is illustrated in the figure above, where we evaluated how much prioritized grasping experience is needed to obtain efficient mobile manipulation training. The second advantage of a multi-level learning approach is that we can inspect the models trained for the different tasks and ask them questions, such as “can you grasp anything right now?”, which is useful for training navigation, as we describe next.
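To make this concrete, here is a minimal sketch of what such a skill-prioritized training loop could look like. It is an illustration under our own assumptions, not the actual ReLMM implementation: `GraspPolicy`, `NavigationPolicy`, and the success-rate threshold are hypothetical placeholders.

```python
# Illustrative sketch of prioritizing the grasping skill before navigation.
# GraspPolicy / NavigationPolicy are hypothetical placeholders, not the ReLMM code.
from collections import deque


class SkillCurriculum:
    def __init__(self, grasp_policy, nav_policy, success_threshold=0.7, window=100):
        self.grasp_policy = grasp_policy
        self.nav_policy = nav_policy
        self.success_threshold = success_threshold
        self.recent_grasps = deque(maxlen=window)  # rolling record of grasp outcomes

    def grasping_is_reliable(self):
        """Has the grasping skill reached a usable success rate over the recent window?"""
        full = len(self.recent_grasps) == self.recent_grasps.maxlen
        rate = sum(self.recent_grasps) / max(len(self.recent_grasps), 1)
        return full and rate >= self.success_threshold

    def step(self, observation):
        """Collect grasping experience until the skill is reliable, then let the
        navigation policy train on top of the (still improving) grasping skill."""
        if not self.grasping_is_reliable():
            success = self.grasp_policy.attempt(observation)
            self.recent_grasps.append(float(success))
        else:
            self.nav_policy.step(observation)  # navigation decides where to go and when to grasp
        self.grasp_policy.update()             # off-policy update from the grasp replay buffer
```

The design choice this sketch tries to capture is that the robot front-loads grasping practice, so that once navigation starts learning, the reward signal it depends on (successful grasps) is already reliable.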
Training this multi-level policy was not only more efficient than learning both skills at the same time; it also allowed the grasping controller to inform the navigation policy. Having a model that estimates the uncertainty in its own grasp success (“Ours” above) can be used to improve navigation exploration by skipping areas with no graspable objects, in contrast to “No Uncertainty Bonus”, which does not use this information. The model can also be used to relabel data during training, so that in the unlucky case where the grasping model fails to grasp an object within its reach, the grasping policy can still receive a signal indicating that an object was there but the grasping policy has simply not yet learned how to grasp it. In addition, learning modular models has engineering benefits. Modular training allows skills that are easier to learn to be reused, and it enables building intelligent systems one piece at a time. This is beneficial for many reasons, including safety evaluation and understanding.
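As a rough illustration of these two uses of the grasping model, the sketch below adds an uncertainty bonus to the navigation reward and relabels failed grasp attempts. The ensemble-based uncertainty estimate, the reward shaping, and the function names are assumptions made for the example, not the exact ReLMM formulation.

```python
import numpy as np


def grasp_success_stats(grasp_models, observation):
    """Mean and spread of predicted grasp-success probability across an ensemble of
    grasp models; the spread serves as a simple uncertainty estimate."""
    preds = np.array([m.predict_success_prob(observation) for m in grasp_models])
    return preds.mean(), preds.std()


def navigation_reward(base_reward, grasp_models, observation, bonus_scale=0.5):
    """Shaped navigation reward: pay a bonus where the grasp model is uncertain, so the
    robot explores those areas and skips regions it is confident contain nothing to grasp."""
    _, uncertainty = grasp_success_stats(grasp_models, observation)
    return base_reward + bonus_scale * uncertainty


def relabel_failed_grasp(transition, grasp_models, presence_threshold=0.5):
    """If a grasp failed but the model is confident an object was within reach, mark the
    transition as 'object present' so the grasping policy still gets a learning signal and
    the navigation policy is not wrongly told the area was empty."""
    mean_success, _ = grasp_success_stats(grasp_models, transition["observation"])
    if not transition["grasp_succeeded"] and mean_success > presence_threshold:
        transition["object_was_present"] = True
    return transition
```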
Many robotics tasks that we see today can be solved, with varying levels of success, using hand-engineered controllers. For our room-cleaning task, we designed a hand-engineered controller that locates objects using image clustering and turns toward the nearest detected object at each step. This expertly designed controller works very well on visually salient balled socks and takes reasonable paths around obstacles, but it cannot quickly learn an optimal path for collecting the objects, and it struggles with visually diverse rooms. As shown in Video 3 below, the scripted policy gets distracted by the white patterned carpet while trying to locate more white objects to grasp.
[Videos 1–4: robot behavior comparison, embedded in the original post]
We show a comparison between (1) our policy at the start of training, (2) our policy at the end of training, and (3) the scripted policy. In (4), we can see the robot’s performance improve over time, eventually exceeding the scripted policy at quickly collecting the objects in the room.
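For reference, below is a rough sketch of how such a scripted controller might be structured, following the description above (segment bright pixels, cluster them, turn toward the nearest cluster). The thresholds, the DBSCAN clustering step, and the control interface are illustrative assumptions, not our exact implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def candidate_object_pixels(image, brightness_threshold=200):
    """Naive segmentation: treat bright (near-white) pixels as candidate objects.
    This assumption is exactly what a white patterned rug breaks."""
    grayscale = image.mean(axis=2)
    ys, xs = np.nonzero(grayscale > brightness_threshold)
    return np.stack([xs, ys], axis=1)


def nearest_object_center(pixels):
    """Cluster candidate pixels and return the center of the cluster lowest in the
    image, i.e. closest to the robot under a forward-facing camera."""
    if len(pixels) == 0:
        return None
    labels = DBSCAN(eps=5, min_samples=10).fit_predict(pixels)
    centers = [pixels[labels == k].mean(axis=0) for k in set(labels) if k != -1]
    return max(centers, key=lambda c: c[1]) if centers else None


def scripted_step(image, turn_gain=0.005):
    """One control step: turn toward the nearest detected object, drive forward,
    and trigger a grasp once the object is centered and near the bottom of the frame."""
    height, width = image.shape[:2]
    center = nearest_object_center(candidate_object_pixels(image))
    if center is None:
        return {"angular": 0.3, "linear": 0.0, "grasp": False}  # spin in place to search
    offset = center[0] - width / 2
    ready_to_grasp = abs(offset) < 0.05 * width and center[1] > 0.8 * height
    return {"angular": -turn_gain * offset,
            "linear": 0.0 if ready_to_grasp else 0.2,
            "grasp": ready_to_grasp}
```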
Given that we can hire experts to write such hand-engineered controllers, what is the purpose of learning? An important limitation of hand-engineered controllers is that they are tuned for a particular task, such as grasping white objects. When diverse objects are introduced, differing in color and shape, the original tuning may no longer be optimal. Rather than requiring further manual engineering, our learning-based method is able to adapt to various tasks by collecting its own experience.
However, the more important lesson is that even if the hand-engineered controller is capable, the learning agent will eventually outperform it given enough time. This learning process is itself autonomous and takes place while the robot is performing its job, which makes it comparatively inexpensive. It demonstrates the capability of learning agents, which can also be thought of as working out a general way to perform an “expert manual tuning” process for any kind of task. Learning systems have the ability to create the entire control algorithm for the robot, and are not limited to tuning a few parameters in a script. The key step in this work is enabling these real-world learning systems to autonomously collect the data needed for successful learning methods.
This post is based on the paper “Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation”, presented at CoRL 2021. You can find more details in our paper, on our website, and in the video. We provide code to reproduce our experiments. We thank Sergey Levine for his valuable feedback on this blog post.