The new foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.
Robots are quickly becoming part of our daily lives, but they are typically programmed only to perform specific tasks in narrowly defined ways. Harnessing recent advances in AI could lead to robots that help in many more ways, but progress in building general-purpose robots is slower, in part because of the time required to collect real-world training data.
Our latest paper introduces RoboCat, a self-improving AI agent for robotics that learns to perform a variety of tasks across different arms and then automatically generates new training data to improve its technique.
Previous research has explored how to develop robots that can learn to multitask at scale and combine an understanding of language with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks across different, real robots.
RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws on a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step toward creating a general-purpose robot.
How RoboCat improves itself
RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images, and actions in both simulated and physical environments. We combined Gato’s architecture with a large training dataset of image sequences and actions from various robot arms solving hundreds of different tasks.
After this first round of training, we launched RoboCat into a “self-improvement” training cycle with a set of previously unseen tasks. Learning each new task followed five steps:
- Collect 100 to 1,000 demonstrations of a new task or robot, using a human-controlled robotic arm.
- Fine-tune RoboCat on this new task/arm, creating a specialized derivative agent.
- Let the specialised agent practise on this new task/arm an average of 10,000 times, generating more training data.
- Incorporate demo data and self-generated data into RoboCat’s existing training dataset.
- Train a new version of RoboCat on the new training dataset.
RoboCat’s training cycle, boosted by its ability to autonomously generate additional training data.
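The five-step cycle above can be sketched as a simple loop. This is purely illustrative: every name here (`collect_demonstrations`, `fine_tune`, `self_practice`, `retrain`, and the dictionary-based agent) is a hypothetical stand-in, not part of any published RoboCat code, and the stubs only model how data flows through the cycle.

```python
# Hypothetical sketch of RoboCat's five-step self-improvement cycle.
# All function and variable names are illustrative stand-ins; the real
# training stack is not public.

def collect_demonstrations(task, n_demos):
    """Step 1: a human tele-operates the arm to demonstrate the task."""
    return [f"{task}-demo-{i}" for i in range(n_demos)]

def fine_tune(agent, demos):
    """Step 2: fine-tune the generalist into a task-specialised agent."""
    return {"base": agent["name"], "specialised_on": len(demos)}

def self_practice(specialised_agent, task, n_episodes):
    """Step 3: the specialised agent practises, generating new data."""
    return [f"{task}-selfgen-{i}" for i in range(n_episodes)]

def retrain(agent, dataset):
    """Step 5: train a new version of the generalist on the grown set."""
    return {"name": agent["name"], "version": agent["version"] + 1}

def self_improvement_cycle(agent, dataset, task,
                           n_demos=100, n_practice=10_000):
    demos = collect_demonstrations(task, n_demos)              # step 1
    specialised = fine_tune(agent, demos)                      # step 2
    self_generated = self_practice(specialised, task, n_practice)  # step 3
    dataset.extend(demos + self_generated)                     # step 4
    return retrain(agent, dataset), dataset                    # step 5

robocat = {"name": "RoboCat", "version": 1}
data = []
robocat, data = self_improvement_cycle(robocat, data, "stack-blocks",
                                       n_demos=100, n_practice=1_000)
print(robocat["version"], len(data))  # → 2 1100
```

The key design point the sketch captures is step 4: both the human demonstrations and the much larger volume of self-generated trajectories flow back into the shared dataset, so each retrained generalist starts from a broader base than the last.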
The combination of all this training means that the latest RoboCat is based on a dataset of millions of trajectories, from real and simulated robotic arms, including self-generated data. We used four different types of robots and numerous robotic arms to collect vision-based data representing the tasks RoboCat would be trained to do.
RoboCat learns from a wide range of data types and training tasks: videos of a real robotic arm picking up gears, a simulated arm stacking blocks, and RoboCat using a robotic arm to pick up a cucumber.
Learning to operate new robotic arms and solve more complex tasks
Thanks to RoboCat’s diverse training, it learned to operate different robotic arms within a few hours. Although it had been trained on arms with two-fingered grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
LEFT: A new robotic arm RoboCat learned to control
RIGHT: Video of RoboCat using the arm to pick up gears
After observing 1,000 human-controlled demonstrations, collected in just a few hours, RoboCat could steer this new arm dexterously enough to pick up gears successfully 86 percent of the time. With the same number of demonstrations, it could adapt to solve tasks that combine precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, which are necessary for more complex control.
Examples of tasks that RoboCat can adapt to solving after 500–1,000 demonstrations.
The self-improving generalist
RoboCat has a virtuous cycle of training: the more it learns new tasks, the more it manages to learn additional new tasks. The initial version of RoboCat succeeded only 36% of the time on novel tasks, after learning from 500 demonstrations per task. But the latest RoboCat, which had trained on a greater variety of tasks, more than doubled this success rate on the same tasks.
The large jump in performance between the initial RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement), after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.
These improvements are due to RoboCat’s increasing breadth of experience, similar to how people develop a more diverse range of skills as they deepen their learning in a given area. RoboCat’s ability to learn skills independently and improve rapidly, especially when applied to different robotic devices, will help pave the way for a new generation of more useful general-purpose robotic agents.