Using human and animal movements to teach robots to dribble a ball and simulated humanoid characters to carry boxes and play soccer
Humanoid character learning to complete an obstacle course through trial and error, which can lead to idiosyncratic solutions. Heess et al. “Emergence of locomotion behaviors in rich environments” (2017).
Five years ago, we took on the challenge of teaching a fully articulated humanoid character to navigate obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but it also highlighted two challenges in solving embodied intelligence:
- Reusing previously learned behaviors: A significant amount of data was needed for the agent to “take off”. Without any initial knowledge of how much force to apply to each of its joints, the agent started with random body twitches and quickly fell to the ground. This problem could be alleviated by reusing previously learned behaviors.
- Idiosyncratic behaviors: When the agent finally learned to navigate obstacle courses, it did so with unnatural (though amusing) movement patterns that would be impractical for applications such as robotics.
Here, we describe a solution to both challenges, called neural probabilistic motor primitives (NPMP), which involves guided learning with movement patterns derived from humans and animals, and discuss how this approach is used in our humanoid football paper, published today in Science Robotics.
We also discuss how this same approach enables whole-body manipulation from vision, such as a humanoid carrying an object, and real-world robotic control, such as a robot dribbling a ball.
Distilling data into controllable motor primitives using NPMP
An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline or via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing movements of interest.
An agent learning to imitate a MoCap trajectory (in gray).
The model has two parts:
- An encoder that takes a future trajectory and compresses it into a motor intention.
- A low-level controller that produces the next action given the agent's current state and this motor intention.
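The two-part structure above can be sketched as follows. This is a minimal illustration in which random weights stand in for trained networks, and all dimensions (state, action, latent, horizon) are hypothetical placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholder dimensions, not values from the paper.
STATE_DIM, ACT_DIM, LATENT_DIM, HORIZON = 56, 21, 16, 5

def mlp(sizes):
    # Random-weight MLP; a stand-in for a trained network.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

# Encoder: compresses a short future reference trajectory into a
# probabilistic motor intention (mean and log-std of a Gaussian latent).
encoder = mlp([HORIZON * STATE_DIM, 128, 2 * LATENT_DIM])

# Low-level controller: maps (current state, motor intention) to an action.
controller = mlp([STATE_DIM + LATENT_DIM, 128, ACT_DIM])

def encode(future_traj):
    out = forward(encoder, future_traj.reshape(-1))
    mean, log_std = out[:LATENT_DIM], out[LATENT_DIM:]
    return mean + np.exp(log_std) * rng.standard_normal(LATENT_DIM)

def act(state, intention):
    # tanh keeps the control signal in a bounded range.
    return np.tanh(forward(controller, np.concatenate([state, intention])))

future = rng.standard_normal((HORIZON, STATE_DIM))  # MoCap reference snippet
state = rng.standard_normal(STATE_DIM)
z = encode(future)       # compressed motor intention
action = act(state, z)   # low-level control signal
```

During distillation, the low-level controller is trained so that actions decoded from encoded reference snippets reproduce the MoCap trajectories; the sketch only shows the data flow, not the training objective.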
Our NPMP model first distills the reference data into a low-level controller (left). This low-level controller can then be used as a plug-and-play motor control module on a new task (right).
After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to output motor intentions directly. This enables efficient exploration – since coherent behaviors are produced even from randomly sampled motor intentions – and constrains the final solution.
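To illustrate the reuse step, here is a minimal self-contained sketch in which only a new high-level policy is introduced while the low-level controller stays frozen. As before, random weights stand in for trained networks, and all dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical placeholder dimensions, not values from the paper.
STATE_DIM, ACT_DIM, LATENT_DIM, TASK_OBS_DIM = 56, 21, 16, 40

def mlp(sizes):
    # Random-weight MLP; a stand-in for a trained network.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

# Frozen low-level controller, previously distilled from MoCap data.
low_level = mlp([STATE_DIM + LATENT_DIM, 128, ACT_DIM])

# New high-level controller: the only part optimized on the new task.
high_level = mlp([TASK_OBS_DIM, 64, LATENT_DIM])

def step(task_obs, state):
    z = forward(high_level, task_obs)  # motor intention for this step
    return np.tanh(forward(low_level, np.concatenate([state, z])))

# Even an untrained high-level policy yields bounded, coherent actions,
# because every intention is decoded by the pre-trained low-level module;
# this is what makes exploration efficient.
action = step(rng.standard_normal(TASK_OBS_DIM), rng.standard_normal(STATE_DIM))
```

The design choice worth noting is that the task-level optimizer never touches joint torques directly; it searches in the lower-dimensional space of motor intentions.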
Emergent team coordination in humanoid football
Football has been a long-standing challenge for embodied intelligence research, requiring both individual skills and coordinated team play. In our latest work, we used an NPMP as a prior to guide the learning of motor skills.
The result was a team of players that progressed from learning ball-chasing skills to finally learning how to coordinate. Previously, in a study with simple embodiments, we showed that coordinated behaviors can emerge in competing teams. The NPMP allowed us to observe a similar effect, but in a scenario that required significantly more advanced motor control.
Agents first imitate the movements of football players to learn an NPMP module (top). Using the NPMP, the agents then learn football-specific skills (bottom).
Our agents acquired skills such as agile locomotion, passing, and division of labor, as demonstrated by a range of statistics, including metrics used in real-world sports analytics. The players demonstrate both agile high-frequency motor control and long-term decision-making that involves anticipating teammates' behavior, leading to coordinated team play.
An agent learning to play soccer competitively using multi-agent RL.
Whole-body manipulation and cognitive tasks using vision
Learning to interact with objects using the arms is another difficult control challenge. An NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of interactions with boxes, we can train an agent to carry a box from one location to another, using egocentric vision and only a sparse reward signal:
With a small amount of MoCap data (top), our NPMP approach can solve a box transportation task (bottom).
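To make the "sparse reward" notion concrete: the agent receives a signal only upon success, with no shaping toward intermediate progress. A minimal sketch, with a hypothetical distance threshold:

```python
import numpy as np

def sparse_reward(box_pos, target_pos, threshold=0.5):
    # 1.0 only when the box is within `threshold` (in meters, a
    # hypothetical value) of the target; 0.0 everywhere else, so there
    # is no gradient of reward guiding the agent toward the goal.
    return 1.0 if np.linalg.norm(box_pos - target_pos) < threshold else 0.0
```

Sparse rewards like this are hard to learn from scratch, which is why the structured exploration provided by the NPMP matters here.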
Likewise, the agent can be taught to catch and throw balls:
Simulated humanoid catching and throwing a ball.
The NPMP also lets us tackle maze tasks involving locomotion, perception, and memory:
Simulated humanoid collecting blue spheres in a maze.
Safe and efficient control of real-world robots
An NPMP can also help control real robots. Well-regularized behavior is essential for activities like walking over uneven terrain or handling fragile objects. Jerky movements can damage the robot itself or its surroundings, or at least drain its battery. That's why significant effort is often invested in designing learning objectives that get a robot to do what we want while behaving safely and efficiently.
As an alternative, we investigated whether using priors derived from biological motion can give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that can be deployed on real-world robots.
Starting with MoCap data from humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that can then be deployed on real humanoid (OP3) and quadruped (ANYmal B) robots, respectively. This allowed the robots to be steered by a user via a joystick, or to dribble a ball to a target location, in a natural and robust manner.
The ANYmal robot's locomotion skills are learned by imitating dog MoCap data.
Locomotion skills can then be repurposed for controllable walking and ball dribbling.
Advantages of using neural probabilistic motor primitives
In summary, we used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level motor skills into a reusable module, making it easier to learn useful behaviors that would be hard to discover through unstructured trial and error. By using motion capture as a source of prior information, it biases the learning of motor control toward naturalistic movements.
The NPMP enables embodied agents to learn more quickly using RL; to learn more naturalistic behaviors; to learn safer, more efficient, and more stable behaviors suited to real-world robotics; and to combine whole-body motor control with longer-horizon cognitive skills, such as teamwork and coordination.
Learn more about our work: