Using human and animal movements to teach robots to dribble a ball and simulated humanoid characters to carry boxes and play soccer
Five years ago, we took on the challenge of teaching a fully articulated humanoid character to navigate obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but also highlighted two challenges in solving embodied intelligence:
- Reusing previously learned behaviors: A significant amount of data was needed for the agent to “get off the ground”. With no initial knowledge of how much force to apply to each of its joints, the agent started with random body twitches and quickly fell over. This problem could be alleviated by reusing previously learned behaviors.
- Idiosyncratic behaviors: When the agent finally learned to navigate the obstacle courses, it did so with unnatural (though entertaining) movement patterns that would be impractical for applications such as robotics.
Here we describe a solution to both challenges, called neural probabilistic motor primitives (NPMP), that involves guided learning with movement patterns derived from humans and animals, and discuss how this approach is used in our humanoid football paper, published today in Science Robotics.
We also discuss how the same approach enables vision-driven whole-body manipulation, such as a humanoid carrying an object, and robotic control in the real world, such as a robot dribbling a ball.
Distilling data into controllable motor primitives with an NPMP
An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline, or via RL, by imitating motion capture (MoCap) data recorded with trackers on humans or animals performing movements of interest.
The model has two parts:
- An encoder that takes a future trajectory and compresses it into a motor intention.
- A low-level controller that produces the next action given the agent’s current state and this motor intention.
After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to output motor intentions directly. This allows for efficient exploration – since coherent behaviors are produced even with randomly sampled motor intentions – and constrains the final solution.
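To make the two-part structure concrete, here is a minimal, illustrative sketch in NumPy. All names and dimensions are hypothetical stand-ins (random linear maps replace the trained encoder and controller, and the probabilistic latent from the paper is simplified to a deterministic one); it only shows how a motor intention flows from the encoder, or from a high-level policy, into the low-level controller.

```python
import numpy as np

rng = np.random.default_rng(0)

def net(in_dim, out_dim):
    """A random linear map with tanh, standing in for a trained network."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

STATE_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 32, 8, 12, 5  # illustrative sizes

# Encoder: compresses a short future reference trajectory into a
# latent "motor intention" z.
encode = net(HORIZON * STATE_DIM, LATENT_DIM)

# Low-level controller: maps (current state, motor intention) -> action.
control = net(STATE_DIM + LATENT_DIM, ACTION_DIM)

def act(state, z):
    return control(np.concatenate([state, z]))

# Imitation phase: intentions come from encoding MoCap snippets.
future = rng.normal(size=HORIZON * STATE_DIM)  # stand-in for MoCap frames
state = rng.normal(size=STATE_DIM)
z_mocap = encode(future)
action = act(state, z_mocap)

# Reuse phase: a new high-level policy emits z directly; even randomly
# sampled intentions produce coherent behavior, which aids exploration.
z_random = rng.normal(size=LATENT_DIM)
action2 = act(state, z_random)
assert action.shape == action2.shape == (ACTION_DIM,)
```

The key design point is the bottleneck: new tasks are learned entirely in the low-dimensional intention space, while the low-level controller stays frozen.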
Emerging team coordination in humanoid football
Football has been a long-standing challenge for embodied intelligence research, requiring both individual skills and coordinated team play. In our latest work, we used an NPMP as a prior to guide the learning of motor skills.
The result was a team of players that progressed from learning to chase the ball to finally learning to coordinate. Previously, in a study with simple embodiments, we showed that coordinated behavior can emerge in competing teams. The NPMP allowed us to observe a similar effect, but in a scenario that required significantly more advanced motor control.
Our agents acquired skills such as agile locomotion, passing, and division of labor, as demonstrated by a range of statistics, including metrics used in real-world sports analytics. The players exhibit both agile high-frequency motor control and long-term decision-making that involves anticipating teammates’ behaviors, leading to coordinated team play.
Whole-body manipulation and cognitive tasks using vision
Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of box interactions, we are able to train an agent to carry a box from one location to another, using egocentric vision and only a sparse reward signal:
Likewise, the agent can be taught to catch and throw balls:
Using the NPMP, we can also tackle maze tasks involving locomotion, perception, and memory:
Safe and efficient control of real-world robots
NPMP can also help control real robots. Having well-regulated behavior is essential for activities like walking on uneven terrain or handling fragile objects. Jerky movements can damage the robot itself or its surroundings, or at least drain its battery. Therefore, significant effort is often invested in designing learning goals that allow a robot to do what we want while behaving safely and efficiently.
As an alternative, we investigated whether using priors derived from biological movement can give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that can be deployed on real-world robots.
Starting with MoCap data from humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that can then be deployed on real humanoid (OP3) and quadruped (ANYmal B) robots, respectively. This allowed robots to be directed by a user via a joystick or to dribble a ball to a target location in a natural and robust manner.
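As an illustration of this deployment setup (not the actual interface used on the OP3 or ANYmal robots), a small high-level policy can map a user's joystick command to a motor intention, which a frozen low-level controller turns into joint commands. Every name and dimension below is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

def net(in_dim, out_dim):
    """A random linear map with tanh, standing in for a trained network."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

STATE_DIM, LATENT_DIM, ACTION_DIM = 32, 8, 12  # illustrative sizes

low_level = net(STATE_DIM + LATENT_DIM, ACTION_DIM)  # frozen after MoCap training
high_level = net(STATE_DIM + 2, LATENT_DIM)          # trained per task in simulation

def step(state, joystick_xy):
    """One control tick: joystick command -> motor intention -> joint targets."""
    z = high_level(np.concatenate([state, joystick_xy]))
    return low_level(np.concatenate([state, z]))

state = rng.normal(size=STATE_DIM)
command = np.array([1.0, 0.0])  # e.g. "walk forward"
joint_targets = step(state, command)
assert joint_targets.shape == (ACTION_DIM,)
```

Because the low-level module was distilled from natural movement data, the commands it emits stay within the well-regularized repertoire regardless of what the high-level policy requests, which is what makes the behavior safe to run on hardware.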
Advantages of using neural probabilistic motor primitives
In summary, we used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level motor skills in a reusable way, making it easier to learn useful behaviors that would be hard to discover through unstructured trial and error. Using motion capture as a source of prior information, it biases motor-control learning towards naturalistic movements.
NPMP allows embodied agents to learn faster using RL; learn more naturalistic behaviors; learn safer, more efficient and stable behaviors suitable for real-world robotics; and combine whole-body motor control with longer-term cognitive skills, such as teamwork and coordination.
Learn more about our work: