Anyone who’s ever tried to fit a family-sized amount of luggage into a sedan-sized trunk knows this is a difficult problem. Robots also struggle with dense packing tasks.
For a robot, solving the packing problem means satisfying many constraints, such as stacking luggage so that suitcases do not fall out of the trunk, heavy objects are not placed on top of lighter ones, and collisions between the robotic arm and the car’s bumper are avoided.
Some traditional methods approach this problem sequentially, by guessing a partial solution that satisfies one constraint at a time, then checking whether other constraints have been violated. With a long sequence of actions to take and a pile of baggage to pack, this process can take a long time.
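As a rough illustration of that sequential strategy, here is a minimal Python sketch. It is not the code of any particular planner; the placement sampler and the constraint checks are hypothetical placeholders, and the retry loop is deliberately naive to show why long sequences of guesses can become expensive.

```python
# A minimal sketch (not any specific planner's code) of sequential
# guess-and-check: propose a placement for one object at a time, and retry
# whenever a constraint is violated. Sampler and checks are placeholders.
import random

def sample_placement(obj, rng):
    # Hypothetical proposal: a random pose (x, y, rotation) inside the trunk.
    return (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0), rng.choice([0, 90]))

def violates_any(placements, constraints):
    # Hypothetical check: each constraint is a function of the partial layout.
    return any(not ok(placements) for ok in constraints)

def sequential_pack(objects, constraints, max_retries=10_000, seed=0):
    rng = random.Random(seed)
    placements = {}
    retries = 0
    for obj in objects:
        while True:
            placements[obj] = sample_placement(obj, rng)
            if not violates_any(placements, constraints):
                break                  # this partial solution still works
            del placements[obj]        # violated a constraint: guess again
            retries += 1
            if retries > max_retries:  # long action sequences blow up here
                return None
    return placements
```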
MIT researchers used a form of generative AI, called a diffusion model, to solve this problem more efficiently. Their method uses a collection of machine-learning models, each trained to represent one specific type of constraint. These models are combined to generate global solutions to the packing problem, taking all of the constraints into account at once.
Their method generated effective solutions faster than other techniques, and it produced more successful solutions in the same amount of time. Importantly, their technique could also solve problems involving novel combinations of constraints and larger numbers of objects that the models had not seen during training.
Because of this generalizability, their technique can be used to teach robots how to understand and meet the overall constraints of packing problems, such as the importance of avoiding collisions or a desire for one object to be next to another object. Robots trained in this way could be applied to a wide range of complex tasks in diverse environments, from fulfilling orders in a warehouse to organizing a shelf in someone’s home.
“My vision is to push robots to perform more complex tasks that have many geometric constraints and more continuous decisions to make – these are the kinds of problems service robots face in our unstructured and diverse human environments. With the powerful tool of compositional diffusion models, we can now solve these more complex problems and achieve excellent generalization results,” says Zhutian Yang, a graduate student in electrical engineering and computer science and lead author of a paper on this new machine-learning technique.
Yang’s co-authors include MIT graduate students Jiayuan Mao and Yilun Du; Jiajun Wu, an assistant professor of computer science at Stanford University; Joshua B. Tenenbaum, a professor in the Department of Brain and Cognitive Sciences at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Tomás Lozano-Pérez, a professor of computer science and engineering at MIT and a member of CSAIL; and senior author Leslie Kaelbling, the Panasonic Professor of Computer Science and Engineering at MIT and a member of CSAIL. The research will be presented at the Conference on Robot Learning.
Constraint complications
Continuous constraint-satisfaction problems are particularly difficult for robots. These problems appear in multi-step robot manipulation tasks, such as packing objects into a box or setting a dinner table. They often involve satisfying a number of constraints, including geometric constraints, such as avoiding collisions between the robot arm and the environment; physical constraints, such as stacking objects so that they are stable; and qualitative constraints, such as placing a spoon to the right of a knife.
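As a rough illustration (not taken from the paper), the sketch below shows one way such a mix of constraint types might be written down for a table-setting task. The `Constraint` class and the specific type strings are assumptions made for this example.

```python
# A minimal sketch of how geometric, physical, and qualitative constraints
# might be represented; the class and type strings are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    kind: str        # e.g. "collision-free", "stable-on", "right-of"
    objects: tuple   # names of the objects the constraint relates

table_setting_problem = [
    Constraint("collision-free", ("robot_arm", "table")),  # geometric
    Constraint("stable-on", ("plate", "table")),           # physical
    Constraint("right-of", ("spoon", "knife")),            # qualitative
]
```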
There can be many constraints, and they vary across problems and environments based on object geometry and human-specified requirements.
To effectively solve these problems, MIT researchers developed a machine learning technique called Diffusion-CCSP. Diffusion models learn to generate new data samples that resemble samples from a training dataset by iteratively refining their output.
To do this, diffusion models learn a procedure for making small improvements to a potential solution. Then, to solve a problem, they start with a random and very bad solution, then gradually improve it.
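The sketch below, assuming NumPy, illustrates this refine-from-noise loop. The `denoise_step` function is a hand-written toy stand-in for the learned model, which in reality would predict the small correction from training data.

```python
# A minimal sketch of iterative refinement: start from a random (bad) set of
# object poses and repeatedly apply a small correction as the noise level drops.
import numpy as np

def denoise_step(poses, noise_level):
    # Toy correction: shrink poses slightly toward a feasible region.
    # A trained diffusion model would learn this update instead.
    return poses * (1.0 - 0.1 * noise_level)

def refine(num_objects, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    poses = rng.normal(size=(num_objects, 3))  # random (x, y, rotation): very bad
    for t in range(num_steps, 0, -1):          # noise level decreases over time
        poses = denoise_step(poses, noise_level=t / num_steps)
    return poses                               # gradually improved solution
```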

For example, imagine randomly placing plates and utensils on a simulated table and allowing them to physically overlap. Collision-avoidance constraints between objects will push them apart, while qualitative constraints will pull the plate toward the center of the table, align the salad fork with the dinner fork, and so on.
Diffusion models are well suited to this type of continuous constraint satisfaction problem, because the influences of multiple models on an object’s pose can be composed to encourage satisfaction of all constraints, Yang explains. By starting from a random initial estimate each time, the models can obtain a diverse set of good solutions.
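The following sketch illustrates that composition idea with two hand-written toy terms standing in for trained constraint models: a geometric term that pushes overlapping objects apart and a qualitative term that pulls everything toward the table’s center. Summing their influences gives a single refinement step; the functions and parameters are assumptions for illustration.

```python
# A minimal sketch of composing per-constraint influences on object poses.
# Both terms are toy stand-ins, not the trained models from the paper.
import numpy as np

def collision_push(poses, min_dist=1.0):
    # Toy geometric term: push any two objects apart if they are too close.
    grad = np.zeros_like(poses)
    for i in range(len(poses)):
        for j in range(len(poses)):
            if i == j:
                continue
            diff = poses[i] - poses[j]
            dist = np.linalg.norm(diff) + 1e-8
            if dist < min_dist:
                grad[i] += (min_dist - dist) * diff / dist
    return grad

def center_pull(poses, strength=0.05):
    # Toy qualitative term: pull every object toward the center of the table.
    return -strength * poses

def composed_step(poses, step_size=0.1):
    # Key idea: influences from different constraint models are summed, so a
    # single refinement step nudges poses toward satisfying all constraints.
    return poses + step_size * (collision_push(poses) + center_pull(poses))

# Usage: starting from random 2D positions, repeated composed steps spread the
# objects apart while keeping them near the center.
positions = np.random.default_rng(0).normal(size=(4, 2))
for _ in range(200):
    positions = composed_step(positions)
```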
Working together
For Diffusion-CCSP, the researchers wanted to capture the interconnectedness of the constraints. In packing, for example, one constraint might require a certain object to be next to another object, while a second constraint might specify where one of those objects must be located.
Diffusion-CCSP learns a family of diffusion models, one for each type of constraint. The models are trained together, so they share some knowledge, such as the geometry of the objects to be packed.
The models then work together to find solutions, in this case locations for objects to be placed, that jointly satisfy the constraints.
“We don’t always arrive at a solution on the first guess. But if you keep refining the solution and a violation occurs, it should lead you to a better solution. You get guidance from having something go wrong,” Yang says.
Training individual models for each constraint type and then combining them to make predictions significantly reduces the amount of training data required, compared to other approaches.
However, training these models still requires a large amount of data demonstrating the problems being solved. Humans would have to solve every problem with traditional slow methods, which would make the cost of generating such data prohibitive, Yang says.
Instead, the researchers reversed the process by first proposing solutions. They used fast algorithms to generate segmented boxes and insert a diverse set of 3D objects into each segment, ensuring tight packing, stable poses, and collision-free solutions.
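The sketch below illustrates this reverse-generation idea with a simple recursive box-splitting rule; the rule itself is an assumption for illustration, not the researchers’ generation algorithm.

```python
# A minimal sketch of reverse data generation: split a box into segments and
# treat each segment as one object's placement, so the layout is tightly
# packed and collision-free by construction.
import random

def split_box(x, y, w, h, depth, rng, min_size=0.2):
    # Recursively cut a rectangle into non-overlapping segments.
    if depth == 0 or w < 2 * min_size or h < 2 * min_size:
        return [(x, y, w, h)]
    if w > h:  # cut the longer side
        cut = rng.uniform(min_size, w - min_size)
        return (split_box(x, y, cut, h, depth - 1, rng, min_size)
                + split_box(x + cut, y, w - cut, h, depth - 1, rng, min_size))
    cut = rng.uniform(min_size, h - min_size)
    return (split_box(x, y, w, cut, depth - 1, rng, min_size)
            + split_box(x, y + cut, w, h - cut, depth - 1, rng, min_size))

def generate_training_example(seed=0):
    rng = random.Random(seed)
    segments = split_box(0.0, 0.0, 1.0, 1.0, depth=3, rng=rng)
    # One hypothetical object per segment, sized to fill it exactly.
    return [{"object": f"obj_{i}", "placement": seg}
            for i, seg in enumerate(segments)]
```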
“With this process, data generation is almost instantaneous in simulation. We can generate tens of thousands of environments where we know the problems can be solved,” Yang says.
Trained using these data, the diffusion models work together to determine where the robotic gripper should place objects to complete the packing task while meeting all of the constraints.
They conducted feasibility studies and then demonstrated Diffusion-CCSP with a real robot solving a number of difficult problems, including fitting 2D triangles into a box, packing 2D shapes with spatial-relationship constraints, stacking 3D objects with stability constraints, and packing 3D objects with a robotic arm.
Their method outperformed other techniques in many experiments, generating a greater number of effective solutions that were both stable and collision-free.
In the future, Yang and collaborators want to test Diffusion-CCSP in more complicated situations, such as with robots that can move around a room. They also want to enable Diffusion-CCSP to tackle problems in different domains without needing to be retrained on new data.
“Diffusion-CCSP is a machine-learning solution that leverages existing powerful generative models,” says Danfei Xu, an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology and a research scientist at NVIDIA AI, who was not involved with this work. “It can quickly generate solutions that simultaneously satisfy multiple constraints by composing known individual constraint models. Although still in the early phases of development, ongoing advances in this approach promise to enable more efficient, safer, and more reliable autonomous systems in a variety of applications.”
This research was supported, in part, by the National Science Foundation, Air Force Office of Scientific Research, Office of Naval Research, MIT-IBM Watson AI Lab, MIT Quest for Intelligence, Center for Brains, Minds and Machines, Boston Dynamics Artificial Intelligence Institute, Stanford Institute for Human-Centered Artificial Intelligence, Analog Devices, JPMorgan Chase and Co. and Salesforce.