ChatGPT and other deep generative models are proving to be uncanny mimics. These AI models can produce poems, complete symphonies, and create new videos and images by automatically learning from millions of examples of previous works. These extremely powerful and versatile tools excel at generating new content that resembles what they have seen before.
But as MIT engineers say in a new study, similarity isn’t enough if you want to truly innovate in engineering tasks.
“Deep generative models (DGMs) are very promising, but also inherently flawed,” says study author Lyle Regenwetter, a graduate student in mechanical engineering at MIT. “The goal of these models is to mimic a set of data. But as engineers and designers, we often don’t want to create a design that already exists.”
He and his colleagues argue that if mechanical engineers want AI’s help in generating new ideas and designs, they will first need to refocus those models beyond “statistical similarity.”
“The performance of many of these models is explicitly tied to how statistically similar a generated sample is to what the model has already seen,” says co-author Faez Ahmed, an assistant professor of mechanical engineering at MIT. “But when it comes to design, being different can be important if you want to innovate.”
In their study, Ahmed and Regenwetter reveal the pitfalls of deep generative models when tasked with solving engineering design problems. In a case study of bicycle frame design, the team shows that these models end up generating new frames that mimic previous designs but fall short on performance and technical requirements.
When the researchers presented the same bicycle frame problem to DGMs that they had specifically designed with engineering-driven goals in mind, rather than statistical similarity alone, these models produced more innovative and better-performing frames.
The team’s results show that similarity-focused AI models don’t translate well when applied to engineering problems. But, as the researchers also point out in their study, with careful planning of objectives suited to the task, AI models could be an effective design “co-pilot.”
“It’s about how AI can help engineers create innovative products more efficiently and quickly,” says Ahmed. “To do this, we must first understand the requirements. This is a step in that direction.”
The team’s new study recently appeared online and will appear in the December print edition of the journal Computer Aided Design. The research is a collaboration between computer scientists from the MIT-IBM Watson AI Lab and mechanical engineers from MIT’s DeCoDe Lab. Co-authors of the study include Akash Srivastava and Dan Gutfreund of the MIT-IBM Watson AI Lab.
Define a problem
As Ahmed and Regenwetter write, DGMs are “powerful learners, with unparalleled capacity” to process enormous amounts of data. DGM is a general term for any machine learning model trained to learn the distribution of data and then use it to generate new, statistically similar content. The very popular ChatGPT is a type of deep generative model known as a large language model, or LLM, which integrates natural language processing capabilities into the model to allow the application to generate realistic images and speech in response to conversational queries. Other popular models for image generation include DALL-E and Stable Diffusion.
Due to their ability to learn from data and generate realistic samples, DGMs are increasingly applied in several engineering fields. Designers have used deep generative models to design new aircraft frames, metamaterial designs, and optimal geometries for bridges and cars. But for the most part, the models have imitated existing designs without improving on their performance.
“Designers who work with DGMs kind of miss this icing on the cake: adjusting the training objective of the model to focus on the design requirements,” says Regenwetter. “So people end up generating designs that are very similar to the data set.”
In the new study, he describes the main pitfalls of applying DGMs to engineering tasks and shows that the fundamental objective of standard DGMs does not take specific design requirements into account. To illustrate this, the team discusses a simple case of bicycle frame design and demonstrates that problems can arise even in the initial learning phase. As a model learns from thousands of existing bicycle frames of different sizes and shapes, it may assume that two frames of similar dimensions have similar performance, when in fact a small disconnection in one frame, too small to register as a significant difference in statistical similarity metrics, makes that frame much weaker than the other, visually similar frame.
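The gap between similarity and performance described above can be illustrated with a small Python sketch. This is not the researchers' code or data: the frame parameters, the `WELD_TOLERANCE` threshold, and the `carries_load` check are hypothetical stand-ins, chosen only to show how two frames can be almost identical under a distance metric while one of them is structurally broken.

```python
import math

# Hypothetical parameter vectors, in metres: top tube length, seat tube
# length, down tube length, and the gap between the down tube and its joint.
frame_a = [0.560, 0.520, 0.640, 0.000]
frame_b = [0.560, 0.520, 0.640, 0.002]  # same frame with a 2 mm disconnection

# Assumed threshold: a gap larger than 0.5 mm means the joint carries no load.
WELD_TOLERANCE = 0.0005


def similarity_distance(u, v):
    """Euclidean distance between two frames in parameter space."""
    return math.dist(u, v)


def carries_load(frame):
    """Toy structural check: the frame only works if the joint is closed."""
    return frame[3] <= WELD_TOLERANCE


distance = similarity_distance(frame_a, frame_b)
print(f"parameter-space distance: {distance:.4f} m")
print(f"frame A carries load: {carries_load(frame_a)}")
print(f"frame B carries load: {carries_load(frame_b)}")
```

In this toy setup the two frames differ by only two millimetres in parameter space, so a similarity-based measure treats them as near duplicates, while the structural check separates them completely.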
Beyond “vanilla”

The researchers continued the bicycle example to see what designs a DGM would actually generate after learning from existing designs. They first tested a conventional “vanilla” generative adversarial network, or GAN – a model widely used in image and text synthesis and designed simply to generate statistically similar content. They trained the model on a dataset of thousands of bicycle frames, including commercially manufactured models and less conventional one-off frames designed by hobbyists.
Once the model had learned from the data, the researchers asked it to generate hundreds of new bicycle frames. The model produced realistic designs that resembled existing frames. But none of the designs showed a significant improvement in performance, and some were even a bit inferior, with heavier and less structurally sound frames.
The team then performed the same test with two other DGMs specifically designed for engineering tasks. The first model is one that Ahmed previously developed to generate high-performance airfoil designs. He built this model to prioritize statistical similarity as well as functional performance. When applied to the bicycle frame, this model generated realistic designs that were also lighter and stronger than existing designs. But it also produced physically “invalid” frames, with components that didn’t quite fit or overlapped in ways that were physically impossible.
“We saw designs that were significantly better than the data set, but also designs that were geometrically incompatible because the model was not focused on meeting the design constraints,” says Regenwetter.
The last model the team tested was one that Regenwetter built to generate new geometric structures. This model was designed with the same priorities as the previous models, with the addition of design constraints that prioritize physically viable frames, for example frames with no disconnections or overlapping bars. This last model produced the best-performing designs, which were also physically feasible.
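One way to picture the difference between the three models is as three scoring rules applied to the same candidate designs. The Python sketch below is a simplified illustration under stated assumptions, not the training objectives used in the study: the candidates, their scores, and the weights `w_perf` and `w_con` are all invented, and the code ranks finished designs rather than training a generative model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    similarity: float   # closeness to the training data, 0 to 1 (made up)
    performance: float  # lightness and strength score, 0 to 1 (made up)
    violation: float    # 1.0 if bars overlap or disconnect, else 0.0


# Hypothetical candidate frames with invented scores.
CANDIDATES = [
    Candidate("near_copy", similarity=0.95, performance=0.50, violation=0.0),
    Candidate("light_but_overlapping", similarity=0.80, performance=0.90, violation=1.0),
    Candidate("light_and_valid", similarity=0.78, performance=0.85, violation=0.0),
]


def score(c, w_perf, w_con):
    """Weighted objective: similarity plus performance minus constraint penalty."""
    return c.similarity + w_perf * c.performance - w_con * c.violation


def best(w_perf, w_con):
    """Name of the candidate that a given weighting prefers."""
    return max(CANDIDATES, key=lambda c: score(c, w_perf, w_con)).name


vanilla_pick = best(w_perf=0.0, w_con=0.0)      # similarity only
performance_pick = best(w_perf=1.0, w_con=0.0)  # similarity and performance
constrained_pick = best(w_perf=1.0, w_con=1.0)  # adds the constraint penalty

print("similarity only:", vanilla_pick)
print("plus performance:", performance_pick)
print("plus constraints:", constrained_pick)
```

With these invented numbers, the similarity-only rule prefers the near copy, adding a performance term favors a lighter but physically invalid frame, and adding a constraint penalty shifts the choice to a frame that is both better performing and valid, mirroring the pattern the researchers report.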
“We found that when a model goes beyond statistical similarity, it can come up with designs that are better than those that already exist,” says Ahmed. “This is proof of what AI can do, if explicitly trained on a design task.”
If DGMs can be built with other priorities, such as performance, design constraints, and novelty, Ahmed predicts that “many areas of engineering, such as molecular design and civil infrastructure, would benefit greatly. By highlighting the potential pitfalls of relying on statistical similarity alone, we hope to inspire new avenues and strategies in generative AI applications outside of multimedia.”