Image credits: Toyota Research Institute
(A version of this article first appeared in TechCrunch’s robotics newsletter Actuator. Subscribe here.)
The topic of generative AI comes up frequently in my newsletter, Actuator. I admit that I was a little hesitant to dwell more on the subject a few months ago. Anyone who has been reporting on technology for as long as I have has experienced countless hype cycles and been burned before. Reporting on technology requires a healthy dose of skepticism, hopefully tempered by some enthusiasm about what can be done.
This time around, it seemed like generative AI was waiting in the wings, biding its time until the inevitable crypto crater. As the blood drained from that category, projects like ChatGPT and DALL-E stood ready to become the center of breathless reporting, hope, criticism, doom and gloom and all the Kübler-Rossian stages of the tech hype bubble.
Those who follow my work know that I have never been particularly optimistic about cryptocurrencies. Things are different with generative AI, however. For starters, there is near-universal consensus that artificial intelligence and machine learning will play more central roles in our lives in the future.
Smartphones offer an excellent case study here. Computational photography is a topic I write about quite regularly. There has been great progress in this area in recent years, and I think many manufacturers have finally struck a good balance between hardware and software, both improving the end product and lowering the bar to entry. Google, for example, pulls off some really impressive tricks with editing features like Best Take and Magic Eraser.
Sure, these are cool tricks, but they’re also useful, rather than being features for their own sake. However, going forward, the real trick will be integrating them seamlessly into the experience. With ideal future workflows, most users will have little to no idea of what’s going on behind the scenes. They will just be happy that it works. It’s the classic Apple playbook.
Generative AI delivers a similar “wow” effect right out of the gate, which is another way it differs from its hype cycle predecessor. When your less tech-savvy parent can sit at a computer, type a few words into a dialog box, and then watch the black box spit out paintings and short stories, there’s not much conceptualization required. This is a big part of why it has caught on so quickly: most of the time, when ordinary people are presented with cutting-edge technology, they’re forced to visualize what it might look like five or 10 years down the road.
With ChatGPT, DALL-E and the rest, you can experience it for yourself right now. Of course, the other side of that coin is how hard it becomes to temper expectations. Much as people are inclined to endow robots with human or animal intelligence, without a fundamental understanding of AI it’s easy to project intentionality here. But that’s how things work now. We lead with the catchy headline and hope people stick around long enough to read about the machinations behind it.
Spoiler alert: nine times out of 10 they won’t, and suddenly we spend months and years trying to bring things back to reality.
One of the best things about my job is the ability to discuss these things with people who are much smarter than me. They take the time to explain things and I hope I do a good job translating them for readers (some attempts are more successful than others).
Once it became clear that generative AI had an important role to play in the future of robotics, I found ways to work the question into conversations. I find that most people in the field agree with that premise, and it’s fascinating to hear how big an impact they think it will have.
For example, in my recent conversation with Marc Raibert and Gill Pratt, the latter explained the role that generative AI plays in his approach to robot learning:
We figured out how to do something, which is to use modern generative AI techniques that allow for human demonstration of position and force to essentially teach a robot from just a handful of examples. The code is not changed at all. This is based on what is called diffusion policy. This is work that we did in collaboration with Columbia and MIT. So far we have taught 60 different skills.
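Pratt is describing diffusion policy: a network learns to denoise robot action sequences conditioned on what the robot observes, so a handful of human demonstrations can be enough to teach a new skill. The following is a minimal, hypothetical sketch of that idea in PyTorch; the dimensions, network size and stand-in demonstration data are illustrative assumptions, not TRI's actual system.

```python
# Minimal sketch of the diffusion-policy idea: a network learns to denoise action
# sequences conditioned on an observation, trained from a small set of demonstrations.
# Everything here (dimensions, network, random "demos") is a hypothetical stand-in.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HORIZON, T = 10, 7, 16, 100  # obs size, action size, action chunk length, diffusion steps

# Noise-prediction network: given observation, noisy action chunk, and timestep,
# predict the noise that was injected.
net = nn.Sequential(
    nn.Linear(OBS_DIM + ACT_DIM * HORIZON + 1, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM * HORIZON),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# A "handful of examples": stand-in random demos of (observation, action chunk) pairs.
demo_obs = torch.randn(8, OBS_DIM)
demo_act = torch.randn(8, ACT_DIM * HORIZON)

for step in range(1000):
    t = torch.randint(0, T, (demo_obs.shape[0],))
    noise = torch.randn_like(demo_act)
    ab = alpha_bar[t].unsqueeze(-1)
    noisy_act = ab.sqrt() * demo_act + (1 - ab).sqrt() * noise  # forward diffusion
    inp = torch.cat([demo_obs, noisy_act, t.float().unsqueeze(-1) / T], dim=-1)
    loss = ((net(inp) - noise) ** 2).mean()  # learn to predict the injected noise
    opt.zero_grad(); loss.backward(); opt.step()

# At run time, the policy starts from pure noise and iteratively denoises it into an
# action chunk conditioned on the current observation.
@torch.no_grad()
def sample_actions(obs):
    act = torch.randn(1, ACT_DIM * HORIZON)
    for t in reversed(range(T)):
        inp = torch.cat([obs, act, torch.tensor([[t / T]])], dim=-1)
        eps = net(inp)
        ab, b = alpha_bar[t], betas[t]
        act = (act - b / (1 - ab).sqrt() * eps) / (1 - b).sqrt()
        if t > 0:
            act = act + b.sqrt() * torch.randn_like(act)
    return act.view(HORIZON, ACT_DIM)

plan = sample_actions(torch.randn(1, OBS_DIM))
```

The point Pratt emphasizes is that none of this code changes from skill to skill; only the demonstration data does.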
Last week, when I asked Deepu Talla, Nvidia’s vice president and general manager of embedded and edge computing, why the company believes generative AI is more than a fad, he replied:
I think it speaks to the results. You can already see the productivity improvement. It can compose an email for me. It’s not exactly right, but I don’t need to start from scratch. It gives me 70%. There are some obvious things you can already see that are definitely working better than before. Summarizing something isn’t perfect. I’m not going to let it read and summarize for me. So, you can already see some signs of productivity improvement.
Meanwhile, during my most recent conversation with Daniela Rus, the MIT CSAIL director explained how researchers are using generative AI to design robots:
It turns out that generative AI can be very powerful in solving even motion planning problems. You can get much faster solutions and much smoother, more human-like control solutions than with model predictive solutions. I think this is very powerful, because the robots of the future will be much less robotic. They will be much more fluid and human-like in their movements.
We also used generative AI for the design. It’s very powerful. It’s also very interesting, because it’s not just about generating models for robots. You have to do something else. It can’t just be a matter of generating a model based on data. Machines must make sense in the context of physics and the physical world. For this reason, we connect them to a physics-based simulation engine to ensure that the designs meet the required constraints.
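Rus is describing a generate-then-validate loop: a generative model proposes candidate designs, and a physics-based check rejects any that violate real-world constraints. The toy sketch below illustrates that pattern only; the design parameters, sampler and crude static-torque check are hypothetical stand-ins, not CSAIL's actual pipeline or simulator.

```python
# Toy sketch of the generate-then-validate workflow: propose candidate designs,
# keep only those that pass a physics-based constraint check. All names and
# checks here are hypothetical stand-ins.
import random

def sample_design(rng):
    """Stand-in for a generative model proposing a design (link lengths and masses)."""
    return {
        "link_lengths_m": [rng.uniform(0.1, 0.6) for _ in range(3)],
        "link_masses_kg": [rng.uniform(0.2, 2.0) for _ in range(3)],
        "motor_torque_nm": rng.uniform(1.0, 20.0),
    }

def passes_physics_checks(design):
    """Stand-in for a physics-based simulation/constraint check."""
    g = 9.81
    reach = sum(design["link_lengths_m"])
    weight = sum(design["link_masses_kg"]) * g
    # Crude static check: the motor must hold the arm extended horizontally.
    required_torque = weight * reach / 2
    return design["motor_torque_nm"] >= required_torque and reach <= 1.5

def generate_valid_designs(n, seed=0):
    rng = random.Random(seed)
    valid = []
    while len(valid) < n:
        candidate = sample_design(rng)
        if passes_physics_checks(candidate):
            valid.append(candidate)
    return valid

print(generate_valid_designs(3))
```

In practice the check would be a full physics simulation rather than a single inequality, but the shape of the loop is the same: generate, simulate, keep only what the physics allows.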
This week, a team from Northwestern University unveiled its own research into AI-generated robot design. The researchers showed how their system designed a successfully walking robot in just a few seconds. There’s not much to look at, as it stands, but it’s easy enough to see how, with further research, the approach could be used to create more complex systems.
“We discovered a very fast, AI-driven design algorithm that bypasses evolutionary bottlenecks, without resorting to the biases of human designers,” said Sam Kriegman, who led the research. “We told the AI that we wanted a robot that could walk across land. Then we simply pressed a button and presto! In the blink of an eye, it generated a blueprint for a robot that looks nothing like any animal that has ever walked the earth. I call this process ‘instant evolution.’”
It was the AI program’s choice to put legs on the little squishy robot. “It’s interesting because we didn’t tell the AI that a robot should have legs,” Kriegman added. “It rediscovered that legs are a good way to move around on land. Legged locomotion is, in fact, the most efficient form of terrestrial movement.”
“From my perspective, generative AI and physical/robotic automation are what are going to change everything we know about life on Earth,” Formant founder and CEO Jeff Linnell told me this week. “I think we are all aware that AI exists and we expect that every one of our jobs, every business and every student will be impacted. I think it is in symbiosis with robotics. You will not need to program a robot. You will speak to the robot in English, request an action and it will then be understood. It’ll take a minute for this.”
Prior to Formant, Linnell founded and served as CEO of Bot & Dolly. The San Francisco-based company, best known for its work on the film Gravity, was acquired by Google in 2013 as the software giant set its sights on jump-starting the industry (best laid plans, etc.). The executive tells me the main lesson from that experience is that it’s all about the software (given the absorption of Intrinsic and Everyday Robots into DeepMind, I’d say Google agrees).