Research into AI models that can generalize, scale, and accelerate science
Next week marks the start of the 11th International Conference on Learning Representations (ICLR), taking place from May 1 to 5 in Kigali, Rwanda. This will be the first major artificial intelligence (AI) conference hosted in Africa and the first held in person since the start of the pandemic.
Researchers from around the world will come together to share their cutting-edge work in deep learning, spanning the fields of AI, statistics and data science, as well as applications such as computer vision, gaming and robotics. We are proud to support the conference as a Diamond Sponsor and DEI Champion.
DeepMind teams are presenting 23 papers this year. Here are some highlights:
Open questions on the path to AGI
Recent advances have shown impressive AI performance on text and images, but more research is needed before systems can generalize across domains and scales. This will be a crucial step on the path to developing artificial general intelligence (AGI) as a transformative tool in our daily lives.
We present a new approach in which models learn by solving two problems in one. By training models to look at a problem from two perspectives at the same time, they learn to reason about tasks that require solving similar problems, which benefits generalization. We also explored neural networks' ability to generalize by comparing them against the Chomsky hierarchy of languages. By rigorously testing 2,200 models on 16 different tasks, we found that certain models struggle to generalize, and that augmenting them with external memory is crucial to improving performance.
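To make this kind of evaluation concrete, here is a minimal, illustrative Python sketch of the train-short/test-long protocol used in length-generalization studies, applied to a toy balanced-brackets task (a context-free language). The task, lengths and dataset sizes are our own illustrative choices, not the paper's setup.

```python
import random

def make_example(depth):
    """Generate a balanced-bracket string (a context-free language)."""
    if depth == 0:
        return ""
    return "".join("(" + make_example(depth - 1) + ")"
                   for _ in range(random.randint(1, 3)))

def is_balanced(s):
    """Ground-truth label: stack-style check for balanced brackets."""
    open_count = 0
    for ch in s:
        open_count += 1 if ch == "(" else -1
        if open_count < 0:
            return False
    return open_count == 0

def corrupt(s):
    """Flip one bracket, which always unbalances the string."""
    i = random.randrange(len(s))
    flipped = ")" if s[i] == "(" else "("
    return s[:i] + flipped + s[i + 1:]

# Evaluation protocol: train on short strings, test on strictly longer
# ones. A model without stack-like (external) memory tends to fit the
# training lengths but fail on the longer test strings.
train_inputs = [make_example(3) for _ in range(1000)]  # short strings
test_inputs = [make_example(8) for _ in range(200)]    # longer strings
test_data = [(s, is_balanced(s)) for s in test_inputs]
test_data += [(c, is_balanced(c)) for c in map(corrupt, test_inputs)]
```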
Another challenge is making progress on longer-term tasks at an expert level, where rewards are rare. We developed a new approach and an open-source training dataset to help models learn to explore in human-like ways over long time horizons.
Innovative approaches
As we develop more advanced AI capabilities, we must ensure that current methods work as intended and effectively in the real world. For example, although language models can produce impressive answers, many cannot explain their responses. We introduce a method for using language models to solve multi-step reasoning problems by exploiting their underlying logical structure, providing explanations that humans can understand and verify. Adversarial attacks, on the other hand, probe the limits of AI models by pushing them to produce erroneous or harmful outputs. Training on adversarial examples makes models more robust to attacks, but it can come at the cost of performance on "regular" inputs. We show that by adding adapters we can create models that let us control this trade-off on the fly.
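As a rough illustration of the adapter idea (a simplified stand-in, not the paper's architecture), the NumPy sketch below shows how a single inference-time scalar could blend a frozen, clean-trained model with a small residual module trained on adversarial examples; all names, shapes and the linear models are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_model(x, w):
    """Frozen, clean-trained linear map (stand-in for a large network)."""
    return x @ w

def adapter(x, w_adapter):
    """Small residual module, fit on adversarial examples."""
    return x @ w_adapter

def predict(x, w, w_adapter, alpha):
    """alpha in [0, 1] dials between clean accuracy (alpha=0) and
    adversarial robustness (alpha=1) at inference time, with no
    retraining of either set of weights."""
    return base_model(x, w) + alpha * adapter(x, w_adapter)

# Illustrative weights: in practice w is pretrained and frozen, and
# w_adapter is trained on adversarial inputs while w stays fixed.
x = rng.normal(size=(4, 16))
w = rng.normal(size=(16, 10))
w_adapter = 0.1 * rng.normal(size=(16, 10))

clean_logits = predict(x, w, w_adapter, alpha=0.0)   # favor clean inputs
robust_logits = predict(x, w, w_adapter, alpha=1.0)  # favor robustness
```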
Reinforcement learning (RL) has proven effective for a range of real-world challenges, but RL algorithms are typically designed to do one task well and struggle to generalize to new ones. We propose algorithm distillation, a method that enables a single model to generalize efficiently to new tasks by training a transformer to imitate the learning histories of RL algorithms across diverse tasks. RL models also learn by trial and error, which can be data-intensive and time-consuming. It took nearly 80 billion frames of data for our agent Agent57 to reach human-level performance across 57 Atari games. We share a new way to train models to this level using 200 times less experience, significantly reducing compute and energy costs.
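Returning to algorithm distillation, here is a compressed PyTorch sketch of what the training step can look like: a causal transformer is fit with ordinary supervised learning to predict the next action in learning histories collected from a source RL algorithm. The model, the flat (observation, action, reward) encoding and the placeholder data are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Causal sequence model mapping an RL learning history
    (flattened obs/action/reward triples) to next-action logits."""
    def __init__(self, obs_dim, n_actions, d_model=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, history):  # history: (batch, time, obs+act+rew dims)
        t = history.shape[1]
        causal_mask = nn.Transformer.generate_square_subsequent_mask(t)
        h = self.trunk(self.embed(history), mask=causal_mask)
        return self.head(h)

# Supervised learning on whole *learning histories*, gathered while a
# source RL algorithm improved across many tasks. Imitating the history
# (not just the final policy) is what lets the transformer keep
# improving in-context on a new task.
obs_dim, n_actions = 8, 4
model = HistoryPolicy(obs_dim, n_actions)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

histories = torch.randn(16, 128, obs_dim + n_actions + 1)  # placeholder
target_actions = torch.randint(0, n_actions, (16, 128))    # placeholder

logits = model(histories)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, n_actions), target_actions.reshape(-1))
loss.backward()
opt.step()
```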
AI for science
AI is a powerful tool for researchers to analyze vast amounts of complex data and understand the world around us. Several papers show how AI is accelerating scientific progress, and how science is driving AI forward in turn.
Predicting a molecule's properties from its 3D structure is essential for drug discovery. We present a denoising method that achieves a new state of the art in molecular property prediction, enables large-scale pretraining, and generalizes to different biological datasets. We also introduce a new transformer that can perform more accurate quantum chemistry calculations using only data on atomic positions.
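A minimal sketch of the denoising idea: perturb equilibrium 3D atom positions with Gaussian noise and train a network to predict the added noise, which acts as a structural pretraining signal. The tiny MLP stands in for a real graph or transformer encoder, and every name here is illustrative rather than the paper's code.

```python
import torch
import torch.nn as nn

def denoising_pretrain_step(encoder, coords, sigma=0.1):
    """One pretraining step: add Gaussian noise to 3D atomic
    coordinates and regress the noise that was added. Denoising
    equilibrium structures is a proxy for learning the geometry
    (and, implicitly, the force field) of the molecule."""
    noise = sigma * torch.randn_like(coords)   # (n_atoms, 3)
    predicted_noise = encoder(coords + noise)  # (n_atoms, 3)
    return ((predicted_noise - noise) ** 2).mean()

# Placeholder encoder; in practice a GNN/transformer over atoms.
encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
coords = torch.randn(20, 3)  # toy 20-atom molecule
loss = denoising_pretrain_step(encoder, coords)
loss.backward()
```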
Finally, with FIGnet, we draw inspiration from physics to model collisions between complex shapes, such as a teapot or a donut. This simulator could have applications in robotics, graphics, and mechanical design.
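To give a flavor of one ingredient such face-level simulators need (a simplified stand-in, not FIGnet itself), the sketch below finds pairs of nearby triangle faces on two meshes; a learned simulator can then add message-passing edges between exactly these pairs. The mesh format and distance threshold are assumptions for illustration.

```python
import numpy as np

def face_centroids(vertices, faces):
    """Centroid of each triangle; vertices (V, 3), faces (F, 3) ints."""
    return vertices[faces].mean(axis=1)

def collision_pairs(verts_a, faces_a, verts_b, faces_b, radius=0.1):
    """Indices of face pairs from two meshes closer than `radius`.
    Interactions between such pairs, rather than whole objects,
    are what face-level collision models reason about."""
    ca = face_centroids(verts_a, faces_a)  # (Fa, 3)
    cb = face_centroids(verts_b, faces_b)  # (Fb, 3)
    dists = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=-1)
    ia, ib = np.nonzero(dists < radius)
    return list(zip(ia.tolist(), ib.tolist()))

# Toy usage: a tetrahedron against a slightly shifted copy of itself.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
pairs = collision_pairs(verts, faces, verts + 0.02, faces)
```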
See the full list of DeepMind papers and the schedule of events at ICLR 2023.