Towards more multimodal, robust and general AI systems
Next week marks the start of the 37th annual Neural Information Processing Systems (NeurIPS) conference, the largest artificial intelligence (AI) conference in the world. NeurIPS 2023 will take place from December 10 to 16 in New Orleans, United States.
Teams from across Google DeepMind are presenting more than 180 papers at the main conference and workshops.
We will present demos of our cutting-edge AI models for global weather forecasting, materials discovery, and watermarking AI-generated content. There will also be an opportunity to hear from the team behind Gemini, our largest and most capable AI model.
Here’s a look at some of our research highlights:
Multimodality: language, video, action
Generative AI models can create paintings, compose music, and write stories. But however good these models are in one medium, most struggle to transfer those skills to another. We explore how generative abilities can aid learning across modalities. In a spotlight presentation, we show that diffusion models can be used to classify images with no additional training required. Diffusion models like Imagen classify images in a more human-like way than other models, relying on shapes rather than textures. We also show how predicting image captions can improve computer vision learning: our approach outperformed current methods on vision-and-language tasks and showed greater potential for scalability.
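A zero-shot diffusion classifier of this kind can be sketched generically: score each candidate label by how well a class-conditional denoiser predicts the noise added to the image, and pick the label with the lowest error. This is a minimal illustration under our own assumptions, not the paper's implementation; `denoise_error` is a hypothetical stand-in for a real conditional diffusion model.

```python
import random

def diffusion_classifier(x, class_labels, denoise_error, n_samples=8, seed=0):
    """Pick the label whose conditional denoiser best predicts the noise.

    denoise_error(x, label, t, noise) is a hypothetical callable returning
    the squared error between the model's predicted noise and the true
    noise for a noised version of x at diffusion timestep t.
    """
    rng = random.Random(seed)
    scores = {}
    for label in class_labels:
        total = 0.0
        for _ in range(n_samples):
            t = rng.random()              # random timestep in (0, 1)
            noise = rng.gauss(0.0, 1.0)   # toy scalar noise sample
            total += denoise_error(x, label, t, noise)
        scores[label] = total / n_samples
    return min(scores, key=scores.get)    # lowest error = most likely class
```

In practice the denoising error would be averaged over many noise draws per timestep; the lowest average error corresponds to the class the diffusion model finds most plausible.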
More multimodal models could pave the way for more useful digital and robotic assistants that help people in their daily lives. In a spotlight poster, we create agents capable of interacting with the digital world the way humans do: via screenshots and keyboard and mouse actions. Separately, we show that by leveraging video generation, including subtitles and closed captions, models can transfer knowledge by predicting video plans for real robot actions.
One of the next steps could be generating realistic experiences in response to actions carried out by humans, robots, and other types of interactive agents. We will present a demo of UniSim, our universal simulator for real-world interactions. This type of technology could have applications in many industries, from video games and film to training agents for the real world.
Building safe and understandable AI
Large language models can generate impressive responses, but they are prone to “hallucinations”: text that appears correct but is made up. Our researchers ask whether a method that can find where a fact is stored in a model (localization) can also be used to edit that fact. Surprisingly, they find that localizing a fact and editing the model at that location does not reliably change the fact, hinting at the complexity of understanding and controlling the information stored in LLMs. With Tracr, we propose a new way to evaluate interpretability methods by compiling human-readable programs into transformer models. We have open-sourced Tracr so it can serve as ground truth for evaluating interpretability methods.
When developing and deploying large models, privacy must be built into every step of the process. For training, our teams study how to measure whether language models memorize data, in order to protect private and sensitive material. Our researchers also demonstrate how to audit the privacy of training with techniques efficient enough for real-world use. And in an oral presentation, our scientists study the limits of training with “student” and “teacher” models that have different levels of access and different vulnerabilities to attack.
As large models become increasingly capable, our research explores the emerging abilities needed to develop more general AI systems.
Although language models are used for general tasks, they can lack the exploration and lookahead needed to solve more complex problems. We introduce Tree of Thoughts, a new framework for language-model inference that helps models explore and reason over a wide range of possible solutions. By organizing reasoning and planning as a tree, rather than the commonly used flat chain of thought, we show that a language model can solve complex tasks like the Game of 24 with much greater accuracy.
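The tree-structured search can be illustrated with a small beam search over partial “thoughts”. This is a schematic sketch, not the paper's implementation: `expand` is a hypothetical stand-in for a language model proposing next reasoning steps, and `score` for its self-evaluation of a partial solution.

```python
def tree_of_thoughts(root, expand, score, beam_width=3, depth=3):
    """Breadth-first search over partial 'thoughts'.

    expand(state) -> list of candidate next states (e.g. LM proposals)
    score(state)  -> heuristic value (e.g. an LM's self-evaluation)
    At each depth, only the best beam_width states are kept, so the
    model explores several branches instead of one flat chain.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)
```

With a chain of thought, beam_width would effectively be 1; widening the beam is what lets the model back out of unpromising partial solutions.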
To help people solve problems and find what they’re looking for, AI models must process billions of unique values efficiently. With feature multiplexing, a single representation space is shared across many different features, enabling large embedding models to scale to products serving billions of users.
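One simple way to multiplex many features into a single representation space is to hash (feature, value) pairs into one shared embedding table, so every feature draws from the same table rather than each getting its own. The sketch below is only a generic illustration of that idea under our own assumptions, not the method from the paper; `shared_slot` is a hypothetical helper.

```python
import hashlib

def shared_slot(feature_name, value, table_size):
    """Map a (feature, value) pair from any feature into one shared
    embedding table by hashing its string key, so all features
    multiplex the same representation space."""
    key = f"{feature_name}:{value}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % table_size
```

Because the hash is deterministic, the same pair always lands in the same slot, and the table's memory footprint is fixed no matter how many distinct features or values the product serves.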
Finally, with DoReMi, we show how using AI to automatically optimize the mixture of training data types can significantly speed up language-model training and improve performance on new and unseen tasks.
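The core idea of this kind of mixture optimization can be caricatured as a multiplicative-weights step: domains where a small proxy model's loss most exceeds a reference model's loss get upweighted in the training mixture. The snippet below is a simplified sketch under our own assumptions (a single step with a fixed step size), not DeepMind's implementation.

```python
import math

def update_domain_weights(weights, excess_loss, step_size=1.0):
    """One multiplicative-weights step on the data mixture.

    weights:     dict mapping domain -> current mixture weight (sums to 1)
    excess_loss: dict mapping domain -> proxy-model loss minus
                 reference-model loss (higher = domain is 'harder')
    Domains with larger excess loss are upweighted, then the mixture
    is renormalized so the weights again sum to 1.
    """
    scaled = {d: w * math.exp(step_size * excess_loss[d])
              for d, w in weights.items()}
    total = sum(scaled.values())
    return {d: w / total for d, w in scaled.items()}
```

Iterating updates like this shifts the mixture toward domains the proxy model has not yet learned well, which is the intuition behind training the final model on the reweighted data.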
Fostering a global AI community
We are proud to sponsor NeurIPS and support workshops led by LatinX in AI, Queer in AI, and Women in ML, helping to foster research collaborations and grow a diverse AI and machine learning community. This year, NeurIPS will offer a creative track featuring our Visualizing AI project, which asks artists to create more diverse and accessible representations of AI.
If you’re attending NeurIPS, come to our booth to learn more about our cutting-edge research and meet our teams who are hosting workshops and giving presentations throughout the conference.