OpenAI, the developer of ChatGPT, came under fire this week from former employees who accuse the company of taking unnecessary risks with technology that could become dangerous.
Today, OpenAI released a new research paper apparently aimed at showing it is serious about tackling AI risk by making its models more explainable. In the paper, the company's researchers lay out a way to peer inside the AI model that powers ChatGPT, devising a method to identify how the model stores certain concepts, including those that could cause an AI system to misbehave.
While the research makes OpenAI's work on controlling AI more visible, it also highlights recent turmoil within the company. The new research was carried out by OpenAI's recently disbanded "superalignment" team, which was dedicated to studying the technology's long-term risks.
The former group's co-leaders, Ilya Sutskever and Jan Leike, both of whom have since left OpenAI, are named as co-authors. Sutskever, a co-founder of OpenAI and its former chief scientist, was among the board members who voted to fire CEO Sam Altman last November, setting off a chaotic few days that culminated in Altman's return to the helm of the company.
ChatGPT is powered by a family of large language models called GPT, based on a machine learning approach known as artificial neural networks. These mathematical networks have shown a remarkable ability to learn useful tasks by analyzing example data, but their inner workings cannot be easily scrutinized the way conventional computer programs can. The complex interplay between the layers of "neurons" within an artificial neural network makes it extremely difficult to reverse engineer why a system like ChatGPT produced a particular answer.
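To make that opacity concrete, consider the minimal sketch below (an illustrative toy, not OpenAI's code): even in a tiny network, the intermediate "neuron" activations are just unlabeled arrays of numbers, with nothing connecting any individual value to a human-readable concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network with random weights, standing in for one
# slice of a large language model. Real models like GPT-4 have
# billions of such weights arranged across many layers.
W1 = rng.normal(size=(8, 4))   # input -> hidden
W2 = rng.normal(size=(4, 2))   # hidden -> output

def forward(x):
    hidden = np.maximum(0, x @ W1)   # ReLU "neurons"
    return hidden, hidden @ W2

x = rng.normal(size=8)               # some arbitrary input
hidden, output = forward(x)

# The hidden activations are just numbers; nothing here indicates
# which (if any) human concept each one encodes.
print(hidden)
```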
“Unlike most human creations, we do not truly understand the inner workings of neural networks,” the researchers behind the work wrote in an accompanying blog post. Some prominent AI researchers believe that the most powerful AI models, including ChatGPT, could perhaps be used to design chemical or biological weapons or to coordinate cyberattacks. A longer-term concern is that AI models might choose to hide information or act in harmful ways in order to achieve their goals.
OpenAI's new paper describes a technique that takes some of the mystery out of the process, identifying patterns that represent specific concepts inside a machine learning system with the help of an additional machine learning model. The key innovation lies in refining the network used to peer inside the system of interest so that it identifies concepts more efficiently.
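One common way to build such an auxiliary model is a sparse autoencoder: a second, smaller network trained to reconstruct the larger model's internal activations while keeping only a handful of features active at a time, which encourages individual features to line up with recognizable concepts. The sketch below is a minimal illustration of that general idea, with made-up names and dimensions; it is not OpenAI's implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over a model's hidden activations.

    It learns an overcomplete dictionary of n_features directions; a
    top-k constraint keeps only a few active per input, nudging each
    feature toward capturing a single recognizable concept.
    """
    def __init__(self, d_model=512, n_features=4096, k=32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations):
        latents = torch.relu(self.encoder(activations))
        # Top-k sparsity: zero out all but the k strongest features.
        topk = torch.topk(latents, self.k, dim=-1)
        sparse = torch.zeros_like(latents).scatter(
            -1, topk.indices, topk.values)
        return self.decoder(sparse), sparse

# Training sketch: minimize reconstruction error on activations
# captured from the model being studied (random stand-ins here).
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
for _ in range(100):
    batch = torch.randn(64, 512)       # would be real activations
    recon, _ = sae(batch)
    loss = ((recon - batch) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because only a few features fire for any given input, a researcher can inspect each one by collecting the inputs that activate it most strongly and checking whether they share a theme.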
OpenAI demonstrated its approach by identifying patterns that represent concepts inside GPT-4, one of its largest AI models. The company released code related to the interpretability work, along with a visualization tool that can be used to see how words in different sentences activate concepts, including crude language and erotic content, in GPT-4 and another model. Knowing how a model represents certain concepts could be a step toward being able to dial down those associated with undesirable behavior, in order to keep an AI system on track. It could also make it possible to tune an AI system to favor certain topics or ideas.
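As a rough illustration of what such a tool does (the helper names below are invented, and this is not OpenAI's released code), one can run a sentence through a stand-in for the model, encode the resulting activations with a trained autoencoder like the one sketched above, and list the features that fire most strongly:

```python
import torch

# Stand-in for capturing per-token hidden states from the model under
# study; a real tool would hook the actual LLM. Dimensions are made up.
def fake_activations(text, d_model=512):
    return torch.randn(len(text.split()), d_model)

def top_features(text, sae, n=5):
    acts = fake_activations(text)
    _, sparse = sae(acts)              # encode with the autoencoder
    strength = sparse.mean(dim=0)      # average over the sentence
    top = torch.topk(strength, n)
    return list(zip(top.indices.tolist(), top.values.tolist()))

# With a trained autoencoder, the returned feature indices could be
# matched to human-assigned labels ("legal language", "crude content").
sae = SparseAutoencoder()              # from the sketch above
print(top_features("The quick brown fox jumps", sae))
```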