As autonomous systems and artificial intelligence become more common in everyday life, new methods are emerging to help humans verify that these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural language expressions. Some researchers say this method can be used to explain the decisions an AI will make in a way that humans can interpret.
Researchers at MIT Lincoln Laboratory wanted to verify these interpretability claims. Their results indicate the opposite: formal specifications do not seem interpretable by humans. In the team’s study, participants were asked to test whether an AI agent’s plan would succeed in a virtual game. When presented with the formal specifications of the plan, participants were correct less than half the time.
“These results are bad news for researchers who have been claiming that formal methods confer interpretability on systems. That might be true in some restricted, abstract sense, but not for anything approaching practical system validation,” says Hosea Siu, a researcher in the laboratory’s AI Technology Group. The group’s paper was accepted to the 2023 International Conference on Intelligent Robots and Systems, held earlier this month.
Interpretability is important because it allows humans to place trust in a machine when it is used in the real world. If a robot or AI can explain its actions, then humans can decide whether its decisions need adjustment or whether it can be trusted to make fair decisions. An interpretable system also allows users of the technology – not just developers – to understand and trust its capabilities. However, interpretability has long been a challenge in the field of AI and autonomy. The machine learning process takes place in a “black box,” so model developers often cannot explain why or how a system made a certain decision.
“When researchers say ‘our machine learning system is accurate,’ we ask ‘how accurate?’ and ‘using what data?’, and if that information is not provided, we reject the claim. We haven’t been doing that much when researchers say ‘our machine learning system is interpretable,’ and we need to start subjecting those claims to greater scrutiny,” says Siu.
Lost in translation
For their experiment, the researchers sought to determine whether formal specifications made a system’s behavior more interpretable. They focused on the ability of users to use such specifications to validate a system, that is, to understand whether the system always met the user’s objectives.
The application of formal specifications for this purpose is essentially a by-product of their original use. Formal specifications are part of a broader set of formal methods that use logical expressions as a mathematical framework to describe the behavior of a model. Because the model is built on a logical flow, engineers can use “model checkers” to mathematically prove facts about the system, including when it is or is not possible for the system to accomplish a task. Now, researchers are attempting to use this same framework as a translation tool for humans.
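As a rough illustration (this example is not taken from the study), a formal specification might be written in a temporal logic as

□ ¬in_forbidden_zone

which reads as “at every point in time, the robot is not in the forbidden zone.” Given a model of the robot’s behavior, a model checker can exhaustively explore the model’s reachable states and either prove that this property always holds or return a counterexample: a specific sequence of steps in which it fails.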
“Researchers conflate the fact that formal specifications have precise semantics with them being interpretable by humans. These are not the same thing,” explains Siu. “We realized that almost no one was checking whether people actually understood the outputs.”
In the team’s experiment, participants were asked to validate a fairly simple set of behaviors for a robot playing a game of capture the flag, essentially answering the question, “If the robot follows these rules exactly, does it always win?”
Participants included both experts and non-experts in formal methods. They received the formal specifications in three ways: a “raw” logical formula, the formula translated into words closer to natural language, and a decision tree format. Decision trees in particular are often seen in the AI world as a human-interpretable way to show AI or robot decision-making.
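To make those formats concrete, consider a hypothetical rule (not one of the study’s actual stimuli) rendered the three ways: as a raw formula, □(at_enemy_flag → ◊ at_home_base); as a near-natural-language translation, “always, if the robot reaches the enemy flag, then it must eventually return to its home base”; and as a decision tree whose branches ask, at each step, whether the robot is at the enemy flag and, if so, send it toward home base, and otherwise tell it to keep searching. From a handful of such rules, participants had to judge whether following them exactly guarantees a win.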
The results: “Validation performance overall was quite poor, with around 45% accuracy regardless of presentation type,” says Siu.
Confidently wrong
Those previously trained in formal specifications performed only slightly better than novices. However, the experts reported far more confidence in their answers, whether those answers were correct or not. Overall, people tended to over-trust the correctness of the specifications put in front of them, meaning they overlooked rule sets that allowed the game to be lost. According to the researchers, this kind of confirmation bias is particularly concerning for system validation, because people are more likely to overlook such failure modes.
“We don’t think this result means we should abandon formal specifications as a way to explain system behaviors to people. But we think there’s still a lot of work to be done in designing how they are presented to people and in the workflow in which people use them,” adds Siu.
Reflecting on why the results were so poor, Siu acknowledges that even people who work on formal methods are not quite trained to check specifications in the way the experiment asked of them. And it’s hard to think through all the possible outcomes of a set of rules. Even so, the rule sets shown to participants were short, equivalent to a paragraph of text, “much shorter than anything you would encounter in any real system,” Siu says.
The team does not attempt to tie its results directly to the performance of humans validating robots in the real world. Instead, the researchers aim to use the results as a starting point for examining what the formal logic community may be missing when it claims interpretability, and how such claims hold up in real-world use.
This research was conducted as part of a larger project Siu and his teammates are working on to improve the relationship between robots and human operators, particularly those in the military. The robotics programming process can often leave operators behind. With the same goal of improving interpretability and confidence, the project attempts to enable operators to directly teach tasks to robots, in a manner similar to training humans. Such a process could improve both the operator’s trust in the robot and the robot’s adaptability.
Ultimately, they hope that the results of this study and their ongoing research can improve the application of autonomy, as it becomes more integrated into human life and decision-making.
“Our results highlight the need for human evaluations of certain autonomy and AI systems and concepts before too many claims are made about their usefulness to humans,” adds Siu.