Agents cooperate better when they can communicate and negotiate, and sanctioning broken promises helps keep them honest.
Successful communication and cooperation have been crucial in helping societies advance throughout history. The closed environments of board games can serve as a sandbox for modeling and studying interaction and communication – and we can learn a lot by playing them. In our recent paper, published today in Nature Communications, we show how artificial agents can use communication to cooperate better in the board game Diplomacy, a dynamic domain of artificial intelligence (AI) research known for its focus on alliance building.
Diplomacy is challenging because it has simple rules but high emergent complexity, due to the strong interdependencies between players and its immense action space. To help tackle this challenge, we designed negotiation algorithms that allow agents to communicate and agree on joint plans, enabling them to defeat agents lacking this ability.
Cooperation is especially difficult when we cannot count on our peers to keep their promises. We use Diplomacy as a sandbox to explore what happens when agents can deviate from their past agreements. Our research illustrates the risks that arise when complex agents are able to misrepresent their intentions or mislead others about their future plans, which raises another big question: what are the conditions that promote trustworthy communication and teamwork?
We show that the strategy of punishing peers for breaking their contracts significantly reduces the benefits they can gain from abandoning their commitments, thereby promoting more honest communication.
What is Diplomacy and why is it important?
Games like chess, poker, Go, and many video games have always been fertile ground for AI research. Diplomacy is a seven-player negotiation and alliance-building game, played on an old map of Europe partitioned into provinces, where each player controls several units (the rules of Diplomacy). In the standard version of the game, called Press Diplomacy, each turn includes a negotiation phase, after which all players simultaneously reveal their chosen moves.
The heart of Diplomacy is the negotiation phase, where players try to agree on their next moves. For example, one unit may support another unit, allowing it to overcome the resistance of other units.
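To make the support mechanic concrete, here is a minimal Python sketch of the strength arithmetic behind it – our own simplification for illustration, not the adjudicator used in the paper:

```python
# A minimal sketch (ours, not the paper's game engine) of how a supported
# attack resolves in Diplomacy: each support adds one point of strength,
# and an attack succeeds only if strictly stronger than the defense.

def strength(num_supports: int) -> int:
    """A unit acts with strength 1, plus 1 per supporting unit."""
    return 1 + num_supports

def resolve_attack(attacker_supports: int, defender_supports: int) -> str:
    if strength(attacker_supports) > strength(defender_supports):
        return "attacker dislodges defender"
    return "attack bounces"

# Unsupported attacks bounce (1 vs 1), but a single support tips the balance (2 vs 1):
print(resolve_attack(attacker_supports=0, defender_supports=0))  # attack bounces
print(resolve_attack(attacker_supports=1, defender_supports=0))  # attacker dislodges defender
```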
Computational approaches to Diplomacy have been studied since the 1980s, many of them on a simpler version of the game called No-Press Diplomacy, where strategic communication between players is not allowed. Researchers have also proposed computer-friendly negotiation protocols, sometimes called “Restricted-Press”.
What have we studied?
We use Diplomacy as an analogue of real-world negotiation, providing AI agents with methods to coordinate their moves. We take our non-communicating Diplomacy agents and augment them to play Diplomacy with communication by giving them a protocol for negotiating contracts for a joint plan of action. We call these augmented agents Baseline Negotiators, and they are bound by their agreements.
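As an illustration of what such a contract could look like in code, the sketch below models a contract as a restriction on each signatory's permitted orders. The class, method names, and order strings are our own invention, not the paper's implementation:

```python
# A sketch (ours) of the core idea of a contract: an agreed restriction of
# each signatory's actions for the next turn. Order strings use standard
# Diplomacy notation for flavor only.

class Contract:
    """Maps each signatory to the set of orders they may still submit."""

    def __init__(self, allowed: dict[str, set[str]]):
        self.allowed = allowed

    def permits(self, player: str, order: str) -> bool:
        # Players outside the contract remain unrestricted.
        if player not in self.allowed:
            return True
        return order in self.allowed[player]

# England and France agree that England stays out of the English Channel.
deal = Contract({
    "England": {"F LON - NTH", "F LON H"},
    "France":  {"F BRE - ENG"},
})
assert deal.permits("England", "F LON - NTH")
assert not deal.permits("England", "F LON - ENG")   # ruled out by the deal
assert deal.permits("Russia", "A MOS - STP")        # not a signatory
print("contract restricts England's orders as agreed")
```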
We consider two protocols: the Mutual Proposal protocol and the Propose-Choose protocol, discussed in detail in the full paper. Our agents employ algorithms that identify mutually beneficial deals by simulating how the game might unfold under various contracts. We use the Nash Bargaining Solution from game theory as a principled foundation for identifying high-quality agreements. The game may unfold in many ways depending on the actions of players, so our agents use Monte-Carlo simulations to see what might happen in the next turn.
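The sketch below shows one way the Nash Bargaining Solution can rank candidate deals: each deal is scored by the product of the players' gains over their no-deal baseline values, which the actual agents estimate with Monte-Carlo rollouts. The contract names and values here are invented purely for illustration:

```python
# A minimal sketch (ours, not the paper's implementation) of deal selection
# via the Nash Bargaining Solution: among candidate contracts, pick the one
# maximizing the product of each player's gain over their no-deal baseline.
# In the real agents these values come from Monte-Carlo rollouts of the next
# turn; here they are hard-coded, invented numbers.

BASELINE = (0.20, 0.25)  # (player A, player B) expected values with no deal
CANDIDATES = {
    "demilitarize_border": (0.30, 0.40),
    "joint_attack_north":  (0.45, 0.28),
    "one_sided_favor":     (0.15, 0.60),  # worse than baseline for player A
}

def nash_product(values, baseline):
    gains = [v - b for v, b in zip(values, baseline)]
    if any(g <= 0 for g in gains):        # a rational player rejects this deal
        return float("-inf")
    product = 1.0
    for g in gains:
        product *= g
    return product

best = max(CANDIDATES, key=lambda c: nash_product(CANDIDATES[c], BASELINE))
print(best)  # -> "demilitarize_border": the largest product of mutual gains
```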
Our experiments show that our negotiation mechanism enables Baseline Negotiators to significantly outperform baseline non-communicating agents.
Agents breaking agreements
In Diplomacy, the agreements reached during negotiation are not binding (communication is “cheap talk”). But what happens when agents who accept a contract at one point deviate from it in the next turn? In many real-world settings, people agree to act in a certain way but then fail to follow through on their commitments. To enable cooperation between AI agents, or between agents and humans, we need to examine the potential pitfall of agents strategically breaking their agreements, as well as ways to remedy this problem. We used Diplomacy to study how the ability to abandon commitments erodes trust and cooperation, and to identify the conditions that foster honest cooperation.
We therefore consider Deviator Agents, which outperform honest Baseline Negotiators by deviating from agreed contracts. Simple Deviators simply “forget” they agreed to a contract and move however they wish. Conditional Deviators are more sophisticated, optimizing their actions on the assumption that the other players who accepted the contract will act in accordance with it.
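The difference between the two can be captured in a toy decision rule – ours, with invented payoffs; the actual agents plan with search and rollouts:

```python
# Toy payoffs for one signatory: PAYOFF[(my_move, co_signer_move)] -> my value.
# "honor" is the contracted move; "raid" breaks the deal. Numbers are invented.
HONOR, RAID = "honor", "raid"
PAYOFF = {
    (HONOR, HONOR): 2, (HONOR, RAID): 0,
    (RAID,  HONOR): 3, (RAID,  RAID): 1,
}

def simple_deviator_move() -> str:
    # "Forgets" the contract: plays its ordinary no-agreement policy,
    # here a maximin rule over the co-signer's possible moves.
    return max((HONOR, RAID),
               key=lambda m: min(PAYOFF[(m, o)] for o in (HONOR, RAID)))

def conditional_deviator_move() -> str:
    # Assumes co-signers will honor the contract, then best-responds
    # to exactly that assumption.
    return max((HONOR, RAID), key=lambda m: PAYOFF[(m, HONOR)])

print(simple_deviator_move())       # raid: chosen without reference to the deal
print(conditional_deviator_move())  # raid: exploiting compliance pays 3 > 2
```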
We show that both Simple and Conditional Deviators significantly outperform Baseline Negotiators, with Conditional Deviators gaining the larger advantage.
Encouraging agents to be honest
Next, we address the deviation problem using Defensive Agents, which respond adversely to deviations. We investigate Binary Negotiators, who simply cut off communication with agents who break an agreement with them. But shunning is a mild reaction, so we also develop Sanctioning Agents, who do not take betrayal lightly but instead modify their goals to actively try to lower the deviator's value – an opponent with a grudge! We show that both types of Defensive Agents reduce the advantage of deviation, Sanctioning Agents especially so.
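One way to picture the sanctioning objective is as a switch from maximizing the agent's own value to a mixed objective that also drives the deviator's value down. This is a sketch under our own assumptions; the weighting parameter and plan values below are not quantities from the paper:

```python
# A hedged sketch of the sanctioning idea: after a betrayal, the agent's
# objective also penalizes the deviator's value. `alpha` is our own
# illustrative parameter.

def objective(my_value: float, deviator_value: float,
              betrayed: bool, alpha: float = 1.0) -> float:
    if not betrayed:
        return my_value                       # normal play: maximize own value
    return my_value - alpha * deviator_value  # grudge mode: also hurt the deviator

# Choosing between two candidate plans after being betrayed
# (invented (my value, deviator's value) pairs):
plans = {
    "greedy_expand":  (0.50, 0.40),
    "block_deviator": (0.45, 0.10),
}
best = max(plans, key=lambda p: objective(*plans[p], betrayed=True))
print(best)  # -> "block_deviator": slightly worse for me, far worse for them
```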
Finally, we introduce Learned Deviators, who adapt and optimize their behavior against Sanctioning Agents over multiple games, trying to render the above defenses less effective. A Learned Deviator will only break a contract when the immediate gains from deviation are high enough and the other agent's ability to retaliate is low enough. In practice, Learned Deviators occasionally break contracts late in the game, and in doing so achieve a slight advantage over Sanctioning Agents. Nevertheless, such sanctions drive the Learned Deviator to honor more than 99.7% of its contracts.
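A rough model of the rule such a deviator converges to might look like the following – the thresholds, discounting, and names are entirely our own illustration, not learned quantities from the paper:

```python
# A sketch (ours) of a late-game deviation rule: break a contract only when
# the immediate gain outweighs the opponent's remaining capacity to punish.

def should_deviate(immediate_gain: float,
                   retaliation_capacity: float,
                   turns_remaining: int,
                   discount: float = 0.9) -> bool:
    # Sanctions can only be applied in the turns that are left, so late in
    # the game the expected punishment shrinks toward zero.
    expected_punishment = retaliation_capacity * (1 - discount ** turns_remaining)
    return immediate_gain > expected_punishment

print(should_deviate(0.3, 1.0, turns_remaining=10))  # False: ample time to punish
print(should_deviate(0.3, 1.0, turns_remaining=1))   # True: endgame betrayal pays
```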
We also examine possible learning dynamics of sanctioning and deviation: what happens when Sanctioning Agents can themselves deviate from contracts, and the potential incentive to stop sanctioning when this behavior is costly. Such issues can gradually erode cooperation, so additional mechanisms, such as repeating interactions across multiple games or using trust and reputation systems, may be necessary.
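As one concrete example of the kind of reputation mechanism this alludes to – a sketch, not something evaluated in the paper – an agent could track peers' contract-honoring rates across games and refuse to deal below a trust threshold:

```python
# A hypothetical trust-and-reputation tracker (ours): record whether each
# agent honored its contracts and only negotiate with agents above a
# threshold honoring rate.

from collections import defaultdict

class Reputation:
    def __init__(self, threshold: float = 0.9):
        self.kept = defaultdict(int)
        self.total = defaultdict(int)
        self.threshold = threshold

    def record(self, agent: str, honored: bool) -> None:
        self.total[agent] += 1
        self.kept[agent] += int(honored)

    def trustworthy(self, agent: str) -> bool:
        if self.total[agent] == 0:
            return True                      # no history: extend trust
        return self.kept[agent] / self.total[agent] >= self.threshold

rep = Reputation()
for honored in (True, True, False, True):    # one betrayal in four contracts
    rep.record("Austria", honored)
print(rep.trustworthy("Austria"))            # False: 75% < 90% threshold
```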
Our paper leaves many questions open for future research: Is it possible to design more sophisticated protocols to encourage even more honest behavior? How should agents combine communication techniques with reasoning under imperfect information? Finally, what other mechanisms could deter the breaking of agreements? Building fair, transparent, and trustworthy AI systems is an extremely important topic, and it is a key part of DeepMind's mission. Studying these questions in sandboxes like Diplomacy helps us better understand the tensions between cooperation and competition that might exist in the real world. Ultimately, we believe that tackling these challenges will allow us to better understand how to develop AI systems in line with society's values and priorities.
Read our full article here.