Agents cooperate better when they can communicate and negotiate, and sanctioning broken promises helps keep them honest.
Successful communication and cooperation have been crucial in helping societies advance throughout history. The closed environments of board games can serve as a sandbox for modeling and studying interaction and communication – and we can learn a lot by playing them. In our recent paper, published today in Nature Communications, we show how artificial agents can use communication to cooperate better in the board game Diplomacy, a dynamic domain of artificial intelligence (AI) research known for its focus on alliance building.
Diplomacy is challenging because it has simple rules but high emergent complexity, due to the strong interdependencies between players and its immense action space. To help tackle this challenge, we designed negotiation algorithms that allow agents to communicate and agree on joint plans, enabling them to defeat agents lacking this ability.
Cooperation is especially difficult when we cannot count on our peers to keep their promises. We use Diplomacy as a sandbox to explore what happens when agents can deviate from their past agreements. Our research illustrates the risks that arise when complex agents are able to misrepresent their intentions or mislead others about their future plans, which raises another big question: what are the conditions that promote trustworthy communication and teamwork?
We show that the strategy of punishing peers for breaking their contracts significantly reduces the benefits they can gain from abandoning their commitments, thereby promoting more honest communication.
What is Diplomacy and why is it important?
Games like chess, poker, Go, and many video games have always been fertile ground for AI research. Diplomacy is a seven-player negotiation and alliance-building game, played on an old map of Europe partitioned into provinces, where each player controls several units (the rules of Diplomacy). In the standard version of the game, called Press Diplomacy, each turn includes a negotiation phase, after which all players simultaneously reveal their chosen moves.
The heart of Diplomacy is the negotiation phase, where players try to agree on their next moves. For example, one unit may support another unit, allowing it to overcome the resistance of other units.
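To make the support mechanic concrete, here is a minimal Python sketch of the strength arithmetic behind it – our own simplification for illustration, not the adjudicator used in the paper:

```python
# A minimal sketch (ours, not the paper's game engine) of how a supported
# attack resolves in Diplomacy: each support adds one point of strength,
# and an attack succeeds only if strictly stronger than the defense.

def strength(num_supports: int) -> int:
    """A unit acts with strength 1, plus 1 per supporting unit."""
    return 1 + num_supports

def resolve_attack(attacker_supports: int, defender_supports: int) -> str:
    if strength(attacker_supports) > strength(defender_supports):
        return "attacker dislodges defender"
    return "attack bounces"

# Unsupported attacks bounce (1 vs 1), but a single support tips the balance (2 vs 1):
print(resolve_attack(attacker_supports=0, defender_supports=0))  # attack bounces
print(resolve_attack(attacker_supports=1, defender_supports=0))  # attacker dislodges defender
```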
Computational approaches to Diplomacy have been studied since the 1980s, many of them on a simpler version of the game called No-Press Diplomacy, where strategic communication between players is not allowed. Researchers have also proposed computer-friendly negotiation protocols, sometimes called “Restricted-Press”.
What have we studied?
We use Diplomacy as an analogue of real-world negotiation, providing AI agents with methods to coordinate their moves. We take our non-communicating Diplomacy agents and augment them to play Diplomacy with communication by giving them a protocol for negotiating contracts for a joint plan of action. We call these augmented agents Baseline Negotiators, and they are bound by their agreements.
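As an illustration of what such a contract could look like in code, the sketch below models a contract as a restriction on each signatory's permitted orders. The class, method names, and order strings are our own invention, not the paper's implementation:

```python
# A sketch (ours) of the core idea of a contract: an agreed restriction of
# each signatory's actions for the next turn. Order strings use standard
# Diplomacy notation for flavor only.

class Contract:
    """Maps each signatory to the set of orders they may still submit."""

    def __init__(self, allowed: dict[str, set[str]]):
        self.allowed = allowed

    def permits(self, player: str, order: str) -> bool:
        # Players outside the contract remain unrestricted.
        if player not in self.allowed:
            return True
        return order in self.allowed[player]

# England and France agree that England stays out of the English Channel.
deal = Contract({
    "England": {"F LON - NTH", "F LON H"},
    "France":  {"F BRE - ENG"},
})
assert deal.permits("England", "F LON - NTH")
assert not deal.permits("England", "F LON - ENG")   # ruled out by the deal
assert deal.permits("Russia", "A MOS - STP")        # not a signatory
print("contract restricts England's orders as agreed")
```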
We consider two protocols: the Mutual Proposal protocol and the Propose-Choose protocol, discussed in detail in the full paper. Our agents employ algorithms that identify mutually beneficial deals by simulating how the game might unfold under various contracts. We use the Nash Bargaining Solution from game theory as a principled foundation for identifying high-quality agreements. The game may unfold in many ways depending on the actions of players, so our agents use Monte-Carlo simulations to see what might happen in the next turn.
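The sketch below shows one way the Nash Bargaining Solution can rank candidate deals: each deal is scored by the product of the players' gains over their no-deal baseline values, which the actual agents estimate with Monte-Carlo rollouts. The contract names and values here are invented purely for illustration:

```python
# A minimal sketch (ours, not the paper's implementation) of deal selection
# via the Nash Bargaining Solution: among candidate contracts, pick the one
# maximizing the product of each player's gain over their no-deal baseline.
# In the real agents these values come from Monte-Carlo rollouts of the next
# turn; here they are hard-coded, invented numbers.

BASELINE = (0.20, 0.25)  # (player A, player B) expected values with no deal
CANDIDATES = {
    "demilitarize_border": (0.30, 0.40),
    "joint_attack_north":  (0.45, 0.28),
    "one_sided_favor":     (0.15, 0.60),  # worse than baseline for player A
}

def nash_product(values, baseline):
    gains = [v - b for v, b in zip(values, baseline)]
    if any(g <= 0 for g in gains):        # a rational player rejects this deal
        return float("-inf")
    product = 1.0
    for g in gains:
        product *= g
    return product

best = max(CANDIDATES, key=lambda c: nash_product(CANDIDATES[c], BASELINE))
print(best)  # -> "demilitarize_border": the largest product of mutual gains
```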
Our experiments show that our negotiation mechanism enables Baseline Negotiators to significantly outperform baseline non-communicating agents.
Agents breaking agreements
In Diplomacy, the agreements reached during negotiation are not binding (communication is “cheap talk”). But what happens when agents who accept a contract at one point deviate from it in the next turn? In many real-world settings, people agree to act in a certain way but then fail to follow through on their commitments. To enable cooperation between AI agents, or between agents and humans, we need to examine the potential pitfall of agents strategically breaking their agreements, as well as ways to remedy this problem. We used Diplomacy to study how the ability to abandon commitments erodes trust and cooperation, and to identify the conditions that foster honest cooperation.
We therefore consider Deviator Agents, which outperform honest Baseline Negotiators by deviating from agreed contracts. Simple Deviators simply “forget” they agreed to a contract and move however they wish. Conditional Deviators are more sophisticated, optimizing their actions on the assumption that the other players who accepted the contract will act in accordance with it.
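The difference between the two can be captured in a toy decision rule – ours, with invented payoffs; the actual agents plan with search and rollouts:

```python
# Toy payoffs for one signatory: PAYOFF[(my_move, co_signer_move)] -> my value.
# "honor" is the contracted move; "raid" breaks the deal. Numbers are invented.
HONOR, RAID = "honor", "raid"
PAYOFF = {
    (HONOR, HONOR): 2, (HONOR, RAID): 0,
    (RAID,  HONOR): 3, (RAID,  RAID): 1,
}

def simple_deviator_move() -> str:
    # "Forgets" the contract: plays its ordinary no-agreement policy,
    # here a maximin rule over the co-signer's possible moves.
    return max((HONOR, RAID),
               key=lambda m: min(PAYOFF[(m, o)] for o in (HONOR, RAID)))

def conditional_deviator_move() -> str:
    # Assumes co-signers will honor the contract, then best-responds
    # to exactly that assumption.
    return max((HONOR, RAID), key=lambda m: PAYOFF[(m, HONOR)])

print(simple_deviator_move())       # raid: chosen without reference to the deal
print(conditional_deviator_move())  # raid: exploiting compliance pays 3 > 2
```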
We show that both Simple and Conditional Deviators significantly outperform Baseline Negotiators, with Conditional Deviators gaining the larger advantage.
Encouraging agents to be honest
Next, we address the deviation problem using Defensive Agents, which respond adversely to deviations. We investigate Binary Negotiators, who simply cut off communication with agents who break an agreement with them. But shunning is a mild reaction, so we also develop Sanctioning Agents, who do not take betrayal lightly but instead modify their goals to actively try to lower the deviator's value – an opponent with a grudge! We show that both types of Defensive Agents reduce the advantage of deviation, Sanctioning Agents especially so.
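One way to picture the sanctioning objective is as a switch from maximizing the agent's own value to a mixed objective that also drives the deviator's value down. This is a sketch under our own assumptions; the weighting parameter and plan values below are not quantities from the paper:

```python
# A hedged sketch of the sanctioning idea: after a betrayal, the agent's
# objective also penalizes the deviator's value. `alpha` is our own
# illustrative parameter.

def objective(my_value: float, deviator_value: float,
              betrayed: bool, alpha: float = 1.0) -> float:
    if not betrayed:
        return my_value                       # normal play: maximize own value
    return my_value - alpha * deviator_value  # grudge mode: also hurt the deviator

# Choosing between two candidate plans after being betrayed
# (invented (my value, deviator's value) pairs):
plans = {
    "greedy_expand":  (0.50, 0.40),
    "block_deviator": (0.45, 0.10),
}
best = max(plans, key=lambda p: objective(*plans[p], betrayed=True))
print(best)  # -> "block_deviator": slightly worse for me, far worse for them
```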
Finally, we introduce Learned Deviators, who adapt and optimize their behavior against Sanctioning Agents over multiple games, trying to render the above defenses less effective. A Learned Deviator will only break a contract when the immediate gains from deviation are high enough and the other agent's ability to retaliate is low enough. In practice, Learned Deviators occasionally break contracts late in the game, and in doing so achieve a slight advantage over Sanctioning Agents. Nevertheless, such sanctions drive the Learned Deviator to honor more than 99.7% of its contracts.
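A rough model of the rule such a deviator converges to might look like the following – the thresholds, discounting, and names are entirely our own illustration, not learned quantities from the paper:

```python
# A sketch (ours) of a late-game deviation rule: break a contract only when
# the immediate gain outweighs the opponent's remaining capacity to punish.

def should_deviate(immediate_gain: float,
                   retaliation_capacity: float,
                   turns_remaining: int,
                   discount: float = 0.9) -> bool:
    # Sanctions can only be applied in the turns that are left, so late in
    # the game the expected punishment shrinks toward zero.
    expected_punishment = retaliation_capacity * (1 - discount ** turns_remaining)
    return immediate_gain > expected_punishment

print(should_deviate(0.3, 1.0, turns_remaining=10))  # False: ample time to punish
print(should_deviate(0.3, 1.0, turns_remaining=1))   # True: endgame betrayal pays
```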
We also examine possible learning dynamics of sanctioning and deviation: what happens when Sanctioning Agents can themselves deviate from contracts, and the potential incentive to stop sanctioning when this behavior is costly. Such issues can gradually erode cooperation, so additional mechanisms, such as repeating interactions across multiple games or using trust and reputation systems, may be necessary.
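As one concrete example of the kind of reputation mechanism this alludes to – a sketch, not something evaluated in the paper – an agent could track peers' contract-honoring rates across games and refuse to deal below a trust threshold:

```python
# A hypothetical trust-and-reputation tracker (ours): record whether each
# agent honored its contracts and only negotiate with agents above a
# threshold honoring rate.

from collections import defaultdict

class Reputation:
    def __init__(self, threshold: float = 0.9):
        self.kept = defaultdict(int)
        self.total = defaultdict(int)
        self.threshold = threshold

    def record(self, agent: str, honored: bool) -> None:
        self.total[agent] += 1
        self.kept[agent] += int(honored)

    def trustworthy(self, agent: str) -> bool:
        if self.total[agent] == 0:
            return True                      # no history: extend trust
        return self.kept[agent] / self.total[agent] >= self.threshold

rep = Reputation()
for honored in (True, True, False, True):    # one betrayal in four contracts
    rep.record("Austria", honored)
print(rep.trustworthy("Austria"))            # False: 75% < 90% threshold
```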
Our paper leaves many questions open for future research: Is it possible to design more sophisticated protocols to encourage even more honest behavior? How should agents combine communication techniques with reasoning under imperfect information? Finally, what other mechanisms could deter the breaking of agreements? Building fair, transparent, and trustworthy AI systems is an extremely important topic, and it is a key part of DeepMind's mission. Studying these questions in sandboxes like Diplomacy helps us better understand the tensions between cooperation and competition that might exist in the real world. Ultimately, we believe that tackling these challenges will allow us to better understand how to develop AI systems in line with society's values and priorities.
Read our full article here.