New research drawing on pragmatics and philosophy suggests ways to align chatbots with human values
Language is an essential human trait and the primary means by which we communicate information, including thoughts, intentions, and feelings. Recent advances in AI research have led to the creation of conversational agents capable of communicating with humans in nuanced ways. These agents are powered by large language models – computer systems trained on large corpora of textual materials to predict and produce text using advanced statistical techniques.
However, while language models such as InstructGPT, Gopher, and LaMDA have achieved record levels of performance in tasks such as translation, question answering, and reading comprehension, these models have also been shown to present a number of potential risks and failure modes. These include the production of toxic or discriminatory language and false or misleading information (1, 2, 3).
These shortcomings limit the productive use of chatbots in applied contexts and draw attention to the ways in which they fall short of certain communicative ideals. To date, most approaches to chatbot alignment have focused on anticipating and reducing the risk of harm (4).
Our new paper, In Conversation with AI: Aligning Language Models with Human Values, takes a different approach, exploring what successful communication between a human and an artificial conversational agent might look like, and what values should guide these interactions in different conversational domains.
Pragmatic Insights
To address these questions, the paper draws on pragmatics, a tradition in linguistics and philosophy which holds that the purpose of a conversation, its context, and a set of associated norms are all essential parts of good conversational practice.
Modeling conversation as a cooperative effort between two or more parties, linguist and philosopher Paul Grice believed that participants should:
- Speak informatively
- Tell the truth
- Provide relevant information
- Avoid obscure or ambiguous statements
However, our paper demonstrates that further refinement of these maxims is necessary before they can be used to evaluate conversational agents, given the variation in goals and values embedded in different conversational domains.
Discursive ideals
To illustrate, scientific investigation and communication primarily aim to understand or predict empirical phenomena. Given these goals, a conversational agent designed to aid scientific investigation would ideally only make statements whose truth is confirmed by sufficient empirical evidence, or otherwise qualify its positions based on relevant confidence intervals.
For example, an agent stating that “At a distance of 4.246 light years, Proxima Centauri is the closest star to Earth” should only do so after the underlying model has verified that this statement matches the facts.
Yet a chatbot acting as a moderator in public political discourse may need to demonstrate entirely different virtues. In this context, the objective is above all to manage differences and enable productive cooperation in the life of a community. The agent will therefore need to foreground the democratic values of tolerance, courtesy, and respect (5).
Furthermore, these values explain why the generation of toxic or harmful speech by language models is often so problematic: the offending language fails to communicate equal respect to the participants in the conversation, which is a key value for the context in which the models are deployed. At the same time, scientific virtues, such as the comprehensive presentation of empirical data, may be less important in the context of public debate.
Finally, in the domain of creative storytelling, communicative exchange aims for novelty and originality, values which, again, differ considerably from those mentioned above. In this context, greater latitude in pretense may be appropriate, although it remains important to protect communities from malicious content produced under the guise of “creative uses.”
Paths to follow
This research has a number of practical implications for the development of aligned conversational AI agents. To begin with, these agents will need to embody different characteristics depending on the contexts in which they are deployed: there is no one-size-fits-all account of language model alignment. Instead, the appropriate mode and standards of evaluation for an agent – including standards of truthfulness – will vary depending on the context and purpose of a conversational exchange.
Additionally, chatbots may have the potential to cultivate stronger, more respectful conversations over time, through a process we call context construction and elucidation. Even when a person is not aware of the values that govern a given conversational practice, the agent can still help them understand these values by foreshadowing them in the conversation, thus making the course of communication deeper and more fruitful for the human speaker.