Drawing on philosophy to identify fair principles for ethical AI
As artificial intelligence (AI) becomes more powerful and more deeply integrated into our lives, questions about how it is used and deployed become increasingly important. What values should guide AI? Whose values should they be? And how are they selected?
These questions highlight the role played by principles – the core values that guide decisions, large and small, in AI. For humans, principles help shape the way we live our lives and our sense of right and wrong. For AI, they shape its approach to decisions involving trade-offs, such as choosing between maximizing productivity and helping those who need it most.
In an article published today in the Proceedings of the National Academy of Sciences, we draw inspiration from philosophy to find ways to better identify the principles that guide AI behavior. Specifically, we explore how a concept known as the “veil of ignorance” – a thought experiment intended to help identify fair principles for group decisions – can be applied to AI.
In our experiments, we found that this approach encouraged people to make decisions based on what they thought was right, whether or not it directly benefited them. We also found that participants were more likely to select an AI assistant that helped the most disadvantaged when reasoning behind the veil of ignorance. These findings could help researchers and policymakers select principles for AI assistants in a way that is fair to all parties.
A tool for fairer decision-making
A key goal of AI researchers has been to align AI systems with human values. However, there is no consensus on a single set of human values or preferences to govern AI: we live in a world where people have diverse backgrounds, resources, and beliefs. How should we select principles for this technology, given such diverse opinions?
Although this challenge has emerged for AI over the past decade, the big question of how to make fair decisions has a long philosophical lineage. In the 1970s, political philosopher John Rawls proposed the concept of the veil of ignorance as a solution to this problem. Rawls argued that when people choose the principles of justice for a society, they should imagine that they are doing so without knowing their own particular position in that society, including, for example, their social status or their level of wealth. Without this information, people cannot make decisions in their own best interest and should instead choose principles that are fair to everyone involved.
As an example, consider asking a friend to cut the cake at your birthday party. One way to ensure the slices are fairly sized is not to tell them which slice will be theirs. This approach of withholding information is seemingly simple, but it has wide application in fields from psychology to politics, helping people reason about their decisions from a less self-serving perspective. It has been used as a method of reaching group agreement on contentious issues, ranging from sentencing to taxation.
Building on this foundation, previous DeepMind research proposed that the impartial nature of the veil of ignorance can help promote fairness in the process of aligning AI systems with human values. We designed a series of experiments to test the effects of the veil of ignorance on the principles people choose to guide an AI system.
Maximize productivity or help the most disadvantaged?
In an online “harvesting game,” we asked participants to play a group game with three computer players, in which each player’s goal was to harvest wood by felling trees in a separate territory. In each group, some players were lucky and assigned to an advantaged position: their field was densely populated with trees, allowing them to harvest wood efficiently. Other members of the group were disadvantaged: their fields were sparse, requiring more effort to harvest wood.
Each group was assisted by a single AI system that could spend time helping individual group members harvest trees. We asked participants to choose between two principles to guide the AI assistant’s behavior. Under the “maximization principle,” the AI assistant would aim to increase the group’s total harvest by focusing primarily on the densest fields. Under the “prioritization principle,” the AI assistant would focus on helping the disadvantaged members of the group.
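To make the trade-off concrete, the sketch below shows how the two principles could be expressed as simple allocation rules for the assistant. This is a minimal illustration only: the function name, the densities, and the decision logic are our assumptions, not the study’s actual implementation.

```python
def choose_player_to_help(tree_densities, principle):
    """Pick which player the AI assistant helps next.

    tree_densities: tree density of each player's field.
    principle: "maximization" or "prioritization".
    """
    if principle == "maximization":
        # Maximization: help whoever has the densest field, where each
        # unit of assistance yields the most wood for the group.
        return max(range(len(tree_densities)), key=lambda i: tree_densities[i])
    if principle == "prioritization":
        # Prioritization: help whoever has the sparsest field, i.e. the
        # most disadvantaged member of the group.
        return min(range(len(tree_densities)), key=lambda i: tree_densities[i])
    raise ValueError(f"unknown principle: {principle}")

# Illustrative group: one advantaged player and three disadvantaged ones.
densities = [0.9, 0.3, 0.25, 0.2]
print(choose_player_to_help(densities, "maximization"))    # 0 (densest field)
print(choose_player_to_help(densities, "prioritization"))  # 3 (sparsest field)
```

Note how the two rules pull in opposite directions: the same unit of the assistant’s time raises the group total most in the densest field, but narrows inequality most in the sparsest one.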
We placed half of the participants behind the veil of ignorance: they faced the choice between the two principles without knowing which field would be theirs – so they did not know to what extent they were advantaged or disadvantaged. The remaining participants made their choice knowing whether they were better or worse off.
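For readers who find the manipulation easier to follow procedurally, here is a hypothetical sketch of the two conditions. The function names, the densities, and the self-interested chooser are illustrative assumptions, not the study’s materials.

```python
import random

def run_trial(behind_veil, choose_principle):
    """Simulate one participant's choice of guiding principle."""
    # Each participant is assigned one field at random
    # (densities reused from the sketch above).
    field_density = random.choice([0.9, 0.3, 0.25, 0.2])
    if behind_veil:
        # Veil condition: the principle is chosen before the participant
        # learns which field is theirs.
        principle = choose_principle(known_density=None)
    else:
        # Control condition: the participant knows their position first.
        principle = choose_principle(known_density=field_density)
    return field_density, principle

# Example of a purely self-interested chooser: pick whichever principle
# benefits the known position; behind the veil, no position is known.
def self_interested(known_density):
    if known_density is None:
        return "prioritization"
    return "maximization" if known_density > 0.5 else "prioritization"

print(run_trial(behind_veil=True, choose_principle=self_interested))
print(run_trial(behind_veil=False, choose_principle=self_interested))
```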
Encourage fairness in decision-making
We found that if participants did not know their position, they consistently preferred the prioritization principle, in which the AI assistant helped disadvantaged group members. This pattern appeared consistently across all five variations of the game and crossed social and political boundaries: participants showed this tendency to choose the prioritization principle regardless of their risk appetite or political orientation. In contrast, participants who knew their own position were more likely to choose the principle that benefited them the most, whether it was the prioritization principle or the maximization principle.

When we asked participants why they made their choice, those who did not know their position were particularly likely to express concerns about fairness. They often explained that it was right for the AI system to focus on helping the most disadvantaged people in the group. In contrast, participants who knew their position discussed their choice much more frequently in terms of personal benefits.
Finally, once the harvesting game was over, we posed a hypothetical situation to the participants: if they had to play the game again, knowing this time that they would be in a different field, would they choose the same principle as the first time? We were particularly interested in individuals who had previously benefited directly from their choice, but who would not benefit from the same choice in a new game.
We found that people who had already made choices without knowing their position were more likely to continue endorsing their principle, even when they knew it would no longer favor them in their new field. This provides further evidence that the veil of ignorance encourages fairness in participants’ decision-making, leading them to principles they were willing to respect even when they no longer directly benefited from them.
Fairer principles for AI
AI technology is already having a profound effect on our lives. The principles that govern AI shape its impact and how its potential benefits are distributed.
Our research focused on a case where the effects of the different principles were relatively clear. This will not always be the case: AI is deployed across many domains, often relying on a large number of rules to guide it, potentially with complex side effects. Nevertheless, the veil of ignorance can still inform the selection of principles, helping to ensure that the rules we choose are fair to all parties.
To ensure we build AI systems that benefit everyone, we need in-depth research with a broad range of contributions, approaches and feedback from across disciplines and society. The veil of ignorance can provide a starting point for selecting principles with which to align AI. It has been deployed effectively in other areas to bring out more impartial preferences. We hope that with further research and careful attention to context, this can help play the same role for AI systems built and deployed in society today and tomorrow.
Learn more about DeepMind’s approach to safety and ethics.