Building a responsible approach to data collection with the Partnership on AI
At DeepMind, our goal is to ensure that everything we do meets the highest safety and ethical standards, in line with our Operating Principles. One of the most important places this starts is how we collect our data. Over the past 12 months, we have collaborated with the Partnership on AI (PAI) to carefully consider these challenges and co-develop standardised best practices and processes for responsible human data collection.
Human data collection
More than three years ago, we created our Human Behavioural Research Ethics Committee (HuBREC), a governance group modelled on institutional review boards (IRBs), such as those found in hospitals and universities, with the aim of protecting the dignity, rights, and welfare of the human participants involved in our studies. This committee oversees behavioural research involving experiments with humans as the subject of study, such as investigating how humans interact with artificial intelligence (AI) systems in a decision-making process.
Alongside projects involving behavioural research, the AI community is increasingly engaging in efforts involving "data enrichment" – tasks carried out by humans to train and validate machine learning models, such as data labelling and model evaluation. While behavioural research often relies on volunteer participants who are the subject of the study, data enrichment involves people being paid to complete tasks that improve AI models.
These types of tasks are typically carried out on crowdsourcing platforms, often raising ethical considerations related to worker pay, welfare, and equity, which can lack the necessary guidance or governance systems to ensure sufficient standards are met. As research labs accelerate the development of increasingly sophisticated models, reliance on data enrichment practices will likely grow, and alongside it, the need for stronger guidance.
As part of our Operating Principles, we commit to upholding and contributing to best practices in the fields of AI safety and ethics, including fairness and privacy, to avoid unintended outcomes that create risks of harm.
Building on PAI's recent white paper on the responsible sourcing of data enrichment services, we collaborated to develop our data enrichment practices and processes. This included creating five steps AI practitioners can follow to improve working conditions for the people involved in data enrichment tasks (for more details, please visit PAI's data enrichment sourcing guidelines):
- Select an appropriate payment model and ensure all workers are paid above the local living wage.
- Design and run a pilot before launching a data enrichment project.
- Identify appropriate workers for the desired task.
- Provide verified instructions and/or training materials for workers to follow.
- Establish clear and regular communication mechanisms with workers.
Together, we created the necessary policies and resources, gathering multiple rounds of feedback from our internal legal, data, security, ethics, and research teams along the way, before piloting them on a small number of data collection projects and later rolling them out across the wider organisation.
These documents provide greater clarity around how best to set up data enrichment tasks at DeepMind, improving our researchers' confidence in designing and running studies. This has not only increased the efficiency of our approval and launch processes but, more importantly, has improved the experience of the people involved in data enrichment tasks.
Further information on responsible data enrichment practices and how we have embedded them into our existing processes is explained in PAI's recent case study, Implementing responsible data enrichment practices at an AI developer: the example of DeepMind. PAI also provides helpful resources and supporting materials for AI practitioners and organisations looking to develop similar processes.
Looking ahead
While these best practices underpin our work, we should not rely on them alone to ensure our projects meet the highest standards of participant or worker welfare and safety. Every project at DeepMind is different, which is why we have a dedicated human data review process that allows us to continually collaborate with research teams to identify and mitigate risks on a case-by-case basis.
This work is intended to serve as a resource for other organisations interested in improving their data enrichment sourcing practices, and we hope it leads to cross-industry conversations that further develop these guidelines and resources for teams and partners. Through this collaboration, we also hope to spark broader discussion about how the AI community can continue to develop norms of responsible data collection and collectively build better industry standards.
Learn more about our Operating Principles.