
Best practices for data enrichment
Building a responsible approach to data collection with the Partnership on AI
At DeepMind, our goal is to make sure everything we do meets the highest standards of safety and ethics, in line with our Operating Principles. One of the most important places this starts is with how we collect our data. In the past 12 months, we’ve collaborated with Partnership on AI (PAI) to carefully consider these challenges, and have co-developed standardised best practices and processes for responsible human data collection.
Human data collection
Over three years ago, we created our Human Behavioural Research Ethics Committee (HuBREC), a governance group modelled on academic institutional review boards (IRBs), such as those found in hospitals and universities, with the aim of protecting the dignity, rights, and welfare of the human participants involved in our studies. This committee oversees behavioural research involving experiments with humans as the subject of study, such as investigating how humans interact with artificial intelligence (AI) systems in a decision-making process.
Alongside projects involving behavioural research, the AI community has increasingly engaged in efforts involving ‘data enrichment’ – tasks carried out by humans to train and validate machine learning models, like data labelling and model evaluation. While behavioural research often relies on voluntary participants who are the subject of study, data enrichment involves people being paid to complete tasks which improve AI models.
These types of tasks are usually conducted on crowdsourcing platforms, often raising ethical considerations related to worker pay, welfare, and equity, areas which can lack the necessary guidance or governance systems to ensure sufficient standards are met. As research labs accelerate the development of increasingly sophisticated models, reliance on data enrichment practices will likely grow and, alongside this, the need for stronger guidance.

As part of our Operating Principles, we commit to upholding and contributing to best practices in the fields of AI safety and ethics, including fairness and privacy, to avoid unintended outcomes that create risks of harm.
The best practices
Following PAI’s recent white paper on Responsible Sourcing of Data Enrichment Services, we collaborated to develop our practices and processes for data enrichment. This included the creation of five steps AI practitioners can follow to improve the working conditions for people involved in data enrichment tasks (for more details, please visit PAI’s Data Enrichment Sourcing Guidelines):
- Select an appropriate payment model and ensure all workers are paid above the local living wage.
- Design and run a pilot before launching a data enrichment project.
- Identify appropriate workers for the desired task.
- Provide verified instructions and/or training materials for workers to follow.
- Establish clear and regular communication mechanisms with workers.
Together, we created the necessary policies and resources, gathering multiple rounds of feedback from our internal legal, data, security, ethics, and research teams in the process, before piloting them on a small number of data collection projects and later rolling them out to the wider organisation.
These documents provide more clarity around how best to set up data enrichment tasks at DeepMind, improving our researchers’ confidence in study design and execution. This has not only increased the efficiency of our approval and launch processes but, importantly, has enhanced the experience of the people involved in data enrichment tasks.
Further information on responsible data enrichment practices and how we’ve embedded them into our existing processes is available in PAI’s recent case study, Implementing Responsible Data Enrichment Practices at an AI Developer: The Example of DeepMind. PAI also provides helpful resources and supporting materials for AI practitioners and organisations seeking to develop similar processes.
Looking forward
While these best practices underpin our work, we shouldn’t rely on them alone to ensure our projects meet the highest standards of participant or worker welfare and safety in research. Each project at DeepMind is different, which is why we have a dedicated human data review process that allows us to continually engage with research teams to identify and mitigate risks on a case-by-case basis.
This work aims to serve as a resource for other organisations interested in improving their data enrichment sourcing practices, and we hope that it leads to cross-sector conversations which could further develop these guidelines and resources for teams and partners. Through this collaboration we also hope to spark broader discussion about how the AI community can continue to develop norms of responsible data collection and collectively build better industry standards.
Read more about our Operating Principles.