The Co-director of City’s Centre for Human-Computer Interaction Design (HCID) chairs a webinar coinciding with the publication of the Aapti Institute report Just and Equitable Data Labelling, which results from a study of data labelling startups in India.

By Mr John Stevenson (Senior Communications Officer), Published


Dr Alex Taylor, Co-director of City’s Centre for Human-Computer Interaction Design (HCID), is working with the Bangalore-based Aapti Institute to push for responsible growth and the protection of workers employed in the global and expanding data labelling industry.

Data labelling refers to the identification and provision of metadata descriptions for raw data (images, text files, videos, etc.) so that a machine learning model can learn from such data.

Together with the Aapti Institute, Dr Alex Taylor co-authored a read-back, AI Data Labelling: Shaping Just and Equitable Artificial Intelligence. This preliminary report provides a context for the labour of data labelling, focusing specifically on the practices underpinning the industry and the scope for worker rights to be improved and upheld.

On February 25th, Dr Taylor chaired the webinar, Just and Equitable Data Labelling: Towards a responsible AI Supply Chain, with panellists Aditi Surie (Indian Institute for Human Settlements); Preeti Syal (Indian government policy think tank NITI Aayog); Smita Malipatil (Chief Empowerment Officer, IndiVillage); and Nusrat Khan (United Nations Development Programme).

The panellists made the point that data labelling outsourced to people living largely in the global south underlies many of the AI-based products and services used by consumers and in professional settings.

Dr Taylor said:

As an industry, data labelling is rapidly evolving and involves major players such as Amazon (Amazon Mechanical Turk) and an increasing number of startups. New business models are emerging to respond to the changing nature of what is being labelled and the advances in the technologies. Some startups provide platforms for clients that give them access to the labellers / crowd work, others are adopting a software-as-a-service model, providing the software necessary to support crowd work, and others are using a hybrid of the two.

A small but prominent business model is impact sourcing, where companies are intentionally outsourcing to economically disadvantaged regions or groups to improve equity and standards of living.

However, the panellists sought to raise issues about whether and how the industry is regulated, as there is considerable potential for exploitation and harm. Much like workers in the gig economy, labourers are at risk because of poor labour rights, conditions of service, etc.

They also spoke to the ethical side of data labelling, such as the conditions people are being asked to work in and the materials they are being exposed to, especially where the materials to be labelled are personally sensitive or may include graphic imagery, language, etc.

While some data labelling jobs are beginning to require increased expertise and thus training, such as labelling medical scans and X-rays, these kinds of opportunities need to be set against the urge to automate aspects of work and the kinds of biases, risks and inequities that accompany automation.

The webinar coincided with the publication of a further report, Just and Equitable Data Labelling, which presents results from a study of data labelling startups in India. This report summarises the ongoing changes in the industry and highlights the opportunities for just and equitable data labelling practices and policies.

Dr Taylor’s work with the Aapti Institute has been supported by City’s Global Challenges Research Fund (GCRF) block grant.