Improving metadata quality through crowdsourcing campaigns

WE-Hope aims to create a platform that offers access to cultural heritage content, namely refugee testimonies and stories, shared by users around the world. By taking part in crowdsourcing initiatives and campaigns, users of the platform can substantially improve the quality of the metadata and extend its coverage. Recent advances in AI and Machine Learning have made it possible to generate large amounts of new metadata automatically; users of the platform can verify these automated annotations and enrich them by adding new terms from international thesauri.



Cultural Heritage (CH) includes the sites, things, and practices a society regards as old, important, and worthy of conservation. It has been the subject of increasing popular and scholarly attention worldwide, and its conceptual scope is expanding. In recent years, the Cultural Heritage sector has undergone a remarkable transformation. Accelerated digital evolution, in the form of massive digitisation and annotation activities together with efforts to generate multimodal cultural content from all possible sources, has made vast amounts of digital content available through a variety of cultural institutions such as museums, libraries, archives and galleries. In addition, the evolution of web technologies has contributed to making the Web the core platform for the circulation, distribution and consumption of a broad range of cultural content.

Initiatives that aggregate digital cultural content at national and international level and make it easily available to the cultural and creative sectors have appeared over the last decades; Europeana and the Digital Public Library of America stand out among them. They operate as cross-domain hubs, making content accessible to users and readily available for search, study and reuse through creative applications and web services. Although their main strength lies in the vast number of items they contain, their main weakness is the lack of rich, structured descriptive metadata or the insufficient quality of the metadata that exists. This problem severely affects the accessibility, visibility and dissemination range of the available digital content. It also limits the usability and potential of added-value services and applications that reuse these resources in innovative ways, and it degrades the user experience.

Metadata quality improvement and enrichment are major challenges that are receiving increasing attention in the digital cultural heritage domain. Traditionally they have been manual processes that do not scale: improving or adding metadata for hundreds of thousands or even millions of records coming from different sources requires a significant investment of time, effort and resources, which aggregators and cultural heritage institutions usually cannot afford. This bottleneck of scale can now be overcome thanks to the evolution of Artificial Intelligence (AI) and the rise of crowdsourcing initiatives. The latest advances in AI and Machine Learning (ML) technologies facilitate the metadata enrichment process by making it possible to ingest and analyze almost any amount and type of data.


In addition, crowdsourcing initiatives and campaigns, viewed as efforts to harness the potential of the crowd to solve complex problems at scales and rates that no single individual can match, have proved to be a powerful tool for obtaining input that assists metadata enrichment. In this context, a dedicated platform that offers, in a centralized way, both metadata enrichment services based on automated analysis and feature extraction and crowdsourcing annotation services presents a remarkable opportunity to improve the metadata quality of digital content stored in platforms such as Europeana, while at the same time engaging users and raising awareness of cultural heritage assets.


This article was written by NTUA.

Photo by Marvin Meyer on Unsplash.