HIPE-2022: Shared Task on Named Entity Recognition and Linking in Multilingual Historical Documents

Event details

Date	15.02.2022 – 22.04.2022
Location	Online
Category	Miscellaneous
Event Language	English

The HIPE evaluation lab series is organized by researchers from the EPFL Digital Humanities Lab (DHLAB), the University of Lausanne, the University of Zurich, and the University of La Rochelle. The organizers are inviting participation in the second edition of the HIPE shared task on named entity processing in historical documents as part of CLEF 2022 Evaluation Labs.

The HIPE evaluation lab series is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain and document type heterogeneity, input noisiness, dynamics of language, and lack of resources.

Following the first CLEF-HIPE-2020 evaluation lab on historical newspapers in three languages, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. The objective is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets.

Tasks

- Named Entity Recognition and Classification (coarse and fine-grained tasks).

- Entity Linking (with and without prior information on entity mentions).

Data

HIPE-2022 datasets are based on six primary datasets assembled and prepared for the shared task. The primary datasets are composed of historical newspapers and classical commentaries covering ca. 200 years. They feature several languages as well as different entity tag sets and annotation schemes. They originate from several European cultural heritage projects, from the HIPE organizers' previous research project, and from the previous HIPE-2020 campaign. Some are already published; others are released for the first time for HIPE-2022.

Tracks and Challenges

To accommodate the different dimensions that characterize HIPE-2022 (tasks, languages, document types, entity tag sets) and foster research on transferability, the evaluation lab is organized around challenges and tracks.

A track is a specific [dataset-language-task] triple, evaluated on the corresponding test set. A challenge is a predefined set of tracks and can be seen as a kind of championship with multiple tracks.

HIPE-2022 specifically evaluates 3 challenges:

1. Multilingual Newspaper Challenge: newspaper datasets only, at least 2 languages;

2. Multilingual Classical Commentary Challenge: commentary datasets only, at least 3 languages;

3. Global Adaptation Challenge: submitted tracks must include both document types, at least 2 languages.
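
The relationship between tracks and challenges can be illustrated with a minimal Python sketch. It is not part of the official HIPE-2022 tooling: the `Track` class, its `doc_type` field, the `eligible_challenges` function, and the dataset and task labels in the usage note are hypothetical names chosen for illustration. The sketch encodes only the three requirements listed above and assumes that languages are counted as distinct values across the submitted tracks. The Participation Guidelines may impose further conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One [dataset-language-task] triple (hypothetical representation)."""
    dataset: str
    language: str
    task: str
    doc_type: str  # assumed to be "newspaper" or "commentary"


def eligible_challenges(tracks):
    """Return the challenges whose listed requirements a set of tracks meets.

    Only the minimum-language and document-type conditions from the list
    above are checked; any additional official rules are not modelled.
    """
    tracks = list(tracks)
    doc_types = {t.doc_type for t in tracks}
    languages = {t.language for t in tracks}

    eligible = []
    if doc_types == {"newspaper"} and len(languages) >= 2:
        eligible.append("Multilingual Newspaper Challenge")
    if doc_types == {"commentary"} and len(languages) >= 3:
        eligible.append("Multilingual Classical Commentary Challenge")
    if doc_types == {"newspaper", "commentary"} and len(languages) >= 2:
        eligible.append("Global Adaptation Challenge")
    return eligible
```

For example, a submission with one newspaper track in German and one commentary track in French (dataset names such as "news-a" being placeholders) would satisfy only the Global Adaptation Challenge under these assumptions, since it mixes both document types and covers two languages.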

Practical information

HIPE-2022 website: https://hipe-eval.github.io/HIPE-2022/
Registration: https://clef2022-labs-registration.dei.unipd.it/ (until 22 April 2022)

Participation Guidelines: https://doi.org/10.5281/zenodo.6045662

HIPE-2022-data GitHub repository: https://github.com/hipe-eval/HIPE-2022-data

Workshop venue: during the CLEF conference, 5-8 September 2022, Bologna, Italy.
Twitter: #HIPE2022 / @clef_initiative / #clef2022