BSNLP-2017 Shared Task on multi-lingual named entity recognition for Slavic languages

We are happy to announce the BSNLP 2017 Shared Task:

Multilingual named entity recognition for Slavic languages

Shared task Website: http://bsnlp-2017.cs.helsinki.fi/shared_task.html
Sponsored by ACL SIGSlav

The ACL Special Interest Group on Slavic NLP invites participation in the Shared Task on multilingual named entity recognition for Slavic languages. Results of the shared task will be presented at the BSNLP-2017 Workshop, to be held at EACL-2017 in Valencia, Spain, on 4 April 2017.

The task covers recognizing mentions of named entities in web documents in Slavic languages, normalizing (lemmatizing) them, and matching them across languages. Because of the rich inflection, free word order, derivation, and other phenomena exhibited by Slavic languages, detecting names and lemmatizing them is challenging.
Fostering research and development on this problem, and on the closely related problem of entity linking, is of paramount importance for enabling multilingual and cross-lingual information access.

The shared task initially covers seven languages:
— Croatian,
— Czech,
— Polish,
— Russian,
— Slovak,
— Slovene,
— Ukrainian
and focuses on recognition of four types of named entities:
— persons,
— locations,
— organizations, and
— miscellaneous,
where the last category covers mentions of all other types of named entities, e.g.,
products, events, etc. This is the first edition of the task, and it is intended to be
expanded to additional entity types and languages in the future.

The task focuses on cross-lingual document-level extraction of named entities, i.e., the
systems should recognize, classify, and extract all named entity mentions in a document, but detecting the position of each named entity mention in text is not required.
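
To illustrate what position-free, document-level extraction can look like, here is a minimal Python sketch. It is not the official response format, which is specified on the shared task website. The type labels (`PER`, `LOC`, `ORG`, `MISC`), the `Entity` class, and the `document_level_entities` function are hypothetical names chosen for this example, and treating each distinct surface form as its own entry is likewise an assumption.

```python
from dataclasses import dataclass

# Hypothetical labels for the four categories named in the task description.
ENTITY_TYPES = {"PER", "LOC", "ORG", "MISC"}


@dataclass(frozen=True)
class Entity:
    surface: str  # the form as it appears in the text, e.g. an inflected name
    lemma: str    # the normalized (base) form
    etype: str    # one of ENTITY_TYPES


def document_level_entities(occurrences):
    """Collapse per-occurrence extractions into a position-free set."""
    unique = set()
    for ent in occurrences:
        if ent.etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {ent.etype}")
        unique.add(ent)  # repeated occurrences of the same entry collapse
    return unique


# A Polish document in which "Warszawie" (locative) occurs twice
# and "Warszawa" (nominative) occurs once.
occurrences = [
    Entity("Warszawie", "Warszawa", "LOC"),
    Entity("Warszawa", "Warszawa", "LOC"),
    Entity("Warszawie", "Warszawa", "LOC"),
]
result = document_level_entities(occurrences)
for ent in sorted(result, key=lambda e: e.surface):
    print(ent.surface, ent.lemma, ent.etype)
```

Because no character offsets are stored, repeated occurrences of the same form contribute a single entry for the document, while the shared lemma ties the inflected variants together.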


Data

The input consists of collections of documents from the Web, each collection
revolving around a particular entity. The corpus was obtained by posing a query to a search
engine and parsing the HTML of the relevant documents.
The training data consists of two sets of about 200 documents each.

Registered participants will receive the full corpora and further information via email
directly after registration.

The test data set will be provided to registered participants in February, in exactly the
same format as the training data; that is, the content of each collection will focus on one particular entity. Please see the Timeline below for further information.

Detailed information about data formats, rules for entity types, system response guidelines, evaluation metrics and procedure, and publication of results, as well as ongoing updates about the Shared Task, will be announced on the BSNLP 2017 web page at:
http://bsnlp-2017.cs.helsinki.fi/shared_task.html
and on the mailing list of SIGSLAV at:
https://groups.google.com/forum/?fromgroups#!forum/sigslav

Timeline

12 December 2016: Shared task announcement and release of training/trial data
12 December 2016: First Call for Participation
21 December 2016: Second Call for Participation
10 January 2017: Final Call for Participation
16 January 2017: Deadline for submission of system papers (not mandatory)
11 February 2017: Release of blind test data for registered participants
12 February 2017: Notification of acceptance of system papers
13 February 2017: Announcement of the results of the evaluation to participants
21 February 2017: Camera-ready system papers due (including the received results of the evaluation)
4 April 2017: BSNLP 2017 workshop

 

About the author: Lidia Pivovarova

Saint Petersburg State University (SPbGU), senior lecturer; University of Helsinki, PhD student. http://philarts.spbu.ru/structure/sub-faculties/itah_phil/teachers/pivovarova