Call for Participation in the Shared Task on Projection-Based Coreference Resolution
OVERVIEW OF THE TASK
Previous shared tasks on coreference resolution (e.g., the SemEval-2010 shared task on Coreference Resolution in Multiple Languages, and the CoNLL-2011 and CoNLL-2012 shared tasks) operated in a setting where a large amount of training data was provided so that coreference resolvers could be trained in a fully supervised manner. Our shared task has a different goal: we are primarily interested in a low-resource setting. In particular, we seek to investigate how well one can build a coreference resolver for a language for which no coreference-annotated data is available for training.
Given the rising interest in annotation projection, we hereby offer a projection-based task that will facilitate the application of existing coreference resolution algorithms to new languages. We believe that with this exciting setting, the shared task can help promote the development of coreference technologies that are applicable to a larger number of natural languages than is currently possible.
This year we will focus on two languages: German and Russian. To mimic a low-resource setting, no German or Russian coreference-annotated data will be provided. Rather, to facilitate system development, the shared task participants will be provided with two versions of an English-German-Russian parallel corpus: an unlabelled version and a labelled version. In the labelled version, the English side of the parallel corpus has been automatically coreference-labelled using the Berkeley coreference resolver, which was trained on the English OntoNotes corpus.
SHARED TASK TRACKS
Participants will compete in two tracks:
* Closed track: projection-based coreference resolution on German and Russian. The only coreference-annotated training data that the participants can use is the English OntoNotes corpus. Alternatively, they can use any of the publicly-available coreference resolvers trained on English OntoNotes. They can then use whatever parallel corpus and method they prefer to project the English annotations into German/Russian and subsequently train a new coreference resolver on the projected annotations. As for additional linguistic information, the participants can use POS information provided by the parser of their choice. Note that they do not have to use the provided English-German-Russian parallel corpus.
* Open track: coreference resolution on German and Russian with no restriction on the kind of coreference-annotated data the participants can use for training. For instance, they can label their own German/Russian coreference data and use it to train a German/Russian coreference resolver, or they can adopt a heuristic-based approach where they employ knowledge of German/Russian to write coreference rules for these languages.
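As an illustration of the projection step in the closed track, the sketch below projects English coreference mention spans onto a target-language sentence through a word alignment. All function names and data are hypothetical; in practice, alignments would come from a tool such as fast_align or GIZA++, and mentions from an English resolver's output.

```python
# Minimal sketch of mention projection across word alignments (illustrative only).
# An English mention is a (start, end) token span; the alignment is a list of
# (english_index, target_index) token pairs.

def project_mention(span, alignment):
    """Project an English token span onto the target side via the alignment."""
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # the mention has no aligned target tokens
    return (min(targets), max(targets))

# Hypothetical sentence pair:
# English: "the president said he ..."  ->  German: "der Praesident sagte er ..."
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
english_mentions = {(0, 1): "entity_1", (3, 3): "entity_1"}

# Carry each mention's entity label over to its projected target span.
projected = {project_mention(span, alignment): eid
             for span, eid in english_mentions.items()}
print(projected)  # {(0, 1): 'entity_1', (3, 3): 'entity_1'}
```

Real systems must additionally handle one-to-many and null alignments, discontinuous projected spans, and alignment noise; the projected annotations can then serve as (noisy) training data for a target-language resolver.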
The participants can choose to take part in either one or both tracks.
DATA
Please see the Data package section at http://corbon.nlp.ipipan.waw.
EVALUATION
The evaluation will follow the CoNLL-2012 shared task's strategy: the ranking score will be determined by computing the unweighted average of the MUC, B-CUBED, and CEAF-E F1 scores.
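Concretely, the ranking score is the arithmetic mean of the three metric scores. A minimal sketch (the metric values below are invented for illustration):

```python
# CoNLL-2012-style ranking score: unweighted average of the MUC,
# B-CUBED, and CEAF-E F1 scores. The inputs below are made-up numbers,
# not results from any actual system.

def conll_score(muc_f1, bcubed_f1, ceafe_f1):
    return (muc_f1 + bcubed_f1 + ceafe_f1) / 3.0

score = conll_score(60.0, 52.5, 49.5)
print(round(score, 2))  # 54.0
```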
Participants will run their systems on the test data themselves and are required to send their outputs to the Shared Task Coordinator by December 27 (CET).
IMPORTANT DATES
November 29, 2016: Training data released
December 19, 2016: Test data released
December 27, 2016: System outputs collected
January 6, 2017: Shared task results announced
January 16, 2017: System description paper due
February 11, 2017: Notification of acceptance
February 21, 2017: Camera-ready papers due
April 3 or 4, 2017: Workshop date
ORGANIZERS
Yulia Grishina (University of Potsdam)