Fellow Slavs, join in: a coreference competition based on the Polish language.
to be held at NAACL 2016 (San Diego, California, USA), June 16 or 17, 2016
Submission deadline: February 25, 2016
Call For Papers
Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of coreference papers reporting results on the MUC/ACE/OntoNotes coreference corpora. This is an unfortunate misconception: the previous shared tasks on coreference resolution have largely focused on entity coreference, which constitutes only one of the many kinds of coreference relations that were discussed in theoretical and computational linguistics in the past few decades. In fact, by focusing on entity coreference resolution, NLP researchers have only scratched the surface of the wealth of interesting problems in coreference resolution.
The decision to focus on entity coreference resolution was initially made by information extraction (IE) researchers when coreference was selected as one of the tasks in MUC-6 in 1995. Many interesting kinds of coreference relations, such as bridging and reference to abstract entities, were left out not because they were not important, but because "it was felt that the menu was simply too ambitious". It turns out that this decision has had an important consequence: the progress made in coreference research in the past two decades was largely driven by the availability of coreference-annotated corpora such as MUC, ACE, and OntoNotes, where entity coreference was the focus.
Given the plethora of work on entity coreference, we believe that the time is ripe for a workshop on coreference resolution that would bring together researchers who are interested in under-investigated coreference phenomena.
These include, but are not limited to, the resolution of (1) bridging references, (2) references to abstract entities, (3) pronouns whose resolution requires world knowledge (e.g., those in the Winograd Schema Challenge), (4) zero anaphora and ellipsis, and (5) event anaphora, as well as the identification of near-identity and partial coreference relations. Since progress in these under-explored coreference tasks is currently limited in part by the scarcity of annotated corpora, we encourage work that describes the creation and annotation of corpora, especially those with less-investigated coreference phenomena and those involving less-researched languages.
The workshop welcomes submissions describing both theoretical and applied computational work on coreference resolution, especially for languages other than English, less-researched forms of coreference and new applications of coreference resolution. The submissions are expected to discuss theories, evaluation, limitations, system development and techniques relevant to the workshop topics.
Topics of interest include but are not limited to the following:
* Coreference resolution for less-researched languages (e.g., annotation strategies, resolution modules and formal evaluation)
* Evaluation of the influence of language-specific properties such as lack of articles, quasi-anaphora, ellipsis or complexity of reflexive pronouns on coreference resolution
* Representation of coreferential relations other than identity coreference (e.g., bridging references, reference to abstract entities, etc.)
* Investigation of difficult cases of anaphora and coreference and their resolution by resorting to e.g. discourse-based and pragmatic levels
* Coreference resolution in noisy data (e.g. in speech and social networks)
* New applications of coreference resolution
* As mentioned above, papers that provide new resources (software or data) are particularly welcome.
The shared task of the workshop seeks to investigate how well one can build a coreference resolver for a "surprise" language for which only a small amount of coreference-annotated data is available for training. We believe that with this exciting setting, the shared task can help promote the development of coreference technologies that are applicable to a larger number of natural languages than is currently possible.
Polish will be used as the surprise language: not only has it been little investigated in the global NLP community, but the coreference-annotated data needed for training and testing have also recently been made available in the Polish Coreference Corpus. The target participants, therefore, will be researchers who have no knowledge of Polish.
The shared task participants will be provided with a small amount of coreference-annotated data (100K-200K tokens) for the surprise language to train their coreference resolvers. In addition, they can use whatever resources they can obtain, such as the coreference-annotated English OntoNotes data (freely available on request from LDC), bilingual dictionaries involving English and the surprise language, as well as a large parallel corpus involving English and the surprise language. As in previous shared tasks on coreference resolution, we will provide token-based information for the surprise language (e.g., morphological annotation), which the participants can easily aggregate to obtain phrase- or clause-based information. The goal, then, will be to make use of the available resources to build a coreference resolver for the surprise language.
We solicit previously unpublished work, presented either as long or short papers, following the NAACL 2016 formatting guidelines (http://naacl.org/naacl-pubs/).
December 4, 2015: Shared Task Training Data Released
January 29, 2016: Shared Task Test Data Released
February 5, 2016: Shared Task System Outputs Collected
February 12, 2016: Shared Task Results Announced
February 25, 2016: Workshop Paper Due Date
March 20, 2016: Notification of Acceptance
March 30, 2016: Camera-Ready Papers Due Date
June 16 or 17, 2016: Workshop Date
Anders Björkelund, University of Stuttgart
Antonio Branco, University of Lisbon
Dan Cristea, A. I. Cuza University of Iași
Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai
Lars Hellan, Norwegian University of Science and Technology
Veronique Hoste, Ghent University
Yufang Hou, Heidelberg University
Sandra Kübler, Indiana University
Sebastian Martschat, Heidelberg University
Costanza Navaretta, University of Copenhagen
Anna Nedoluzhko, Charles University in Prague
Vincent Ng, University of Texas at Dallas
Michal Novak, Charles University in Prague
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
Constantin Orasan, University of Wolverhampton
Simone Paolo Ponzetto, University of Mannheim
Massimo Poesio, University of Essex
Marta Recasens, Google Inc.
Agata Savary, François Rabelais University Tours
Heike Zinsmeister, Universität Hamburg
Maciej Ogrodniczuk, Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences
Vincent Ng, Computer Science Department, The University of Texas at Dallas
Maciej Ogrodniczuk and Vincent Ng