CFP: Building and Using Comparable Corpora at ACL’17, Vancouver, Canada

10th Workshop on Building and Using Comparable Corpora
Shared task: detection of parallel sentences in Comparable Corpora
Important dates
Workshop Submission deadline: 21 April, 2017
Workshop Notification: 19 May, 2017
Workshop Camera Ready: 26 May, 2017
*Shared task: Identifying parallel sentences in comparable corpora*
We announce a new shared task for 2017. As is well known, a bottleneck
in statistical machine translation is the scarcity of parallel
resources for many language pairs and domains. Previous research has
shown that this bottleneck can be reduced by utilizing parallel
portions found within comparable corpora. Such portions are useful for
many purposes, including automatic terminology extraction and the
training of statistical MT systems.
The aim of the shared task is to quantitatively evaluate competing
methods for extracting parallel sentences from comparable monolingual
corpora, so as to give an overview of the state of the art and to
identify the best performing approaches.
Shared task sample set release: 6 February, 2017
Shared task training set release: 13 February, 2017
Shared task test set release: 21 April, 2017
Shared task test submission deadline: 28 April, 2017
Shared task camera ready papers: 26 May, 2017
Any submission to the shared task is expected to be accompanied
by a short paper (4 pages plus references). This will be accepted
for publication in the workshop proceedings automatically, although
the submission will go via Softconf with the standard peer-review
process.
In the language engineering and the linguistics communities, research
in comparable corpora has been motivated by two main reasons. In
language engineering, it is chiefly motivated by the need to use
comparable corpora as training data for statistical NLP applications
such as statistical machine translation or cross-lingual retrieval. In
linguistics, on the other hand, comparable corpora are of interest in
themselves because they make possible intra-linguistic discoveries and
comparisons. It is generally accepted in both communities that
comparable corpora are documents in one or several languages that are
comparable in content and form to various degrees and along various
dimensions. We
believe that the linguistic definitions and observations related to
comparable corpora can improve methods to mine such corpora for
applications of statistical NLP. As such, it is of great interest to
bring together builders and users of such corpora.

We solicit contributions including but not limited to the following
topics:
Building Comparable Corpora:
• Human translations
• Automatic and semi-automatic methods
• Methods to mine parallel and non-parallel corpora from the Web
• Tools and criteria to evaluate the comparability of corpora
• Parallel vs non-parallel corpora, monolingual corpora
• Rare and minority languages, across language families
• Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
• Human translations
• Language learning
• Cross-language information retrieval & document categorization
• Bilingual projections
• Machine translation
• Writing assistance
• Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
• Induction of morphological, grammatical, and translation rules
  from comparable corpora
• Extraction of parallel segments or paraphrases from comparable
  corpora
• Extraction of bilingual and multilingual translations of single
  words and multi-word expressions, proper names, and named
  entities from comparable corpora
• Induction of multilingual word classes from comparable corpora
• Cross-language distributional semantics
Submission Information
  See BUCC 2017 website:
Workshop organisers:
Serge Sharoff (University of Leeds, UK), Chair
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France), Shared task organiser
Reinhard Rapp (University of Mainz, Germany)

About the author: Lidia Pivovarova

St. Petersburg State University - senior lecturer, University of Helsinki - PhD student