The 3rd CL-SciSumm Summarization Shared Task, SIGIR 2017

Call for Participation

The 3rd CL-SciSumm 2017 Shared Task

at SIGIR 2017 on Friday, August 11, 2017

To be held as a part of the

2nd Joint Workshop of Bibliometric-enhanced IR and NLP for Digital Libraries (BIRNDL)

Sponsored by Microsoft Research Asia



We invite you to participate in our Shared Task on the relationship mining and scientific summarization of computational linguistics research papers. Scientific summarization can play an important role in developing methods to index, represent, retrieve, browse and visualize information in large scholarly databases.

The proceedings of our previous workshops (BIRNDL and BIR) are being published as a special issue on “Bibliometrics, Information Retrieval and Natural Language Processing in Digital Libraries” in the International Journal on Digital Libraries, and as a special issue on “Bibliometric-enhanced IR” in Scientometrics. At SIGIR 2017, we will once again invite the authors of selected system papers at the CL-SciSumm Shared Task to submit extended versions to a special issue of a highly visible and prestigious journal.



The 3rd CL-SciSumm Shared Task provides resources to encourage research in entity mining, relationship extraction, question answering and other NLP tasks for scientific papers. It comprises annotated citation information connecting research papers with citing papers. Citations are embedded with meta-commentary, which offers a contextual, interpretative layer to the cited text and emphasizes certain kinds of information over others.

The Task


The task comprises a set of topics, each consisting of a research paper (RP) in CL, and ten or more papers which cite it (citing papers, CP). The text spans (citances) which relate the citing paper to the reference paper have already been identified.

Task 1a: For each citance, identify the cited text span in the RP that most accurately reflects the citance.

Task 1b: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.

Evaluation: Task 1 will be scored by the overlap of text spans between the system output and the gold standard created by human annotators.
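
As a rough illustration of overlap-based scoring, the sketch below compares a system span and a gold span as sets of sentence IDs and reports precision, recall and F1. The sentence-ID representation and the function name `span_overlap_scores` are assumptions made for illustration; the official scorer may measure overlap differently.

```python
def span_overlap_scores(system_ids, gold_ids):
    """Precision, recall and F1 of the overlap between two sets of sentence IDs."""
    system, gold = set(system_ids), set(gold_ids)
    overlap = len(system & gold)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(gold) if gold else 0.0
    # F1 is defined as 0 when there is no overlap at all.
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1


# Hypothetical example: the system picks sentences 12 and 13,
# while the annotators picked sentences 13 and 14.
print(span_overlap_scores([12, 13], [13, 14]))  # (0.5, 0.5, 0.5)
```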

Task 2 (optional bonus task): Finally, generate a structured summary of the RP from the cited text spans of the RP. The length of the summary should not exceed 250 words.

Evaluation: Task 2 will be scored using the ROUGE evaluation metric to compare automatic summaries against paper abstracts, human written summaries and community summaries constructed using the output of Task 1a.
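
ROUGE has several variants. The sketch below implements only a simplified ROUGE-N recall (clipped n-gram overlap divided by the number of reference n-grams) with lowercased whitespace tokenization, plus a check of the 250-word limit. It is not the official ROUGE toolkit, and the names `rouge_n_recall` and `within_word_limit` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def within_word_limit(summary, limit=250):
    """True if the summary has at most `limit` whitespace-separated words."""
    return len(summary.split()) <= limit


# Made-up sentences, used only to exercise the functions.
system_summary = "the parser uses a chart"
reference_summary = "the parser uses dynamic programming"
print(rouge_n_recall(system_summary, reference_summary, n=1))  # 0.6
print(rouge_n_recall(system_summary, reference_summary, n=2))  # 0.5
print(within_word_limit(system_summary))  # True
```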

How To Participate


1. Register for the CL-SciSumm Shared Task at <> by May 31

2. Browse our git repository at <> and download the training set.

3. Develop and train your system to solve Task 1a, 1b and/or Task 2 on the training set.

4. Meanwhile, submit a tentative system description by May 31.

5. Evaluate your system on the test set, to be released on July 1, and upload your results to our CodaLab portal (to be announced later) to self-evaluate your performance.

6. Tell us about your approach in a paper; submit it by July 30, 2017.

7. Attend the BIRNDL workshop at SIGIR on August 11, and present your work.

Important Dates


Registration opens: April 20, 2017

Training set posted: May 1, 2017

Short system description due: May 31, 2017

Test Set posted and evaluation period begins: July 1, 2017

Evaluation period ends: July 15, 2017

System reports (papers) due: July 30, 2017

Presentation at 2nd BIRNDL 2017 workshop, SIGIR: Aug 11, 2017

Camera ready contributions due for CEUR proceedings: TBD



Kokil Jaidka, University of Pennsylvania (jaidka at

Muthu Kumar Chandrasekaran, National University of Singapore (muthu.chandra at

Min-Yen Kan, National University of Singapore (kanmy at )


CFP: Building and Using Comparable Corpora at ACL’17, Vancouver, Canada

10th Workshop on Building and Using Comparable Corpora
Shared task: detection of parallel sentences in Comparable Corpora
Important dates
Workshop Submission deadline: 21 April, 2017
Workshop Notification:  19 May, 2017
Workshop Camera Ready:  26 May, 2017
*Shared task:  Identifying parallel sentences in comparable corpora*
We announce a new shared task for 2017. As is well known, a bottleneck
in statistical machine translation is the scarceness of parallel
resources for many language pairs and domains. Previous research has
shown that this bottleneck can be reduced by utilizing parallel
portions found within comparable corpora. These are useful for many
purposes, including automatic terminology extraction and the training
of statistical MT systems.
The aim of the shared task is to quantitatively evaluate competing
methods for extracting parallel sentences from comparable monolingual
corpora, so as to give an overview of the state of the art and to
identify the best performing approaches.
Shared task sample set release: 6 February, 2017
Shared task training set release: 13 February, 2017
Shared task test set release: 21 April, 2017
Shared task test submission deadline: 28 April, 2017
Shared task camera ready papers: 26 May, 2017
Any submission to the shared task is expected to be accompanied
by a short paper (4 pages plus references).  This will be accepted
for publication in the workshop proceedings automatically, although
the submission will go via Softconf with the standard peer-review
process.

In the language engineering and the linguistics communities, research
in comparable corpora has been motivated by two main reasons. In
language engineering, it is chiefly motivated by the need to use
comparable corpora as training data for statistical NLP applications
such as statistical machine translation or cross-lingual retrieval. In
linguistics, on the other hand, comparable corpora are of interest in
themselves by making possible intra-linguistic discoveries and
comparisons. It is generally accepted in both communities that
comparable corpora are documents in one or several languages that are
comparable in content and form in various degrees and dimensions. We
believe that the linguistic definitions and observations related to
comparable corpora can improve methods to mine such corpora for
applications of statistical NLP. As such, it is of great interest to
bring together builders and users of such corpora.


Second Call for Papers: the Workshop on Stylistic Variation at EMNLP 2017

The overall goal of this workshop is to bring together a diverse collection
of researchers who encounter stylistic variation directly or indirectly in
their work, identifying joint challenges and future directions.

Two of the overarching questions that motivate this workshop are:
1. to what extent it is possible or desirable to go beyond superficial,
uninterpretable, task-specific stylistic features to deeper, broader, more
systematic, and more psychologically-plausible conceptualizations of
stylistic variation
2. to what extent recent advances in related areas such as distributional
semantics can be applied to better capture stylistic variation.

For purposes of the workshop, “stylistic variation” includes variation in
phonological, lexical, syntactic, or discourse realization of particular
semantic content, due to differences in extralinguistic variables such as
individual speaker, speaker demographics, target audience, genre and so on. A
(non-exhaustive) list of topics of interest follows.


— Evidence for or against targeted approaches to stylistic variation
— General methods for differentiating style from semantics/topic
— Interpretability of computational models of style
— Use of classic stylistic features (e.g. function words, POS n-grams) in classification
— Effects of stylistic variation on downstream tasks
— Stylometry
— Authorship attribution
— Stylistic segmentation/intrinsic plagiarism detection
— Style in distributional vector space models (embeddings, etc.)
— Stylistic lexicon acquisition
— Text normalization
— Domain adaptation (across stylistically distinct domains)
— Modelling of demographics and personality
— Politeness and other linguistic manifestations of social power
— Quantification of genre differences
— Stylistically-informed sentiment analysis (e.g. sarcasm, hate speech)
— Readability, complexity, and simplification
— Learner language (e.g. fluency, use of collocations, stylistic appropriateness, etc.)
— Style-aware natural language generation
— Identifying trustworthiness and deception
— Literary stylistics (author and character profiling)
— Rhetoric (e.g. stylistic choice in political speeches, etc.)
— Stylistic features for diagnosis of mental illness
— Style in acoustic signals (e.g. speaker identification)
— The challenges of annotating style

Summer School on Recommender Systems

ACM Summer School on Recommender Systems, August 21st-25th, Bolzano, Italy


The ACM Summer School on Recommender Systems is co-funded by SIGCHI via its conference development fund for the ACM conference series on Recommender Systems and has been granted additional support by the Free University of Bozen-Bolzano.


It will be held as a pre-program to this year’s RecSys conference from Monday 21st of August to Friday 25th in Bolzano, Italy. The Doctoral Symposium is integrated into the program of the summer school and takes place in parallel on the last day of the summer school. Participation at the summer school requires registration via this website. Registration will remain open until the capacity limit is reached.


Leaders in the field as well as promising younger researchers have volunteered to serve as speakers at this summer school. The lectures cover a broad range of topics from an algorithmic as well as a methodological perspective and will also include hands-on sessions. In addition, upcoming and trending topics such as recommending to groups or affect- and personality-based recommendation approaches will be addressed.



Robin Burke, DePaul University, USA

Francesco Ricci, Free University of Bozen-Bolzano, Italy

Markus Zanker, Free University of Bozen-Bolzano, Italy




CLEF eHealth Evaluation Lab

CLEF eHealth Evaluation Lab 2017
Held as part of CLEF 2017, September 11-14, 2017, Dublin, Ireland

Offering shared tasks and data on:

*Multilingual Information Extraction*
*Technologically Assisted Reviews in Empirical Medicine*
*Patient-centred Information Retrieval*

Register at:

Registration Closes: 21 April 2017

Details at:


In today’s information-overloaded society it is increasingly difficult to retrieve and digest valid and relevant information to make health-centered decisions. Medical content is becoming available electronically in a variety of forms, ranging from patient records and medical dossiers, scientific publications and health-related websites to medical topics shared across social networks. Laypeople, clinicians and policy-makers need to easily retrieve and make sense of medical content to support their decision making. Information retrieval systems have been commonly used as a means to access health information available online. However, the reliability, quality, and suitability of the information for the target audience vary greatly, while high recall or coverage, that is, finding all relevant information about a topic, is often as important as high precision, if not more so. Furthermore, information seekers in the health domain also experience difficulties in expressing their information needs as search queries.

CLEF eHealth aims to bring together researchers working on related information access topics and provide them with datasets to work with and validate the outcomes. This, the sixth year of the lab, offers the following three tasks.

Task 1. Multilingual Information Extraction
Task 2. Technologically Assisted Reviews in Empirical Medicine
Task 3. Patient-centred Information Retrieval


EACL 2017

I spent all of last week in Valencia at the EACL conference. A very full program, a pleasant atmosphere, and unexpectedly many familiar faces. Lots of greenery, trees covered in outlandish flowers in every shade of pink, palms lined up along the long boulevards, orange trees laden with fruit, outlandish new buildings against a blue sky with light clouds drifting by. And somewhere out there is the center, with its Gothic cathedrals, Art Nouveau market and languid evenings; a park laid out in the dried-up riverbed; the even more outlandish and even newer creations of the architect Calatrava; and farther still, the sea and the beaches, which I, for one, only got to see from the airplane window.


PhD positions in Copenhagen

Two fully-funded 3-year PhD positions in information retrieval are open at the University of Copenhagen in Denmark. The application deadline is 30 April 2017. For more details, please see:


FRUCT in Helsinki

21st Conference of Open Innovations Association FRUCT
Helsinki, Finland, 6-10 November 2017
FRUCT is the largest regional cooperation framework in the form of open innovations between academia and industry. FRUCT conferences are attended by representatives of more than 25 FRUCT member universities from Russia, Finland, Denmark, India, Italy and other countries, industrial experts from Dell EMC, Nokia, MariaDB, Intel, Jolla, Open Mobile Platform and Skolkovo, and a number of guests from other companies and universities. The conference is an R&D forum for the most active students, academic experts, industrial researchers and influential representatives of business and government. The conference invites world-class academic and industrial researchers to give lectures on the most relevant topics and provides an opportunity for student teams to present the progress and results of their R&D projects, meet interesting new people and form new R&D teams. The conference program consists of 3 to 5 intensive (half-day or full-day) training sessions on the most promising technologies, plus 3 days of the main conference.
We warmly welcome all university research teams to participate in the conference, present their research and join the FRUCT Association. IEEE members and representatives of Russian and Finnish universities are entitled to large discounts. Registration for the conference is open at

List of conference topics
— Location Based Services, Navigation, Logistics management, e-Tourism solutions
— Mobile Healthcare, e-Health solutions, Wellbeing, Fitness, Automated diagnostics
— IoT, Smart Spaces, Future services: Proactivity, Context Analysis, Data Mining and Big Data services
— Cross-platform software, innovative mobile services, new approaches to application design, innovative UX
— Smart Systems and embedded networks
— Energy efficient design & peripherals integration
— Mobile security, personal and business privacy
— Modern network architectures, Air interfaces and protocols, Emerging wireless technologies
— Mobile multimedia, video services and solutions
— Communications Systems Integration and Modeling


Shared Task on Native Language Identification

* Website: [1]
* Training data was released today!


We are excited to organize a new shared task on Native Language
Identification (NLI) which will take place at the BEA12 Workshop, co-located
with EMNLP in Copenhagen, September 8, 2017.

NLI is the task of identifying the native language (L1) of a writer based
solely on a sample of their writing or speech. The task is typically framed
as a classification problem where the set of L1s is known a priori. Most work
has focused on identifying the native language of writers learning English as
a second language. Two previous shared tasks on NLI have been organized in
which the task was to identify the native language of non-native speakers of
English based on essays and spoken responses they provided during a
standardized assessment of academic English proficiency. The first shared
task was based on the essays only and was also held with the BEA workshop in
2013. It was very successful with 29 teams competing, making it one of the
largest shared tasks that year. Three years later, the Computational
Paralinguistics Challenge at Interspeech 2016 hosted a sub-challenge on
identifying the native language based solely on the spoken responses.

This year's shared task combines the inputs from the two previous tasks.
There will be three tracks: NLI on the essay only, NLI on the spoken response
only (based on a transcription of the response, not the audio), and NLI using
both responses from a test taker. This distinction will make for a more
challenging shared task while building on the methods and results from the
previous two shared tasks. We promise this shared task will be fun for you
and your colleagues, as well as your whole family.


Educational Testing Service (ETS) is releasing 13,200 English essays and
orthographic transcriptions of 13,200 spoken responses from the TOEFL iBT®
assessment for the 2017 NLI Shared Task with the goal of helping researchers
advance the state of the art in the field of NLI. In addition to the
orthographic transcriptions of the spoken responses, i-vectors generated from
the audio files will be released as a baseline comparison for the
speech-based NLI task (although the audio files themselves are not included
in this data set). The data set contains test responses from 13,200 test
takers (one essay and one spoken response transcription per test taker) and
includes 11 native languages (L1s) with 1,200 test takers per L1. The 11
native languages covered by the corpus are: Arabic, Chinese, French, German,
Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish. The essays
typically range in length from approximately 300 to 400 words and the
transcribed spoken responses typically contain approximately 100 words.
Responses from 11,000 test takers in this set will be used as training data
for the NLI Shared Task, 1,100 for development, and the remaining 1,100 will
be released later as test data.
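
The corpus figures above are internally consistent, as the short check below confirms. The per-language counts for each split assume that the training, development and test sets are balanced across the 11 L1s, which the announcement does not state explicitly; the variable names are illustrative.

```python
L1S = ["Arabic", "Chinese", "French", "German", "Hindi", "Italian",
       "Japanese", "Korean", "Spanish", "Telugu", "Turkish"]
TEST_TAKERS_PER_L1 = 1_200
SPLITS = {"train": 11_000, "dev": 1_100, "test": 1_100}

# 11 languages x 1,200 test takers must equal the 13,200 total,
# which in turn must equal the sum of the three splits.
total = len(L1S) * TEST_TAKERS_PER_L1
assert total == 13_200 == sum(SPLITS.values())

# Assumption: each split is balanced across the L1s.
per_l1 = {name: size // len(L1S) for name, size in SPLITS.items()}
print(per_l1)  # {'train': 1000, 'dev': 100, 'test': 100}
```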

Vacancy: linguist-developer

Just AI (a division of i-Free, St. Petersburg) is expanding and opening a position for a linguist-developer. Our main technology product is a platform for building chatbots. Our main areas of activity are automating customer support services, conversational commerce, and enabling natural-language dialogue with robots and smart homes.

Designing and developing scenarios, topics and dialogue structures for the chatbots of large companies.
Writing small scripts for data processing.
Analyzing "client-operator" and "client-bot" dialogue logs, identifying their structure, and annotating them.

Writing, stylistic, spelling and technical literacy; attention to detail.
Programming skills in any language (preferably Python or JavaScript).

Practical experience with natural language processing systems is highly welcome.
Knowledge of modern methods in computational linguistics, the main NLP tasks and approaches to solving them.
Experience with the Git or Mercurial version control systems.
A relevant degree in natural language processing.
Experience with NLP tools: MyStem, NLTK

Please send your résumé to: petr.mitsov[]
