=================== EUROLAN-2017 ====================


The 13th in the series of EUROLAN Schools

10 – 17 September 2017, Constanța, Romania


Biomedical Text Mining (BioNLP) applies natural language processing (NLP)
techniques to identify and extract information from scientific publications
in biology, medicine, and chemistry, in order to discover novel knowledge
that can contribute to biomedical research. The growth of BioNLP over the past
fifteen years is due in large part to the availability of web-based
publication databases such as PubMed and Web of Science coupled with
increasing access to anonymized electronic medical/health records. The large
size of the biomedical literature and its rapid growth in recent years make
literature search and information access a demanding task. Health-care
professionals in the clinical domain face a similar problem of information
explosion when dealing with the ever-increasing body of available
medical/health records in electronic form. Beyond merely identifying texts
relevant to a particular interest, BioNLP applies sophisticated NLP
information extraction (IE) technologies (e.g., event extraction or
entity-relation extraction) to identify and analyze text segments to produce
information about, or even models of, phenomena such as drug or protein
interactions, gene relations, temporal relations in clinical records,
biological processes, etc. Overall, the application of automatic NLP
techniques to unstructured text in scientific literature and medical records
enables life scientists to both find and exploit this data without the
significant effort of manual searching and researching.

EUROLAN-2017 has engaged several well-known researchers in the fields of
BioNLP and NLP to provide a comprehensive overview of language processing
models and techniques applicable to the biomedical domain, ranging from an
introduction to fundamental NLP technologies to the study of use cases and
exploitation of available tools and frameworks that support BioNLP. Each
tutorial is accompanied by one or two hands-on sessions, in which
participants will use text mining tools to explore and exploit several
varieties of biomedical language resources, including cloud-based
repositories of scientific publications, annotated biomedical corpora,
databases and ontologies of biomedical terms, etc. The topics covered in the
tutorials and hands-on sessions include:

• mining biomedical literature
• entity identification and normalization
• conceptual graphs extracted from medical texts
• annotation of semantic content, with applications in medicine and biology
• medical search engines
• deep learning for bioinformatics
• biomedical question answering
• clinical data repositories
• big data and cloud computing in relation to biomedical textual data
• clinical relationships
• medical topic modeling
• medical language systems
• clinical text analysis
• text summarization in the biomedical domain
• event-based text mining for biology and related fields
• event extraction in medical texts

Invited Lecturers

• Mihaela Breabăn – “Alexandru Ioan Cuza” University of Iași
• Kevin Cohen – University of Colorado at Boulder (USA)
• Noa Patricia Cruz Diaz – Virgen del Rocio University Hospital
• Eric Gaussier – University Grenoble Alps (France)
• Nancy Ide – Vassar College (USA)
• Pierre Zweigenbaum – LIMSI, CNRS, Université Paris-Saclay, Orsay

Category: Courses/Education/Postdocs

Natural Language Processing meets Journalism — workshop at EMNLP 2017


EMNLP 2017 Workshop

September 7, Copenhagen, Denmark

Call for Papers

With the advent of the digital era, journalism faces what seems to be a major
change in its history — data processing. While much journalistic effort has
been (and still is) dedicated to information gathering, now a great deal of
information is readily available, but it is dispersed across a large quantity of
data. Thus processing a continuous and very large flow of data has become a
central challenge in today's journalism.

With the recognition of this challenge, it has become widely accepted that
data-driven journalism is the future. Tools which perform big data mining in
order to pick out and link together what is interesting from various
multimedia resources are needed; these tools will be used as commonly as
typewriters once were. Their scope is well beyond data classification: they
need to construct sense and structure out of the never-ending flow of
reported facts, ascertaining what is important and relevant. They need to be
able to detect what lies behind the text: what the authors' intentions are,
what opinions are expressed and how, whose propagandistic goals an article
might serve, etc. Moreover, they need to go beyond an intelligent search
engine: they need to be picky and savvy, just like good journalists, in
order to help people see what is really going on. It must be added that we
have already been subjected to a large-scale invasion of seemingly new
techniques: fake news, alternative facts, etc. For better or for worse, this
is the reality in which we must make decisions, and we must develop tools
for handling it properly. That is, natural language processing meeting
journalism is a crucial process that has to be instantiated on every tablet,
phone, or monitor on which a piece of news is displayed, whether for reading
or writing.

At this workshop we invite papers that report on state-of-the-art
inquiries into the analysis and use of large to huge news corpora. A news
corpus is generally understood as spanning newspapers, social networks,
the web, etc. The papers should present computational techniques able to
manage a huge quantity of information and/or to perform deep analyses that
go beyond the current state of the art. We welcome reports on recent
progress in overcoming the bottlenecks in open-domain relation extraction,
paraphrasing, textual entailment and semantic similarity, and on their
results in analyzing news content. We are also greatly interested, more
generally, in technologies for enhancing the communicative function of
language in this context, including computational humor, NLP creativity for
advertising, plagiarism, fake news, etc.

Category: Conferences

AIST 2017 : The 6th International Conference on Analysis of Images, Social Networks, and Texts


The conference is intended for computer scientists and practitioners whose research interests are related to data science. The previous conferences, held in 2012-2016, attracted a significant number of students, researchers, academics and engineers working on the analysis of images, texts, and social networks. The broad scope of AIST makes it an event where researchers from different domains, such as image and text processing, who exploit various data analysis techniques, can meet and exchange ideas. The conference allows specialists from different fields to meet each other, present their work, and discuss both theoretical and practical aspects of their data analysis problems. Another important aim of the conference is to encourage scientists and industry practitioners to benefit from knowledge exchange and to identify possible grounds for fruitful collaboration.


The scope of the conference includes the following topics:

— Social Network Analysis
— Natural Language Processing
— Recommender systems and collaborative technologies
— Analytics for geoinformation systems
— Analysis of images and video
— Discovering and analyzing processes using event data
— Game analytics
— Core Data Mining and Machine Learning techniques
— Semantic Web and Ontologies
— Educational Data Mining
— ML & DM for Economics and Social Sciences


As in previous years, the conference proceedings will be published in Springer's Communications in Computer and Information Science (CCIS) or Lecture Notes in Artificial Intelligence (LNAI) series. Proceedings of the AIST 2015 conference can be found at and proceedings of the AIST 2016 are available at

Revised selected papers will be published as post-proceedings in Springer's Lecture Notes in Computer Science (LNCS) series.


The 6th conference on Analysis of Images, Social Networks, and Texts will take place in Moscow, Russia from Thursday, 27th through Saturday, 29th of July 2017.


Submission of abstracts: April 30, 2017
Deadline for papers: May 7, 2017
Notification of Acceptance: June 7, 2017
The Conference: July 27 — 29, 2017


Track 1. General topics of data analysis
Sergey Kuznetsov (Higher School of Economics, Moscow)
Amedeo Napoli (LORIA, Nancy)

Track 2. Natural language processing
Natalia Loukachevitch (Moscow State Lomonosov University)
Alexander Panchenko (University of Hamburg, Hamburg)

Track 3. Social network analysis
Stanley Wasserman (Indiana University, Bloomington)

Track 4. Analysis of images and video
Victor Lempitsky (Skolkovo Institute of Science and Technology, Moscow)
Andrey Savchenko (Higher School of Economics, Nizhny Novgorod)

Track 5. Optimization problems on graphs and network structures
Panos M. Pardalos (University of Florida)
Mikhail Khachay (IMM UB RAS & Ural Federal University)

Track 6. Analysis of dynamic behavior through event data
Wil van der Aalst (Eindhoven University of Technology)
Irina Lomazova (Higher School of Economics, Moscow)

Category: Uncategorized

Call for Papers: Special Issue of the journal Computational Linguistics on Language in Social Media

Special Issue of the journal Computational Linguistics on:
Language in Social Media: Exploiting discourse and other contextual information

*** Deadline 15th October 2017 (11:59 pm PST) ***


**Guest editors**
Farah Benamara — IRIT, Toulouse University
Diana Inkpen — University of Ottawa
Maite Taboada — Simon Fraser University


**Call for papers**
Social media content (SMC) is changing the way people interact with each
other and share information, personal messages, and opinions about
situations, objects and past experiences. This content (from blogs, fora,
reviews, and various social networking sites) has specific characteristics
that are often referred to as the five V's: volume, variety, velocity,
veracity, and value. Much of it consists of short online conversational
posts or comments, often accompanied by non-linguistic contextual information,
including metadata such as the social network of each user and their
interactions with other users. Exploiting the context of a word or a sentence
increases the amount of information we can get from it and enables novel
applications. Such rich contextual information, however, makes natural
language processing (NLP) of SMC a challenging research task. Indeed, simply
applying traditional text mining tools is clearly sub-optimal, as such
methods take into account neither the interactive dimension nor the
particular nature of this data, which shares properties of both spoken and
written language.

Most research on NLP for social media focuses primarily on content-based
processing of the linguistic information, using lexical semantics (e.g.,
discovering new word senses or multiword expressions) or semantic analysis
(opinion extraction, irony detection, event and topic detection, geo-location
detection) (Londhe et al., 2016; Aiello et al., 2013; Inkpen et al., 2015;
Ghosh et al., 2015). Other research explores the interactions between content
and extra-linguistic or extra-textual features like time, place, author
profiles, demographic information, conversation thread and network structure,
showing that combining linguistic data with network and/or user context
improves performance over a baseline that uses only textual information (West
et al., 2014; Karoui et al., 2015; Volkova et al., 2014; Ren et al., 2016).

We expect that papers in this special issue will contribute to a deeper
understanding of these interactions from a new perspective of discourse
interpretation. We believe that we are entering a new age of mining social
media data, one that extracts information not just from individual words,
phrases and tags, but also uses information from discourse and the wider
context. Most of the “big data” revolution in social media analysis has
examined words in isolation, a “bag-of-words” approach. We believe it is
possible to investigate big data, and social media data in general, by
exploiting contextual information.

We encourage submission of papers that address deep issues in linguistics,
computational linguistics and social science. In particular, our focus is on
the exploitation of contextual information within the text (discourse,
argumentation chains) and extra-linguistic information (social network,
demographic information, geo-location) to improve NLP applications and help
build pragmatics-based NLP systems. The special issue also aims to bring
together researchers who propose new solutions for processing SMC in various
use cases, including sentiment analysis, detection of offensive content, and
intention detection. These solutions need to be reliable enough to
demonstrate their effectiveness compared with shallow bag-of-words
approaches or content-based approaches alone.

Category: Conferences

GermEval 2017 Shared Task on Aspect-based Sentiment in Social Media Customer Feedback


In the connected, modern world, customer feedback is a valuable source of insights into the quality of products or services. This feedback allows customers to benefit from the experiences of others and enables businesses to react to requests, complaints or recommendations. However, the more people use a product or service, the more feedback is generated, which makes analyzing huge amounts of feedback in an efficient but still meaningful way a major challenge.

Thus, we propose a shared task on automatically analyzing customer reviews about “Deutsche Bahn”, the German public train operator with about two billion passengers each year.


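 “RT @XXX: Da hört jemand in der Bahn so laut ‘700 Main Street’ durch seine Kopfhörer, dass ich mithören kann. :( :( :(”
 (English: “Someone on the train is listening to ‘700 Main Street’ so loudly through their headphones that I can hear it too. :( :( :(”)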

As the example shows, insights from reviews can be derived at different levels of granularity. The review contains a general evaluation of the trip (the customer disliked the trip). Furthermore, the review evaluates a specific aspect of the train journey (“laut”, i.e. “loud” → the customer did not like the noise level).

Consequently, we frame the task as aspect-based sentiment analysis with four subtasks:



Category: Conferences, Resources/Software

The 3rd CL-SciSumm Summarization Shared Task, SIGIR 2017

Call for Participation

The 3rd CL-SciSumm 2017 Shared Task

at SIGIR 2017 on Friday, August 11, 2017

To be held as a part of the

2nd Joint Workshop of Bibliometric-enhanced IR and NLP for Digital Libraries (BIRNDL)

Sponsored by Microsoft Research Asia



We invite you to participate in our Shared Task on relationship mining and scientific summarization of computational linguistics research papers. Scientific summarization can play an important role in developing methods to index, represent, retrieve, browse and visualize information in large scholarly databases.

The proceedings of our previous workshops (BIRNDL and BIR) are being published as a special issue on “Bibliometrics, Information Retrieval and Natural Language Processing in Digital Libraries” in the International Journal on Digital Libraries, and as a special issue on “Bibliometric-enhanced IR” in Scientometrics. At SIGIR 2017, we will once again invite the authors of selected system papers at the CL-SciSumm Shared Task to submit extended versions to a special issue in a highly visible and prestigious journal.



The 3rd CL-SciSumm Shared Task provides resources to encourage research in entity mining, relationship extraction, question answering and other NLP tasks for scientific papers. It comprises annotated citation information connecting research papers with citing papers. Citations are embedded with meta-commentary, which offers a contextual, interpretative layer to the cited text and emphasizes certain kinds of information over others.

The Task


The task comprises a set of topics, each consisting of a research paper (RP) in CL, and ten or more papers which cite it (citing papers, CP). The text spans (citances) which relate the citing paper to the reference paper have already been identified.

Task 1a: For each citance, identify the cited text span in the RP that most accurately reflects the citance.

Task 1b: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.

Evaluation: Task 1 will be scored by the overlap of text spans in the system output versus the gold standard created by human annotators.
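The overlap scoring above can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in this call, that each cited text span is represented as a set of sentence IDs from the RP, and that overlap is summarized as precision, recall and F1. The function name `span_overlap_scores` is hypothetical, and the official scorer may compute overlap differently.

```python
def span_overlap_scores(system_ids, gold_ids):
    """Precision, recall and F1 of system-selected RP sentence IDs vs. gold IDs.

    Assumption (hypothetical representation): a cited text span is a
    collection of sentence IDs from the reference paper.
    """
    system, gold = set(system_ids), set(gold_ids)
    overlap = len(system & gold)  # sentences chosen by both system and annotators
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the system picks sentences 12, 13 and 40;
    # the annotators marked sentences 13 and 14.
    p, r, f = span_overlap_scores([12, 13, 40], [13, 14])
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

With one shared sentence out of three predicted and two gold sentences, precision is 1/3, recall is 1/2 and F1 is 0.4. Set-based matching is a deliberate simplification. A character-offset scorer would also reward partially overlapping spans.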

Task 2 (optional bonus task): Finally, generate a structured summary of the RP from the cited text spans of the RP. The length of the summary should not exceed 250 words.

Evaluation: Task 2 will be scored using the ROUGE evaluation metric to compare automatic summaries against paper abstracts, human-written summaries and community summaries constructed using the output of Task 1a.
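ROUGE has several variants, and this call does not say which one is used. As an illustration only, the following Python sketch computes ROUGE-N recall: the number of clipped overlapping n-grams divided by the number of n-grams in the reference. It assumes naive lower-cased whitespace tokenization. Real evaluations typically rely on an established ROUGE implementation with stemming and other options, so treat this as a sketch of the idea rather than the task's scorer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped overlapping n-grams / total reference n-grams.

    Uses lower-cased whitespace tokenization (a simplifying assumption).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter '&' keeps the minimum count of each n-gram, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    return overlap / total


if __name__ == "__main__":
    system_summary = "the model extracts citation spans"
    abstract = "the system extracts cited spans"
    print(rouge_n_recall(system_summary, abstract, n=1))
    print(rouge_n_recall(system_summary, abstract, n=2))
```

Here three of the five reference unigrams ("the", "extracts", "spans") appear in the system summary, so unigram recall is 0.6. None of the reference bigrams match, so bigram recall is 0.0. Clipping via `Counter.__and__` prevents a summary from scoring by repeating a single matching word.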

How To Participate


1. Register for the CL-SciSumm Shared Task at <> by May 31

2. Browse our git repository at <> and download the training set.

3. Develop and train your system to solve Task 1a, 1b and/or Task 2 on the training set.

4. Meanwhile, submit a tentative system description by May 31.

5. Evaluate your system on the test set, to be released on July 1, and upload your results to our CodaLab portal (to be announced later) to self-evaluate your performance.

6. Tell us about your approach in a paper; submit it by July 30, 2017.

7. Attend the BIRNDL workshop at SIGIR on August 11, and present your work.

Important Dates


Registration opens: April 20, 2017

Training set posted: May 1, 2017

Short system description due: May 31, 2017

Test Set posted and evaluation period begins: July 1, 2017

Evaluation period ends: July 15, 2017

System reports (papers) due: July 30, 2017

Presentation at 2nd BIRNDL 2017 workshop, SIGIR: Aug 11, 2017

Camera ready contributions due for CEUR proceedings: TBD



Kokil Jaidka, University of Pennsylvania (jaidka at

Muthu Kumar Chandrasekaran, National University of Singapore (muthu.chandra at

Min-Yen Kan, National University of Singapore (kanmy at )


Category: Conferences, Resources/Software

CFP: Building and Using Comparable Corpora at ACL’17, Vancouver, Canada

10th Workshop on Building and Using Comparable Corpora
Shared task: detection of parallel sentences in Comparable Corpora

Important dates

Workshop Submission deadline: 21 April, 2017
Workshop Notification: 19 May, 2017
Workshop Camera Ready: 26 May, 2017

*Shared task: Identifying parallel sentences in comparable corpora*

We announce a new shared task for 2017. As is well known, a bottleneck
in statistical machine translation is the scarcity of parallel
resources for many language pairs and domains. Previous research has
shown that this bottleneck can be reduced by utilizing parallel
portions found within comparable corpora. These are useful for many
purposes, including automatic terminology extraction and the training
of statistical MT systems.

The aim of the shared task is to quantitatively evaluate competing
methods for extracting parallel sentences from comparable monolingual
corpora, so as to give an overview of the state of the art and to
identify the best-performing approaches.

Shared task sample set release: 6 February, 2017
Shared task training set release: 13 February, 2017
Shared task test set release: 21 April, 2017
Shared task test submission deadline: 28 April, 2017
Shared task camera ready papers: 26 May, 2017

Any submission to the shared task is expected to be accompanied
by a short paper (4 pages plus references). This will be accepted
for publication in the workshop proceedings automatically, although
the submission will go via Softconf with the standard peer-review
process.

In the language engineering and linguistics communities, research
on comparable corpora has been driven by two main motivations. In
language engineering, it is chiefly motivated by the need to use
comparable corpora as training data for statistical NLP applications
such as statistical machine translation or cross-lingual retrieval. In
linguistics, on the other hand, comparable corpora are of interest in
themselves, as they make intra-linguistic discoveries and comparisons
possible. It is generally accepted in both communities that
comparable corpora are documents in one or several languages that are
comparable in content and form in various degrees and dimensions. We
believe that the linguistic definitions and observations related to
comparable corpora can improve methods to mine such corpora for
applications of statistical NLP. As such, it is of great interest to
bring together builders and users of such corpora.


Category: Conferences

Second Call for Papers: the Workshop on Stylistic Variation at EMNLP 2017

The overall goal of this workshop is to bring together a diverse collection
of researchers who encounter stylistic variation directly or indirectly in
their work, identifying joint challenges and future directions.

Two of the overarching questions that motivate this workshop are:
1. to what extent it is possible or desirable to go beyond superficial,
uninterpretable, task-specific stylistic features to deeper, broader, more
systematic, and more psychologically-plausible conceptualizations of
stylistic variation
2. to what extent recent advances in related areas such as distributional
semantics can be applied to better capture stylistic variation.

For purposes of the workshop, “stylistic variation” includes variation in
phonological, lexical, syntactic, or discourse realization of particular
semantic content, due to differences in extralinguistic variables such as
individual speaker, speaker demographics, target audience, genre and so on. A
(non-exhaustive) list of topics of interest follows.


—          Evidence for or against targeted approaches to stylistic variation
—          General methods for differentiating style from semantics/topic
—          Interpretability of computational models of style
—          Use of classic stylistic features (e.g. function words, POS
n-grams) in classification
—          Effects of stylistic variation on downstream tasks
—          Stylometry
—          Authorship attribution
—          Stylistic segmentation/intrinsic plagiarism detection
—          Style in distributional vector space models (embeddings, etc.)
—          Stylistic lexicon acquisition
—          Text normalization
—          Domain adaptation (across stylistically distinct domains)
—          Modelling of demographics and personality
—          Politeness and other linguistic manifestations of social power
—          Quantification of genre differences
—          Stylistically-informed sentiment analysis (e.g. sarcasm, hate speech)
—          Readability, complexity, and simplification
—          Learner language (e.g. fluency, use of collocations, stylistic
appropriateness, etc.)
—          Style-aware natural language generation
—          Identifying trustworthiness and deception
—          Literary stylistics (author and character profiling)
—          Rhetoric (e.g. stylistic choice in political speeches, etc.)
—          Stylistic features for diagnosis of mental illness
—          Style in acoustic signals (e.g. speaker identification)
—          The challenges of annotating style

Category: Conferences

Summer School on Recommender Systems

ACM Summer School on Recommender Systems, August 21st-25th Bolzano, Italy


The ACM Summer School on Recommender Systems is co-funded by SIGCHI via its conference development fund for the ACM conference series on Recommender Systems and has been granted additional support by the Free University of Bozen-Bolzano.


It will be held as a pre-program to this year’s RecSys conference from Monday 21st to Friday 25th of August in Bolzano, Italy. The Doctoral Symposium is integrated into the program of the summer school and takes place in parallel on the last day of the summer school. Participation in the summer school requires registration via this website. Registration will remain open until the capacity limit is reached.


Leaders in the field as well as promising younger researchers have volunteered to serve as speakers at this summer school. The lectures cover a broad range of topics from both an algorithmic and a methodological perspective, and will also include hands-on sessions. In addition, upcoming and trending topics such as recommending to groups or affect- and personality-based recommendation approaches will be addressed.



Robin Burke, DePaul University, USA

Francesco Ricci, Free University of Bozen-Bolzano, Italy

Markus Zanker, Free University of Bozen-Bolzano, Italy



Category: Courses/Education/Postdocs

CLEF eHealth Evaluation Lab

CLEF eHealth Evaluation Lab 2017
Held as part of CLEF 2017, September 11-14 2017, Dublin — Ireland

Offering shared tasks and data on:

*Multilingual Information Extraction*
*Technologically Assisted Reviews in Empirical Medicine*
*Patient-centred Information Retrieval*

Register at:

Registration Closes: 21 April 2017

Details at:


In today’s information-overloaded society it is increasingly difficult to retrieve and digest valid and relevant information to make health-centered decisions. Medical content is becoming available electronically in a variety of forms, ranging from patient records and medical dossiers, scientific publications and health-related websites to medical topics shared across social networks. Laypeople, clinicians and policy-makers need to be able to easily retrieve and make sense of medical content to support their decision making. Information retrieval systems have been commonly used as a means to access health information available online. However, the reliability, quality, and suitability of the information for the target audience vary greatly, while high recall or coverage (that is, finding all relevant information about a topic) is often as important as high precision, if not more so. Furthermore, information seekers in the health domain also experience difficulties in expressing their information needs as search queries.

CLEF eHealth aims to bring together researchers working on related information access topics and provide them with datasets to work with and validate the outcomes. This, the sixth year of the lab, offers the following three tasks.

Task 1. Multilingual Information Extraction
Task 2. Technologically Assisted Reviews in Empirical Medicine
Task 3. Patient-centred Information Retrieval


Category: Conferences, Resources/Software