PhD position in Natural Language Processing / Machine Translation at Leiden University

DEADLINE: December 10, but contact Dr. Arianna Bisazza BEFORE NOVEMBER 30 if you intend to apply

The Faculty of Science and the Leiden Institute of Advanced Computer Science are looking for a PhD candidate in Natural Language Processing / Machine Translation (4-year, fully funded position).

Key responsibilities:

This project aims at improving the state of the art in neural modeling of language and translation, with a focus on language structure. The successful candidate will be supervised by Dr. Arianna Bisazza and will perform research and development on the following topics:

— understanding neural language models (e.g. what linguistic phenomena are they able, or unable, to capture?);
— syntax-aware language and translation models;
— neural machine translation for morphologically rich languages.

Selection criteria:

— MSc in Natural Language Processing, Artificial Intelligence, Computer Science, or a related discipline;
— experience in machine learning, deep learning, and/or natural language processing;
— strong programming skills;
— ability to work independently, good communication skills, and a passion for research;
— knowledge of neural network frameworks (Torch, PyTorch, TensorFlow, Keras, etc.) is not necessary but highly desirable.

The research will take place at the Leiden Institute of Advanced Computer Science (LIACS) in the Faculty of Science of Leiden University. According to the most recent research visitation, this is one of the foremost computer science departments of the Netherlands.

For more information and the application procedure, please visit this page:

Application deadline: December 10, 2017. Please contact Dr. Arianna Bisazza *before November 30* if you intend to apply.


Second International Workshop on Recent Trends in News Information Retrieval (NewsIR’18) in conjunction with ECIR 2018 in Grenoble, France

Important Dates
Submission deadline: 2/2/2018
Notification of acceptance: 2/3/2018
Camera-ready copies: 16/3/2018
Workshop date: 26/3/2018
My impressions of the previous workshop are here:
The news industry is continuing to experience a revolution. News consumption habits are changing with an increase in both the volume of news and the variety of sources, along with a continuing trend away from print. Readers need new mechanisms to help with this vast volume of information so they can not only find a signal in the noise, but also understand what is happening in the world given the multiple points of view describing every event. At the same time, publishers and aggregators need new ways to reach and retain their audience, and to monetize their services. These challenges in journalism relate to Information Retrieval (IR) and natural language processing (NLP) fields such as: verification of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams; automatic summarization; news recommendation; multilingual NLP; and entity recognition. Although IR and NLP have been applied to news for decades, the changing nature of the space requires fresh approaches and a closer collaboration with our colleagues from the journalism environment. Building on the success of NewsIR’16, the goal of this workshop is to stimulate discussion between the communities and to share interesting approaches to these challenges. We invite contributions on any of the multiple IR tasks that can help solve real user problems in this area.
Topics of Interest
Relevant topics of interest for NewsIR’18 include but are not limited to:
– Traditional and social media integration
– Temporal aspects of news
– Credibility, controversy and fact-checking
– Bias and plurality in news
– Information silos and the effect of algorithms on news consumption
– Diversification
– Event and anomaly detection
– IR/NLP in Data Journalism
– Multiple document summarization
– User-generated content (e.g., using comments to enhance news retrieval)
– News recommendation
– De-duplication and clustering of news articles
– Author identification and disambiguation
– Evaluation of news retrieval systems
– Data visualization
– Conversational journalism and chat bots
– Mobile-first journalism
– Evaluation
– Data collections


First International Workshop on Narrative Extraction from Texts (Text2Story’18@ECIR’18), Grenoble, France

Proceedings to be submitted to CEUR workshop proceedings (potentially indexed on DBLP). Authors of relevant papers will be invited to submit an extended version of their article to a Special issue hosted by IPM Journal (URL:
++ Important Dates ++
— Submission deadline: January 8th, 2018
— Acceptance Notification Date: February 19th, 2018
— Camera-ready copies: March 5th, 2018
— Workshop: March 26th, 2018
++ Overview ++ 
The increasing availability of text information in the form of news articles, comments or posts in social networks poses new challenges for those who aim to understand the storyline of an event. Although natural language understanding has improved over the last couple of years, with several research works emerging in information extraction and text mining, the problem of constructing consistent narrative structures is yet to be solved. Not only do the algorithms need to improve; the state of the art as a whole needs to advance to provide methods that automatically identify, interpret and relate the different elements of a narrative, which are likely to be spread across different sources. In this workshop we aim to foster the discussion of recent advances in the link between Information Retrieval (IR) and formal narrative representations from texts. More specifically, we aim to capture a wide range of multidisciplinary issues related to text-to-narrative-structure and its various related tasks. This is a very rich line of research that poses many challenging problems in information retrieval, text mining, information extraction, computational linguistics and automatic production of media content.


NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0)

Adverse drug events (ADEs) are common and occur in approximately 2-5% of hospitalized adult patients. Each ADE is estimated to increase healthcare cost by more than $3,200. Severe ADEs rank among the top 5 or 6 leading causes of death in the United States. Prevention, early detection and mitigation of ADEs could save both lives and dollars. Employing natural language processing (NLP) techniques on electronic health records (EHRs) provides an effective way of real-time pharmacovigilance and drug safety surveillance.

We have annotated 1092 EHR notes with medications, as well as relations to their corresponding attributes, indications and adverse events. This annotated data provides a valuable resource for developing NLP systems that automatically detect these clinically important entities. We are therefore happy to announce a public NLP challenge, MADE1.0, which aims to promote deep innovations in related research tasks and to bring researchers and professionals together to exchange research ideas and share expertise. The ultimate goal is to further advance ADE detection techniques to improve patient safety and health care quality.


Semantic Deep Learning

Semantic Web technologies and deep learning share the goal of creating intelligent artifacts that emulate human capacities such as reasoning, validating, and predicting. Both fields have had a considerable impact on data and knowledge analysis, as well as on their associated abstract representations. Deep learning is a term used to refer to deep neural network algorithms that learn data representations by means of transformations with multiple processing layers. These architectures have frequently been applied in NLP to learn features from raw data for tasks such as part-of-speech tagging, morphological tagging, language modeling, and so forth. Semantic Web technologies and knowledge representation, on the other hand, boost the re-use and sharing of knowledge in a structured and machine-readable fashion. Semantic resources such as WikiData, Yago, BabelNet or DBpedia, as well as knowledge base construction and completion methods, have been successfully applied to improve systems addressing semantically intensive tasks (e.g. Question Answering).

There are notable examples of contributions leveraging either deep neural architectures or distributed representations learned via deep neural networks in the broad area of Semantic Web technologies. These include, among others: (lightweight) ontology learning, ontology alignment, ontology annotation, joint relational and multi-modal knowledge representations, and ontology prediction. Ontologies, on the other hand, have been repeatedly utilized as background knowledge for machine learning tasks. As an example, there is a myriad of hybrid approaches for learning embeddings by jointly incorporating corpus-based evidence and semantic resources. This interplay between structured knowledge and corpus-based approaches has given way to knowledge-rich embeddings, which in turn have proven useful for tasks such as hypernym discovery, collocation discovery and classification, word sense disambiguation, joint relational and multi-modal knowledge representations and many others.

In this special issue, we invite submissions that illustrate how Semantic Web resources and technologies can benefit from an interaction with deep learning. At the same time, we are interested in submissions that show how knowledge representation can assist in deep learning tasks deployed in the field of NLP and how knowledge representation systems can build on top of deep learning results. Topics include, but are not limited to:

  • Structured knowledge in deep learning
    • learning and applying knowledge graph embeddings
    • applications of knowledge-rich embeddings
    • neural networks and logic rules
    • learning semantic similarity and encoding distances as a knowledge graph
    • ontology-based text classification
    • multilingual resources for neural representations of linguistics
    • semantic role labeling
  • Deep reasoning and inferences
    • commonsense reasoning and vector space models
    • reasoning with deep learning methods
  • Learning knowledge representations with deep learning
    • word embeddings for ontology matching and alignment
    • deep learning and semantic web technologies for specialized domains
    • deep learning ontologies
    • deep learning models for learning knowledge representations from text
    • deep learning ontological annotations
  • Joint tasks
    • mining multilingual natural language for SPARQL queries
    • information retrieval and extraction with knowledge graphs and deep learning models
    • knowledge-based deep word sense disambiguation and entity linking
    • investigation of compatibilities and incompatibilities between deep learning and Semantic Web approaches
    • neural networks for learning Linked Data


Training datasets are now available for the word sense disambiguation shared task

The Dialogue 2018 conference will, for the first time, host a shared task on word sense induction and disambiguation for Russian. Participants will be able to evaluate the quality of state-of-the-art word sense embedding models for Russian, as well as other word sense disambiguation methods. Three training datasets are already available, and participants can start working. The shared task is supported by ACL SIGSLAV and ABBYY.

Detailed instructions for participants are available on GitHub. A detailed description of the task, the datasets, and the baseline methods can be found at: To take part in the shared task, you need to fill in the form.

Important dates:

— Release of the training data: November 1, 2017.
— Release of the test data: December 15, 2017.
— Model submission deadline: January 15, 2018.
— Announcement of the shared task results: February 1, 2018.

Questions about the shared task can be sent to


Lectures by Stefan Gries at the Higher School of Economics (HSE)

Stefan Gries is a professor at the University of California, Santa Barbara, and a specialist in corpus and quantitative linguistics.

The lectures, on the topic "Quantitative methods in corpus linguistics", will be given on November 29 and 30. They are organized by the International Laboratory of Language Convergence together with the HSE School of Linguistics.

November 29, 18:10–19:30

November 30, 18:10–19:30 and 19:40–21:00

Staraya Basmannaya 21/4, room 501
To obtain an entry pass, please register via the link


PhD and Post-doc openings (DeepSPIN project)

The University of Lisbon will soon be announcing 3 PhD student and 3 post-doctoral positions for my forthcoming ERC-funded project DeepSPIN ("Deep Structured Prediction in Natural Language Processing"). This 5-year frontier research project will start in January 2018, and it also involves Unbabel, a vibrant start-up doing AI-powered crowd-sourced translation. Detailed information about the positions and how to apply can be found at:
(The application deadline is December 2017, but there’s flexibility about the dates depending on the candidate’s availability. For example, PhD students finishing next year can still apply to the post-doc position.)

Neural Machine Translation Implementations


Ambiguity of participles in Russian and ways to resolve it

On Friday, November 10, at 16:00, Uliana Petrunina will give a talk titled "Ambiguity of participles in Russian and ways to resolve it based on the corpus frequency of their lemmas". The talk will take place in the conference hall (room 401) of the Institute for Linguistic Studies of the Russian Academy of Sciences (ILI RAN), Tuchkov per., 9.
All are welcome!
Uliana Petrunina is a PhD student in the Department of Linguistics at the University of Tromsø, Norway. She is a member of the Giellatekno and CLEAR research groups. The topic of her dissertation (2016–2020) is "Part of speech (participial) ambiguity in Russian and its resolution using weighted finite-state transducers and constraint grammar rules". She is currently on a research internship at ILI RAN.