Three PhD positions at the University of Groningen

Five days remain until applications close for PhD positions at the University of Groningen, Campus Fryslân (details at www.academictransfer.com/39962).
We offer three four-year scholarships to complete a PhD in Culture, Language & Technology. Applications are invited from prospective PhD students with a good fit with the research specialisms and expertise of our academic staff. The PhD students will be enrolled in our Graduate School Campus Fryslân (GSCF), and may collaborate with senior researchers at the Fryske Akademy and the Department of Frisian Language and Culture. Additionally, they can benefit from affiliations with University of Groningen research institutes such as the Center for Language and Cognition or the Research School of Behavioral and Cognitive Neurosciences, as appropriate to the PhD project.

Candidates whose research interests relate to linguistic/cultural situations in the North of the Netherlands, particularly Fryslân, are encouraged to apply. Topics of interest follow.

● Language & society: Investigations of language contact and/or variation in a multilingual society, consequences of multilingualism for language change and language learning, comparisons of language attitudes specifically relating to Frisian accent;
● Language & speech technologies: Studies of language and speech technologies that support or investigate a diversity of natural, multilingual interactions between people and the devices that surround them, with a keen eye on applications with real-world relevance, for example intoxication or pathology detection/recognition in multilingual discourse or multilingual prosody perception in the hearing-impaired;
● The future of multilingualism & minority languages: Research into consequences of globalization and migration for citizenship and expressions of linguistic and cultural identity in multilingual contexts, preferably also involving the North of the Netherlands.


The candidate should have the following qualifications:
● Master’s degree in a relevant field (Linguistics, Anthropology, Cultural Studies, or similar) awarded by September 1st, 2017, at the latest. Applied research experience in the private sector is an asset.
● excellent record of undergraduate and graduate study
● strong motivation to complete a PhD dissertation in four years
● excellent command of spoken and written English. Additionally, Dutch, Frisian and/or German skills (or the ability to learn these languages quickly) are an asset.


Category: Courses/Education/Postdocs

The First Workshop on Subword and Character LEvel Models in NLP (SCLeM)

To be held at EMNLP 2017 in Copenhagen on September 7, 2017

Website: https://sites.google.com/view/sclem2017


Submission deadline: June 2, 2017
Notification: June 30, 2017
Camera ready: July 14, 2017
Workshop: September 7, 2017


Kyunghyun Cho, NYU
Karen Livescu, TTIC
Tomas Mikolov, Facebook
Noah Smith, Univ of Washington


Neural weighted finite-state machines, Ryan Cotterell, JHU


Traditional NLP starts with a hand-engineered layer of representation,
the level of tokens or words.  A tokenization component first breaks
up the text into units using manually designed rules. Tokens are then
processed by components such as word segmentation, morphological
analysis and multiword recognition.  The heterogeneity of these
components makes it hard to create integrated models of both structure
within tokens (e.g., morphology) and structure across multiple tokens
(e.g., multi-word expressions). This approach can perform poorly (i)
for morphologically rich languages, (ii) for noisy text, (iii) for
languages in which the recognition of words is difficult and (iv) for
adaptation to new domains; and (v) it can impede the optimization of
preprocessing in end-to-end learning.
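
The contrast drawn above can be illustrated with a small sketch. It is not taken from the workshop materials: the function names and the single regular-expression rule are assumptions chosen for illustration. It puts one hand-written tokenization rule next to character n-grams, which need no tokenizer at all.

```python
import re


def rule_based_tokens(text):
    # Hypothetical hand-written rule: a token is either a run of word
    # characters or a single non-space symbol.
    return re.findall(r"\w+|[^\w\s]", text)


def char_ngrams(text, n=3):
    # Slide a window of n characters over the raw string; no tokenization
    # step is involved, so unusual spellings still yield features.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    # The rule splits the contraction into three separate tokens.
    print(rule_based_tokens("don't stop"))
    # Character trigrams of a single word.
    print(char_ngrams("unseen", 3))
```

The n-gram view trades linguistic units for robustness: a word absent from the training data still shares many of its character n-grams with known words.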

The workshop provides a forum for discussing recent advances as well
as future directions on sub-word and character-level natural language
processing and representation learning that address these problems.


09:00-09:10  Welcome
09:10-09:50  Invited talk 1
09:50-10:30  Invited talk 2
10:30-11:00  Coffee break
11:00-11:40  Invited tutorial talk
11:40-12:10  Best paper presentations
12:10-14:00  Poster session & Lunch break
14:00-14:40  Invited talk 3
14:40-15:40  Poster session & Coffee break
15:40-16:20  Invited talk 4
16:20-17:30  Panel discussion
17:30-17:45  Closing remarks


— tokenization-free models
— character-level machine translation
— character-ngram information retrieval
— transfer learning for character-level models
— models of within-token and cross-token structure
— NL generation (of words not seen in training etc)
— out of vocabulary words
— morphology & segmentation
— relationship b/w morphology & character-level models
— stemming and lemmatization
— inflection generation
— orthographic productivity
— form-meaning representations
— true end-to-end learning
— spelling correction
— efficient and scalable character-level models

Category: Conferences

Summer Neurolinguistics School 2017: Brain Stimulation in Language Research and Therapy/Russia

Host Institution: National Research University Higher School of Economics
Coordinating Institution: National Research University Higher School of Economics
Website: https://www.hse.ru/neuroling/summer_school_2017

Dates: 22-Jun-2017 — 24-Jun-2017
Location: Moscow, Russia

Focus: The topic of the fourth annual Summer Neurolinguistics School is Brain Stimulation in Language Research and Therapy. The school will be devoted to brain stimulation methods and their applications in neurolinguistic research and speech-language therapy.
Minimum Education Level: No Minimum

The Neurolinguistics Laboratory at the National Research University Higher School of Economics (HSE) invites you to join us in Moscow for our fourth annual Summer Neurolinguistics School. This year, the topic is Brain Stimulation in Language Research and Therapy. The event will take place in Moscow, Russia, June 22–24, 2017. The purpose of the school is to serve both as an educational event for students entering the field and as an academic environment where researchers and clinicians can exchange ideas and discuss the latest achievements in the field. This year the school will be devoted to various brain stimulation methods and their applications in neurolinguistic research and in language therapy. The school program includes lectures by such world-renowned researchers as Nina Dronkers (University of California), Roelien Bastiaanse (University of Groningen), and Dirk-Bart den Ouden (University of South Carolina).

What to expect?

Category: Courses/Education/Postdocs

The First Summer School on Statistical Methods for Linguistics and Psychology, Potsdam, Germany, August 28th to September 1st 2017

We are happy to announce The First Summer School on Statistical
Methods for Linguistics and Psychology, to be held in Potsdam, Germany
from August 28th to September 1st 2017. This first edition of the
summer school will provide an introductory overview of frequentist and
Bayesian statistics for linguists and psychologists. We will cover the
necessary theoretical background for both statistical frameworks, and
participants will get hands-on practice in learning to analyze real
data sets.

Invited speaker (29th August): Prof. Dr. Richard McElreath, Director,
Department of Human Behavior, Ecology, and Culture, Max Planck
Institute for Evolutionary Anthropology

How to apply:


Category: Courses/Education/Postdocs

Coursera: courses on computational linguistics

Introduction to Natural Language Processing

Text Retrieval and Search Engines

Text Mining and Analytics

Hands-on Text Mining and Analytics

Applied Text Mining in Python


Category: Lectures/Seminars, Resources/Software


=================== EUROLAN-2017 ====================


The 13th in the series of EUROLAN Schools

10 – 17 September 2017, Constanța, Romania
http://eurolan.info.uaic.ro/2017/


Biomedical Text Mining (BioNLP) applies natural language processing (NLP)
techniques to identify and extract information from scientific publications
in biology, medicine, and chemistry, in order to discover novel knowledge
that can contribute to biomedical research. The growth of BioNLP over the past
fifteen years is due in large part to the availability of web-based
publication databases such as PubMed and Web of Science coupled with
increasing access to anonymized electronic medical/health records. The large
size of the biomedical literature and its rapid growth in recent years make
literature search and information access a demanding task. Health-care
professionals in the clinical domain face a similar problem of information
explosion when dealing with the ever-increasing body of available
medical/health records in electronic form. Beyond merely identifying texts
relevant to a particular interest, BioNLP applies sophisticated NLP
information extraction (IE) technologies (e.g., event extraction or
entity-relation extraction) to identify and analyze text segments to produce
information about, or even models of, phenomena such as drug or protein
interactions, gene relations, temporal relations in clinical records,
biological processes, etc. Overall, the application of automatic NLP
techniques to unstructured text in scientific literature and medical records
enables life scientists to both find and exploit this data without the
significant effort of manual searching and researching.

EUROLAN-2017 has engaged several well-known researchers in the fields of
BioNLP and NLP to provide a comprehensive overview of language processing
models and techniques applicable to the biomedical domain, ranging from an
introduction to fundamental NLP technologies to the study of use cases and
exploitation of available tools and frameworks that support BioNLP. Each
tutorial is accompanied by one or two hands-on sessions, in which
participants will use text mining tools to explore and exploit several
varieties of biomedical language resources, including cloud-based
repositories of scientific publications, annotated biomedical corpora,
databases and ontologies of biomedical terms, etc. The topics covered in the
tutorials and hands-on sessions include:

• mining biomedical literature
• entity identification and normalization
• conceptual graphs extracted from medical texts
• annotation of semantic content, with applications in medicine and biology
• medical search engines
• deep learning for bioinformatics
• biomedical question/answering
• clinical data repositories
• big data and cloud computing in relation to biomedical textual data
• clinical relationships
• medical topic modeling
• medical language systems
• clinical text analysis
• text summarization in the biomedical domain
• event-based text mining for biology and related fields
• event extraction in medical texts

Invited Lecturers

• Mihaela Breabăn – “Alexandru Ioan Cuza” University of Iași
• Kevin Cohen – University of Colorado at Boulder (USA)
• Noa Patricia Cruz Diaz – Virgen del Rocio University Hospital
• Eric Gaussier – University Grenoble Alps (France)
• Nancy Ide – Vassar College (USA)
• Pierre Zweigenbaum – LIMSI, CNRS, Université Paris-Saclay, Orsay

Category: Courses/Education/Postdocs

Natural Language Processing meets Journalism — workshop at EMNLP 2017


EMNLP 2017 Workshop

September 7, Copenhagen, Denmark

http://nlpj2017.fbk.eu

Call for Papers

With the advent of the digital era, journalism faces what seems to be a major
change in its history — data processing. While much journalistic effort has
been (and still is) dedicated to information gathering, now a great deal of
information is readily available, but is dispersed in a large quantity of
data. Thus processing a continuous and very large flow of data has become a
central challenge in today's journalism.

With the recognition of this challenge, it has become widely accepted that
data-driven journalism is the future. Tools which perform big data mining in
order to pick out and link together what is interesting from various
multimedia resources are needed; these tools will be used as commonly as
typewriters once were. Their scope is well beyond data classification: They
need to construct sense and structure out of the never-ending flow of
reported facts, ascertaining what is important and relevant. They need to be
able to detect what is behind the text, what authors' intentions are, what
opinions are expressed and how, whose propagandistic goal an article might
serve, etc. What's more, they need to go beyond an intelligent search engine:
They need to be picky and savvy, just like good journalists, in order to help
people see what is really going on. It must be added that we have already
been subjected to a large scale invasion of seemingly new techniques: fake
news, alternative facts etc. For better or for worse, this is indeed the
reality we must make decisions in, and we must develop tools for handling
it rightly. That is, natural language processing meeting journalism is a
crucial process that has to be instantiated on each tablet, phone or monitor
on which a piece of news is displayed — for reading or writing.

At this workshop we anticipate papers that report on state-of-the-art
inquiries into the analysis and use of large to huge news corpora. A news
corpus is generally understood as scoping over newspapers, social networks,
the web, etc. The papers should present computational techniques able to
manage a huge quantity of information and/or to perform deep analyses that
extend beyond the current state of the art. We welcome reports on recent
progress on overcoming the bottlenecks in open domain relation extraction,
paraphrasing, textual entailments and semantic similarity, and on their
results in analyzing news content. However, we are also greatly interested in
technologies for enhancing the communicative function of language in this
context more generally, including in computational humor, NLP creativity for
advertising, plagiarism, fake news etc.

Category: Conferences

AIST 2017 : The 6th International Conference on Analysis of Images, Social Networks, and Texts


The conference is intended for computer scientists and practitioners whose research interests are related to data science. The previous conferences in 2012-2016 attracted a significant number of students, researchers, academics and engineers working on analysis of images, texts, and social networks. The broad scope of AIST makes it an event where researchers from different domains, such as image and text processing, exploiting various data analysis techniques, can meet and exchange ideas. The conference allows specialists from different fields to meet each other, present their work, and discuss both theoretical and practical aspects of their data analysis problems. Another important aim of the conference is to stimulate scientists and people from the industry to benefit from the knowledge exchange and identify possible grounds for fruitful collaboration.


The scope of the conference includes the following topics:

— Social Network Analysis
— Natural Language Processing
— Recommender systems and collaborative technologies
— Analytics for geoinformation systems
— Analysis of images and video
— Discovering and analyzing processes using event data
— Game analytics
— Core Data Mining and Machine Learning techniques
— Semantic Web and Ontologies
— Educational Data Mining
— ML & DM for Economics and Social Sciences


As in previous years, the conference proceedings will be published in Springer’s Communications in Computer and Information Science (CCIS) or Lecture Notes in Artificial Intelligence (LNAI) series. The proceedings of AIST 2015 can be found at http://www.springer.com/us/book/9783319261225 and the proceedings of AIST 2016 are available at http://www.springer.com/us/book/9783319529196.

Revised selected papers are published as post-proceedings in Springer’s Lecture Notes in Computer Science (LNCS) series.


The 6th conference on Analysis of Images, Social Networks, and Texts will take place in Moscow, Russia from Thursday, 27th through Saturday, 29th of July 2017.


Submission of abstracts: April 30, 2017
Deadline for papers: May 7, 2017
Notification of Acceptance: June 7, 2017
The Conference: July 27 — 29, 2017


Track 1. General topics of data analysis
Sergey Kuznetsov (Higher School of Economics, Moscow)
Amedeo Napoli (LORIA, Nancy)

Track 2. Natural language processing
Natalia Loukachevitch (Moscow State Lomonosov University)
Alexander Panchenko (University of Hamburg, Hamburg)

Track 3. Social network analysis
Stanley Wasserman (Indiana University, Bloomington)

Track 4. Analysis of images and video
Victor Lempitsky (Skolkovo Institute of Science and Technology, Moscow)
Andrey Savchenko (Higher School of Economics, Nizhny Novgorod)

Track 5. Optimization problems on graphs and network structures
Panos M. Pardalos (University of Florida)
Mikhail Khachay (IMM UB RAS & Ural Federal University)

Track 6. Analysis of dynamic behavior through event data
Wil van der Aalst (Eindhoven University of Technology)
Irina Lomazova (Higher School of Economics, Moscow)

Category: Uncategorized

Call for Papers: Special Issue of the journal Computational Linguistics on Language in Social Media

Special Issue of the journal Computational Linguistics on:
Language in Social Media: Exploiting discourse and other contextual information

*** Deadline 15th October 2017 (11:59 pm PST) ***

For more details see: http://www.sfu.ca/~mtaboada/coli-si.html

**Guest editors**
Farah Benamara – IRIT, Toulouse University (benamara@irit.fr)
Diana Inkpen – University of Ottawa (diana.inkpen@uottawa.ca)
Maite Taboada – Simon Fraser University (mtaboada@sfu.ca)

socialmedia.coli@gmail.com

**Call for papers**
Social media content (SMC) is changing the way people interact with each
other and share information, personal messages, and opinions about
situations, objects and past experiences. This content (ranging from blogs,
fora, and reviews to various social networking sites) has specific
characteristics that are often referred to as the five V's: volume, variety,
velocity, veracity, and value. Most of them are short online conversational
posts or comments often accompanied by non-linguistic contextual information,
including metadata such as the social network of each user and their
interactions with other users. Exploiting the context of a word or a sentence
increases the amount of information we can get from it and enables novel
applications. Such rich contextual information, however, makes natural
language processing (NLP) of SMC a challenging research task. Indeed, simply
applying traditional text mining tools is clearly sub-optimal, as such
methods take into account neither the interactive dimension nor the
particular nature of this data, which shares properties of both spoken and
written language.

Most research on NLP for social media focuses primarily on content-based
processing of the linguistic information, using lexical semantics (e.g.,
discovering new word senses or multiword expressions) or semantic analysis
(opinion extraction, irony detection, event and topic detection, geo-location
detection) (Londhe et al., 2016; Aiello et al., 2013; Inkpen et al., 2015;
Ghosh et al., 2015). Other research explores the interactions between content
and extra-linguistic or extra-textual features like time, place, author
profiles, demographic information, conversation thread and network structure,
showing that combining linguistic data with network and/or user context
improves performance over a baseline that uses only textual information (West
et al., 2014; Karoui et al., 2015; Volkova et al., 2014; Ren et al., 2016).

We expect that papers in this special issue will contribute to a deeper
understanding of these interactions from a new perspective of discourse
interpretation. We believe that we are entering a new age of mining social
media data, one that extracts information not just from individual words,
phrases and tags, but also uses information from discourse and the wider
context. Most of the “big data” revolution in social media analysis has
examined words in isolation, a “bag-of-words” approach. We believe it is
possible to investigate big data, and social media data in general, by
exploiting contextual information.
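
As a minimal illustration of what a bag-of-words representation discards (a sketch with made-up example sentences, not code from the call for papers), the two strings below express opposite opinions yet map to the same word counts:

```python
from collections import Counter


def bag_of_words(text):
    # Lowercase, split on whitespace, and count each word; word order and
    # any surrounding discourse context are thrown away.
    return Counter(text.lower().split())


if __name__ == "__main__":
    positive = bag_of_words("the service was good not bad")
    negative = bag_of_words("the service was bad not good")
    # The two bags are identical although the opinions differ.
    print(positive == negative)
```

Telling such cases apart requires information about order, discourse, or the wider context, which is the kind of signal the special issue asks contributors to exploit.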

We encourage submission of papers that address deep issues in linguistics,
computational linguistics and social science. In particular, our focus is on
the exploitation of contextual information within the text (discourse,
argumentation chains) and extra-linguistic information (social network,
demographic information, geo-location) to improve NLP applications and help
build pragmatics-based NLP systems. The special issue also aims to bring
together researchers who propose new solutions for processing SMC in various
use-cases including sentiment analysis, detection of offensive content, and
intention detection. These solutions need to be reliable enough in order to
prove their effectiveness against shallow bag-of-words approaches or
content-based approaches alone.

Category: Conferences

Germeval Task 2017 Shared Task on Aspect-based Sentiment in Social Media Customer Feedback


In the connected, modern world, customer feedback is a valuable source of insights into the quality of products or services. This feedback allows other customers to benefit from the experiences of others and enables businesses to react to requests, complaints or recommendations. However, the more people use a product or service, the more feedback is generated, which results in the major challenge of analyzing huge amounts of feedback in an efficient, but still meaningful way.

Thus, we propose a shared task on automatically analyzing customer reviews about “Deutsche Bahn”, the German public train operator with about two billion passengers each year.


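“RT @XXX: Da hört jemand in der Bahn so laut ‘700 Main Street’ durch seine Kopfhörer, dass ich mithören kann. :( :( :(”
(English: “Someone on the train is listening to ‘700 Main Street’ through their headphones so loudly that I can hear it too.”)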

As shown in the example, insights from reviews can be derived at different granularities. The review contains a general evaluation of the trip (the customer disliked the trip). Furthermore, the review evaluates a specific aspect of the train trip (“laut”, i.e. “loud” → the customer did not like the noise level).

Consequently, we frame the task as aspect-based sentiment analysis with four subtasks:



Category: Conferences, Resources/Software