Digital Humanities in the Nordic Countries calls for submissions for its 2018 conference in Helsinki, Finland, 7–9 March 2018

In 2018, the conference seeks to extend the scope of digital humanities research covered, both into new areas and beyond the Nordic and Baltic countries. In pursuit of this, in addition to the abstracts familiar from humanities traditions, we also adopt a call for publication-ready texts, as is the tradition in computer science conferences. Therefore, we accept the following types of submissions:

  1. Publication-ready texts of length appropriate to the topic. Accepted papers will be submitted to the CEUR-WS proceedings series for publication in a citable form. Layout for the papers is not strictly mandated, but we suggest you use the Springer LNCS templates to ensure a uniform look for the proceedings.
    1. Long paper: 8-12 pages, presented in 20 min plus 10 min for Q&A
    2. Short paper: 4-8 pages, presented in 10 min plus 5 min for Q&A
    3. Poster/demo: 2-4 pages, presented as an A1 academic poster in a poster session.
  2. Abstracts of a maximum of 2000 words. Proposals are expected to indicate a preference between a) long, b) short, or c) poster/demo format for presentation. Approved abstracts will be published in a book of abstracts on the conference website.

Submissions to the conference are now open at ConfTool!

Important dates

The call for proposals opened on 28 August 2017, and the deadline for submitting proposals is 25 October 2017. Presenters will be notified of acceptance by 8 January 2018. For papers accepted into the citable proceedings, there is an additional deadline of 5 February 2018 for producing a final version of your paper that takes into account the comments made by the reviewers.

This year, the conference welcomes in particular work related to the following themes:


History

While the number of researchers describing themselves as digital historians is increasing, computational approaches to history have rarely captured the attention of those without an innate interest in digital humanities. To address this, we particularly invite presentations of historical research whose use of digital methods advances the overall methodological basis of the field.

Cultural Heritage

Libraries, galleries, archives and museums are making vast amounts of cultural heritage openly digitally available. However, tapping into these resources for research requires cultivating co-operation and trust between scholars and heritage institutions, due to the cultural, institutional, legal and technical boundaries crossed. We invite proposals describing such co-operation – examples of great resources for cultural heritage scholarship, of problems solved using such data, as well as e.g. intellectual property rights issues.


Games

Humanities perspectives on games are an established part of the game studies community. Yet their relationship with digital humanities remains undefined. Digitality and games, digital methods and games, games as digital methods, and so on are all areas available for research. We invite proposals that address high-level game concepts like “fun”, “immersion”, “design”, “interactivity”, etc., positioned as points of contact with the digital.


Future

We also invite proposals in the broad category of “Future”. Accepted proposals will still fit in the overall context of the conference and highlight new perspectives on the digital humanities. Submissions may range from applications of data science to humanities research to work on human-machine interaction and ecological digital humanities. We also welcome reflections on the future of the digital humanities, as well as the societal impact of the humanities.

Finally, the overarching theme this year is Open Science. This pragmatic concept emphasises the role of transparent and reproducible research practices, open dissemination of results, and new forms of collaboration, all greatly facilitated by digitalisation. All proposals are invited to reflect on the benefits, challenges, and prospects of open science for their own research.


Category: Conferences

Spoken Wikipedia

We are proud to announce the availability of the Spoken Wikipedia
Corpora. They consist of time-aligned spoken Wikipedia articles for
English, German, and Dutch, totaling more than 1,000h of audio by
numerous speakers about a wide variety of topics.

The Spoken Wikipedia Project is a community effort by volunteers, as
part of Wikipedia, to record spoken versions of Wikipedia articles.

The corpora are freely available under a CC BY-SA license.

The annotations perfectly retain the original text and each part can
be traced back to its original place in the Wikipedia HTML.  This
enables research with respect to spoken hypertext and its markup.

In addition to per-word alignments, we also provide phoneme-level
alignments for both German and English, generated by MAUS.

              DE     EN     NL
#articles     1010   1314   3073
hours audio   386h   395h   224h
aligned       249h   182h    79h
ph-aligned    129h    77h     -

This is, as far as we know, the largest corpus of freely available
aligned speech for both German and Dutch, and the largest
freely available corpus of aligned factual speech for English.

To download the corpora and obtain more information, please visit

There you can also obtain the pipeline to automatically download and
align a Spoken Wikipedia Corpus yourself.  We also provide a template
to adapt the software for new languages.  Note, however, that the
alignment process takes a significant amount of time.

If you use this resource in your research, please cite
Arne Köhn, Florian Stegen, Timo Baumann. 2016.
"Mining the Spoken Wikipedia for Speech Data and Beyond".
In Proceedings of LREC 2016.
or a later publication.

Category: Resources/Software

Shared task on Word Sense Induction and Disambiguation for the Russian language



Word Sense Induction (WSI) is the automatic identification of the senses of a word. While various sense induction and disambiguation approaches have been evaluated for Western European languages, e.g. English, French, and German, no systematic evaluation of WSI for Slavic languages is available at the moment. This shared task takes a first step towards bridging this gap by focusing on one Slavic language: its goal is to compare sense induction and disambiguation systems for Russian. Many Slavic languages still lack broad-coverage lexical resources of the kind available for English, such as WordNet, which provide a comprehensive inventory of senses. Therefore, the word sense induction methods investigated in this shared task can be of great value for enabling semantic processing of Slavic languages.


If you are interested in participation, please register using this form by the 15th of November: The full description of the shared task is available at

Task Description

The shared task is structurally similar to prior WSI tasks for the English language, such as the SemEval 2007 WSI and SemEval 2010 WSI&D tasks.

We use the “lexical sample” setting. Namely, we provide the participants with a set of contexts representing examples of ambiguous words, like the word “bank” in “In geography, the word **bank** generally refers to the land alongside a body of water.” For each context, a participant needs to disambiguate one target word. Note that we do not provide any sense inventory: the participant can assign sense identifiers of their choice to a context, e.g. “bank#1” or “bank (area)”.


The task will feature two tracks. In the “knowledge-free” track, participants need to induce a sense inventory from a text corpus of their own choice and then use it to assign each context a sense identifier according to this induced inventory. In the “knowledge-rich” track, participants are free to use a sense inventory from an existing dictionary to disambiguate the target words (yet the use of the gold standard inventory is prohibited). The advantage of our setting is that virtually any existing word sense disambiguation approach can be used within the framework of our shared task, from unsupervised sense embeddings to graph-based methods that rely on lexical knowledge bases, such as WordNet.


We will provide training datasets, which can be used for development of the models. Later, test datasets will be released: the participants will need to use the developed models to disambiguate the test sentences and submit their final results to the organisers. Training and test datasets will use the same corpora and annotation approaches, but the target words will be different in the training and test datasets.

Quality Measure

Similarly to SemEval 2010 Task 14 WSI&D, we use a gold standard in which each ambiguous target word is provided with a set of instances, i.e., contexts containing the word. Each instance is manually annotated with a single sense identifier according to a predefined sense inventory. Each participating system assigns sense labels to these instances, which can be viewed as a clustering of the instances by sense label. To evaluate a system, the system’s labelling of contexts is compared to the gold standard labelling. We use the Adjusted Rand Index (ARI) as the quantitative measure of the clustering.
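To make the measure concrete, here is a minimal sketch of how ARI can be computed for the instances of one target word. It is not the organisers' evaluation script: the function name `adjusted_rand_index` and the example labels are assumptions made for illustration, and how scores are aggregated across target words is not covered here. In practice, `sklearn.metrics.adjusted_rand_score` from scikit-learn computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(gold, predicted):
    """Adjusted Rand Index between two labelings of the same instances.

    Hypothetical helper for illustration; label values can be arbitrary
    hashable identifiers, since only the induced grouping matters.
    """
    if len(gold) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(gold)

    # Contingency counts: instances sharing each (gold, predicted) label pair.
    pair_counts = Counter(zip(gold, predicted))
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)

    # Number of instance pairs that fall together in each grouping.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_gold = sum(comb(c, 2) for c in gold_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_gold * sum_pred / total if total else 0.0
    max_index = (sum_gold + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings form a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Four contexts of one target word; the identifiers are made up.
gold = ["bank (area)", "bank (area)", "bank (finance)", "bank (finance)"]
system = ["bank#1", "bank#1", "bank#2", "bank#2"]
print(adjusted_rand_index(gold, system))  # 1.0: same grouping, different names
```

Because ARI compares groupings rather than label names, a system is free to name its senses however it likes, which is what allows participants to submit sense identifiers of their own choice.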

Baseline Systems

We will offer a simple open source baseline system that will demonstrate the task, the input and output data formats, and the quality measure used. For the knowledge-free track, we particularly encourage participation of systems based on unsupervised word sense embeddings, e.g. AdaGram. For the knowledge-rich track, word sense embeddings built on sense inventories, e.g. AutoExtend, can be obtained from lexical resources such as RuThes and RuWordNet.

Dissemination of the Results

The results of the shared task will be disseminated and discussed at the 24th International Conference on Computational Linguistics and Intellectual Technologies “Dialogue 2018”. The training and test datasets will be published online to foster future research and development.

Important Dates

  • First Call for Participation: October 15, 2017.

  • Release of the Training Data: November 1, 2017.

  • Release of the Test Data: December 15, 2017.

  • Submission of the Results: January 15, 2018.

  • Results of the Shared Task: February 1, 2018.


Organizers

  • Alexander Panchenko, University of Hamburg

  • Dmitry Ustalov, Krasovskii Institute of Mathematics and Mechanics

  • Konstantin Lopukhin, Scrapinghub Inc.

  • Anastasiya Lopukhina, Neurolinguistics Laboratory, National Research University Higher School of Economics & Russian Language Institute of the Russian Academy of Sciences

  • Nikolay Arefyev, Moscow State University & Samsung Research

  • Natalia Loukachevitch, Moscow State University

Category: Conferences, Resources/Software

PhD position in Deep Learning for News data

The University of Stavanger in Norway invites applications for a fully funded PhD position in “Smart Technologies” at the Department of Computer Science and Electrical Engineering.

The PhD is in the area of machine learning and deep learning for text/news mining, tracking and summarizing real-time events and social media. The goal is to develop machine learning and deep neural models for generating holistic summaries of multi-source events for smart technology applications.

The event data can come from a variety of sources such as news, social media, and city monitoring sensors for traffic, weather and pollution. Such massive-scale data is only beneficial when application-specific actionable knowledge is extracted from it. For example, summarizing people’s reactions on topics related to search and rescue during natural disasters, along with the correlated traffic and weather situation, may guide the emergency services in smart cities to manage the distribution of resources and supplies more effectively.

Deep Neural Networks (DNNs) have recently been very successful at performing difficult learning tasks with high precision in domains such as computer vision, natural language processing and machine translation. Although DNNs work well when large-scale labeled training data is available, their application to generating target-specific holistic event summaries has not been tried before. Using DNNs to learn from input sequences in the form of event streams and to generate summary sequences as output is a challenging open problem.


What we offer in a nutshell:
1. A strong research environment with supervision from experienced faculty.
2. Opportunities to collaborate and for research stays with our renowned collaborators worldwide, in Germany, Denmark and the Netherlands.
3. A well-paid PhD position in a country that the UN has ranked as having the highest standard of living in the world and that is known for its unique scenic beauty. Norway was also recently named the world’s happiest country.
4. We use English for research and communication, although the opportunity to learn Norwegian will be provided free of charge.

Suitable Background and Requirements:

1. Applicants must have a degree in Computer Science, or in a related field, with excellent results. They must also be able to demonstrate interest in scientific research. The evaluation considers many aspects of excellence, as well as personal drive and organizational skills. The ideal candidate for the position will have a strong background in distributed computing.
2. You may apply if you have not yet completed your degree, but expect to do so before the position starts.
3. Experience or publications related to any of the following areas is a bonus: Information Retrieval, text mining, machine learning.

For more information, please contact Vinay Setty or Tom Ryen.

For detailed information about the PhD position and the application process, please see:

Deadline for the application:  11th of November, 2017

Category: Jobs/Internships, Courses/Education/Postdocs

CFP: 2nd International Conference on Sociolinguistics — 6-8 September 2018 — Budapest

Insights from Superdiversity, Complexity and Multimodality

6-8 September 2018, Eötvös Loránd University, Budapest


We are delighted to inform you that the 2nd International Conference on Sociolinguistics (ICS-2)
will take place 6-8 September 2018 at Eötvös Loránd University, Budapest, Hungary.

We invite contributions which address current sociolinguistic issues from a great variety of
perspectives, including superdiversity, complexity and multimodality.
Accordingly, possible topics include but are not limited to the following:

— Language variation and change
— Language and mobility
— Language as a local practice
— Multilingualism/Polylanguaging and superdiversity
— English as a Lingua Franca in the polylingual world
— Language and class
— Language and gender
— Linguistic landscape
— Multimodality
— Identity practices in the social media
— Style
— Language in the media
— Language and advertising
— Language and economy
— Language ideologies
— Dialectology
— New developments in pidgin and creole linguistics
— Language policy, language planning
— Educational linguistics
— Linguistics
— Sociolinguistics of language education
— Forensic sociolinguistics
— Critical Discourse Analysis
— Cognitive sociolinguistics

Two peer-reviewed volumes of selected papers are planned for publication with an international publisher.
The language of the conference is English.

We invite (1) paper presentations for the general session, (2) posters for the poster session and
(3) entire panels.

Paper presentations:

Paper presentations will be allocated 20 minutes plus 10 minutes for questions.


Poster presentations will be allocated their own time slot during the conference.


Panel convenors are asked to separately (1) collect 6-8 panel paper abstracts, (2) find a discussant and (3) write a panel introduction before (4) submitting the full panel package to ICS-2.
Panel papers will be allocated 20 minutes, plus 10 minutes for questions per paper either immediately after each paper or at the end of the panel with an all-in discussion.

General session paper, poster session paper and panel paper abstracts of max. 300 words in MS Word including references should be submitted to the e-mail address by 15 January 2018.

Formatting: 12 pt Times New Roman, margins at 2.5 cm.
The name of the MS Word file should include the (first) author’s first name, surname and the exact title of the paper, in this order.
The abstract should include (1) the name of every author, (2) the affiliation of every author and (3) the paper title.

Notifications of acceptance will be communicated by 15 February 2018.

Category: Conferences

Linguist-developer vacancy in St. Petersburg

Just AI is expanding and has an opening for a linguist-developer.
Our main technology product is a platform for building chatbots.
Our areas of work: customer support automation, conversational commerce, and natural-language dialogue with robots and smart homes.

· Designing and refining scenarios, topics, and dialogue structures for chatbots for large companies.
· Writing small scripts for data processing.
· Analysing “client-operator” and “client-bot” dialogue logs, identifying their structure, and annotating them.

· Strong written, stylistic, spelling, and technical literacy; attention to detail.
· Programming skills in any language (preferably Python or JavaScript).

· Hands-on experience with natural language processing systems is highly welcome.
· English.
· Knowledge of modern computational linguistics methods, the main NLP tasks, and approaches to solving them.
· Experience with the Git or Mercurial version control systems.
· A relevant degree in natural language processing.
· Experience with NLP tools: MyStem, NLTK.

· Work in an office near the Sportivnaya metro station.
· Flexible start time, 8-hour working day (core hours from 12:00 to 17:00).
· Official employment from the first working day, fully declared (“white”) salary.
· Professional development and career growth.

Please send your résumé to:

Category: Jobs/Internships

FinMT 2017 — A Workshop on Machine Translation

  • Date: 1 November, 2017
  • Place: University of Helsinki, Helsinki/Finland
  • Participation: Please, fill in the on-line registration form (participation is free). Registration deadline: October 27, 2017!

Invited keynotes by

The goal of this workshop is to bring together people with an interest in machine translation for morphologically rich languages, and Finnish in particular. Morphological complexity and word order freedom are difficult challenges for current MT models, and we would like to accelerate the development of systems that can handle languages like Finnish in a more appropriate way, especially as the target language. We hope that this event will create new ideas and foster collaborations leading to substantial improvements in MT for such languages. The following topics are of special interest for the workshop:

  • Neural MT for morphologically-rich languages
  • Tools and resources for Finnish (and other morphologically-rich languages)
  • MT for low-resource languages
  • Evaluation of highly-inflecting languages
  • Long-distance dependencies even across sentence boundaries

Participation is free but requires registration.

Tentative Program


Category: Conferences

Hackathon on Finnish news

If you have an interest in media content analysis and AI applications to big text data, please consider joining this upcoming ULTRAHACK event. It is co-sponsored and co-organized by Sanoma Oy and other media companies, and is intended to attract students and researchers (individuals as well as teams) and young start-ups who wish to try their hand at real-world media data and challenges.

Excellent for people interested in AI / data science, with a tilt toward text and big media data streams, as well as modeling users’/readers’ behavior and preferences.

For more information, please see the challenge site:

or more generally:

(There are travel grants for those living outside Helsinki)

The winner(s) will receive a cash sum and global fame.  More importantly, interesting solutions will have the potential to be taken up for immediate testing in production.

Category: Resources/Software

NLP Positions at Grammarly

Effective communication is hard. It requires talent, skill, and a lot of
effort. Enter Grammarly, the first widely adopted AI communication
assistant helping people with the substance and impact of their writing.
Grammarly helps millions of people make their written communication clear,
mistake-free, and effective.   Grammarly is changing the way the world
communicates, enabling people to write exactly what they mean, and be fully

To help us reach that goal, we’re looking for bright, experienced
scientists, engineers and linguists with a keen interest and background in
Natural Language Processing, Machine Learning, and Deep Learning to create
the next generation of writing and communication assistance tools.
Our interests include not only building better grammatical error correction
but also expanding to other aspects of writing feedback such as style,
organization, structure, argumentation, and personalization, to name a few.

At Grammarly, our impact comes from a strong culture with highly engaged,
highly motivated team members. We hire exceptional people and reward them
with trust, autonomy, mentorship, and the freedom to grow into their roles.
We’re a passionate, growing team on a mission to improve lives by improving
communication. If you’re up for the challenge, we would love to meet you!

We have several exciting positions open in our NYC, SF and Kiev offices:

* 2018 PhD Internship (NYC):

* Lead / Senior Research Engineer (NYC, SF):

* Research Engineer (NYC, SF):

* Research Engineer (Kiev):

* Senior Computational Linguist (NYC):

* Senior Computational Linguist (Kiev):

* Linguist (NYC):


Category: Jobs/Internships

Fully funded 3-year PhD position in Information Retrieval


The University of Copenhagen is offering two fully funded PhD positions at the Information Retrieval Lab of the Department of Computer Science. The scholarship requires a master’s degree in computer science or a field providing equivalent qualifications (see below) at the time of taking up the position.

The Information Retrieval Lab at the University of Copenhagen offers a research environment with much freedom: to a large extent, you shape the research you work on and the ideas you pursue.

The University of Copenhagen is the oldest university and research institution in Denmark. Founded in 1479 as a studium generale, it is the second oldest institution of higher education in Scandinavia after Uppsala University.

Course Level: Positions are available for pursuing a PhD programme.

Study Subject: The Information Retrieval Lab has two open PhD positions:

PhD position 1 seeks to investigate the level of uncertainty in probability estimates of ranking models.
PhD position 2 seeks to investigate compositionality in text semantics.


Scholarship Award: Terms of appointment and payment are in accordance with the agreement between the Danish Ministry of Finance and The Danish Confederation of Professional Associations on Academics in the State.


University of Copenhagen



Master’s degree in Computer Science, Mathematics or a related field
Strong programming skills
Strong mathematical background
Additionally, candidates need to stand out in at least one of the following key criteria of excellence used for the assessment:

Language requirements:

Proficiency in spoken and written English

Educational level:

Master’s degree

How to apply:

About University of Copenhagen


Category: Jobs/Internships