Shared Task on Native Language Identification

* Website: [1]
* Training data was released today!


We are excited to organize a new shared task on Native Language
Identification (NLI) which will take place at the BEA12 Workshop, co-located
with EMNLP in Copenhagen, September 08, 2017.

NLI is the task of identifying the native language (L1) of a writer based
solely on a sample of their writing or speech. The task is typically framed
as a classification problem where the set of L1s is known a priori. Most work
has focused on identifying the native language of writers learning English as
a second language. Two previous shared tasks on NLI have been organized in
which the task was to identify the native language of non-native speakers of
English based on essays and spoken responses they provided during a
standardized assessment of academic English proficiency. The first shared
task was based on the essays only and was also held with the BEA workshop in
2013. It was very successful with 29 teams competing, making it one of the
largest shared tasks that year. Three years later, the Computational
Paralinguistics Challenge at Interspeech 2016 hosted a sub-challenge on
identifying the native language based solely on the spoken responses.
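The closed-set classification framing described above can be illustrated with a minimal sketch. It is not the organizers' baseline or any participant's system: the character n-gram profiles, the cosine scoring, and all function names are assumptions chosen for illustration, and the two training strings are invented placeholders rather than TOEFL data.

```python
from collections import Counter
from math import sqrt

# The 11 L1 labels named in the announcement (the set known a priori).
L1_LABELS = ["Arabic", "Chinese", "French", "German", "Hindi", "Italian",
             "Japanese", "Korean", "Spanish", "Telugu", "Turkish"]


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Build one aggregated n-gram profile per L1 from (text, l1) pairs."""
    profiles = {}
    for text, l1 in samples:
        if l1 not in L1_LABELS:
            raise ValueError(f"unknown L1 label: {l1}")
        profiles.setdefault(l1, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the L1 whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda l1: cosine(grams, profiles[l1]))


# Invented placeholder strings, NOT real learner essays.
toy_samples = [
    ("alpha alpha alpha essay placeholder", "French"),
    ("beta beta beta sample placeholder", "German"),
]
toy_profiles = train(toy_samples)
print(predict(toy_profiles, "alpha alpha alpha essay placeholder"))
```

Because the label set is fixed in advance, `predict` can only ever return one of the L1s seen in training, which is what distinguishes this setup from an open-set problem.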

This year's shared task combines the inputs from the two previous tasks.
There will be three tracks: NLI on the essay only, NLI on the spoken response
only (based on a transcription of the response, not the audio), and NLI using
both responses from a test taker. This distinction will make for a more
challenging shared task while building on the methods and results from the
previous two shared tasks. We promise this shared task will be fun for you
and your colleagues, as well as your whole family.
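As a rough sketch of how the three tracks differ in their inputs, the mapping below selects which responses from a test taker feed a system in each track. The track keys, the record field names (`essay`, `transcript`), and the helper `build_input` are hypothetical names chosen for illustration; the released data format may look quite different.

```python
# Hypothetical mapping from track name to the response types it may use.
TRACK_INPUTS = {
    "essay": ("essay",),                # NLI on the essay only
    "speech": ("transcript",),          # NLI on the transcribed spoken response only
    "fusion": ("essay", "transcript"),  # NLI using both responses
}


def build_input(record, track):
    """Concatenate the responses a given track is allowed to use.

    `record` is an assumed dict with one essay and one transcript
    per test taker.
    """
    return " ".join(record[field] for field in TRACK_INPUTS[track])


# Invented placeholder record, not real test-taker data.
example = {"essay": "essay text here", "transcript": "spoken response text"}
for track in TRACK_INPUTS:
    print(track, "->", build_input(example, track))
```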


Educational Testing Service (ETS) is releasing 13,200 English essays and
orthographic transcriptions of 13,200 spoken responses from the TOEFL iBT®
assessment for the 2017 NLI Shared Task with the goal of helping researchers
advance the state of the art in the field of NLI. In addition to the
orthographic transcriptions of the spoken responses, i-vectors generated from
the audio files will be released as a baseline comparison for the
speech-based NLI task (although the audio files themselves are not included
in this data set). The data set contains test responses from 13,200 test
takers (one essay and one spoken response transcription per test taker) and
includes 11 native languages (L1s) with 1,200 test takers per L1. The 11
native languages covered by the corpus are: Arabic, Chinese, French, German,
Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish. The essays
typically range in length from approximately 300 to 400 words and the
transcribed spoken responses typically contain approximately 100 words.
Responses from 11,000 test takers in this set will be used as training data
for the NLI Shared Task, 1,100 for development, and the remaining 1,100 will
be released later as test data.
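The counts quoted above can be checked with a few lines of arithmetic. The totals and split sizes come from the announcement; the per-L1 breakdown assumes that each split is balanced across the 11 L1s, which the text does not state explicitly.

```python
NUM_L1S = 11            # native languages in the corpus
TAKERS_PER_L1 = 1200    # test takers per L1

TOTAL_TAKERS = NUM_L1S * TAKERS_PER_L1

# Split sizes as stated in the announcement.
SPLITS = {"train": 11000, "dev": 1100, "test": 1100}

# The three splits should account for every test taker.
assert TOTAL_TAKERS == 13200
assert sum(SPLITS.values()) == TOTAL_TAKERS

# ASSUMPTION: every split is evenly stratified by L1.
per_l1 = {name: size // NUM_L1S for name, size in SPLITS.items()}
print(per_l1)
```

Under that balance assumption, each L1 would contribute 1,000 training, 100 development, and 100 test responses of each type.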


The shared task will be composed of three sub-tasks:

* Main Task: The first and main task will be the 11-way classification task
  using all available data sources
* Text Task: 11-way classification solely using the essays
* Speech Task: 11-way classification using solely the transcripts and/or


Please register for the shared task via the following link:

Next, in order to obtain the training and test data for the task, all
participants must sign and return the data usage agreement form found here: [3]


Mar 27 — Training Data Release (Phase 1: Text)
Mid April — Training Data Release (Phase 2: Speech Transcripts and iVectors)
Jun 19 — Test Data Release
Jun 26 — Results Notification
Jul 05 — Draft System Description Papers Due
Jul 14 — Camera Ready Papers Due
Sep 08 — BEA12 Workshop


Aoife Cahill (Educational Testing Service)
Keelan Evanini (Educational Testing Service)
Shervin Malmasi (Harvard Medical School)
Joel Tetreault (Grammarly)

Contact email: nli.sharedtask[] [4]

[4] mailto:nli.sharedtask[]

About the author: Lidia Pivovarova

SPbSU (St. Petersburg State University) - senior lecturer, University of Helsinki - PhD student