cross-linguistically consistent treebank annotation

Universal Dependencies, version 1

We are happy to announce the release of the annotation guidelines for Universal Dependencies at

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The general philosophy is to provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.

We intend to treat version 1 as stable for at least the next year, but we may subsequently make further revisions based on experiences using it to treebank a range of languages. Our goal is to make a first release of data sets with language-specific documentation by January 1, 2015. If you are interested in contributing to this effort, please get in touch.

Jinho Choi, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher Manning,
Ryan McDonald, Joakim Nivre, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Dan Zeman

About the author: Lidia Pivovarova

Saint Petersburg State University (СПбГУ), senior lecturer; University of Helsinki, PhD student