Spoken Wikipedia

We are proud to announce the availability of the Spoken Wikipedia
Corpora. They consist of time-aligned spoken Wikipedia articles for
English, German, and Dutch, totaling more than 1,000 hours of audio
by numerous speakers on a wide variety of topics.

The Spoken Wikipedia Project is a volunteer community effort within
Wikipedia[1] to record spoken versions of Wikipedia articles.

The corpora are freely available under a CC BY-SA license.

The annotations perfectly retain the original text, and each part can
be traced back to its original place in the Wikipedia HTML.  This
enables research on spoken hypertext and its markup.

In addition to per-word alignments, we also provide phoneme-level
alignments for both German and English, generated by MAUS.

               DE     EN     NL
#articles    1010   1314   3073
hours audio  386h   395h   224h
aligned      249h   182h    79h
ph-aligned   129h    77h    n/a
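
The figures in the table can be checked with a few lines of arithmetic.
The following is a minimal sketch, not part of the corpus tooling: the
dictionary layout and the helper names total_hours and aligned_share
are made up for illustration, and the numbers are copied from the
table above. It sums the hours across languages and computes the share
of audio that was aligned per language.

```python
# Hours copied from the table above; None marks a value that is not provided.
# The structure and function names are illustrative assumptions only.
corpus_hours = {
    "DE": {"audio": 386, "aligned": 249, "ph_aligned": 129},
    "EN": {"audio": 395, "aligned": 182, "ph_aligned": 77},
    "NL": {"audio": 224, "aligned": 79, "ph_aligned": None},
}


def total_hours(key):
    """Sum one row of the table over all languages, skipping missing values."""
    return sum(row[key] for row in corpus_hours.values() if row[key] is not None)


def aligned_share(lang):
    """Fraction of the recorded audio for `lang` that has word alignments."""
    row = corpus_hours[lang]
    return row["aligned"] / row["audio"]


if __name__ == "__main__":
    print("total audio hours:  ", total_hours("audio"))
    print("total aligned hours:", total_hours("aligned"))
    for lang in corpus_hours:
        print(f"{lang}: {aligned_share(lang):.1%} of audio aligned")
```

The audio total confirms the "more than 1,000 hours" figure, and the
per-language shares show that roughly a third to two thirds of the
recorded audio is currently aligned, depending on the language.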

To our knowledge, this is the largest corpus of freely available
aligned speech for German and for Dutch, and the largest freely
available corpus of aligned factual speech for English.

To download the corpora and obtain more information, please visit

There you can also obtain the pipeline to automatically download and
align a Spoken Wikipedia Corpus yourself.  We also provide a template
to adapt the software for new languages.  Note, however, that the
alignment process takes a significant amount of time.

If you use this resource in your research, please cite
Arne Köhn, Florian Stegen, Timo Baumann. 2016.
"Mining the Spoken Wikipedia for Speech Data and Beyond".
In Proceedings of LREC 2016.
or a later publication.

About the author: Lidia Pivovarova

SPbU (Saint Petersburg State University): senior lecturer; University of Helsinki: PhD student http://philarts.spbu.ru/structure/sub-faculties/itah_phil/teachers/pivovarova

1 comment on: Spoken Wikipedia

  1. Andrey Krizhanovsky says:

    By the way, colleagues, we also have a Russian Spoken Wikipedia. We are waiting for researchers to turn up 🙂 https://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA_%D0%B0%D1%83%D0%B4%D0%B8%D0%BE%D1%81%D1%82%D0%B0%D1%82%D0%B5%D0%B9
