Seminar: Real-world text mining using machine learning (St. Petersburg)

This coming Saturday, April 21, Professor Jan Žižka (Czech Republic) will speak at the seminar on natural language processing.

Jan will give a talk titled "Real-world text mining using machine learning". The talk will last 3 hours. Please note: the seminar will be held in ENGLISH.


Today, huge volumes of text data are available, especially on the Internet. Very often the data is unstructured, and the text is written freely by Internet users in natural languages. Such data is expected to contain interesting or valuable information that can be used for different goals in many application areas. Because the data is so large, it is very difficult or impossible to process it "manually" within an acceptable time. Fortunately, modern computing makes it possible to apply sophisticated methods from artificial intelligence, especially the family of algorithms known as machine learning. Machine learning methods applied to text mining are based on inductive learning from existing examples.

In the first part, the talk gives a brief introduction to some machine learning methods applied to text mining. The main problems concern appropriate preprocessing of the data, designing the mining procedure (including the selection of suitable algorithms), and interpreting the results.
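
As a rough illustration of those three stages (this sketch is not taken from the talk), the following Python example tokenizes a handful of invented hotel reviews, trains a small multinomial naive Bayes classifier, and lists the words most typical of one class. The toy reviews, all function names, and the choice of naive Bayes are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Toy labeled reviews, invented for illustration only (not real data).
REVIEWS = [
    ("clean room and friendly staff", "pos"),
    ("friendly staff and great breakfast", "pos"),
    ("dirty room and rude staff", "neg"),
    ("noisy room and rude reception", "neg"),
]


def tokenize(text):
    """Preprocessing step: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Learning step: count word frequencies per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def log_likelihood(word, label, word_counts, vocab):
    """Laplace-smoothed log P(word | label)."""
    counts = word_counts[label]
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))


def classify(text, model):
    """Return the class with the highest naive Bayes score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += log_likelihood(word, label, word_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)


def typical_words(label, other, model, top=3):
    """Interpretation step: words with the highest log-odds for `label`."""
    word_counts, _, vocab = model

    def log_odds(word):
        return (log_likelihood(word, label, word_counts, vocab)
                - log_likelihood(word, other, word_counts, vocab))

    # Sort by descending log-odds, breaking ties alphabetically.
    return sorted(vocab, key=lambda w: (-log_odds(w), w))[:top]


model = train(REVIEWS)
print(classify("rude staff", model))        # neg
print(typical_words("pos", "neg", model))   # ['friendly', 'breakfast', 'clean']
```

On real review data, the preprocessing step would usually need language-specific handling (tokenization, stop words, word forms), which is one reason the same procedure can behave differently across languages.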

In the second part, some interesting results obtained from real-world data will be presented. The data consists of customer reviews expressing opinions and sentiments about hotel accommodation services all over the world. The reviews were written by hundreds of thousands of customers in many languages. The research focused on revealing typical words and phrases in several languages, including English, Spanish, French, German, Japanese, Russian, Czech, and others.

The seminar will take place at: 10th Line, Vasilyevsky Island, building 49, room 308. It starts at 17:00.
Password for getting past the front desk: «Я на семинар» ("I'm here for the seminar").

About the author: Dmitry Granovsky

Yandex, developer; St. Petersburg State University (SPbGU), teaching assistant; OpenCorpora.org, developer
