Information Extraction from Microblogs Posted during Disasters

Call for Participation

FIRE 2016 Microblog Track
To be organized at FIRE 2016
8-10 December, Indian Statistical Institute, Kolkata



Track description

User-generated content in microblogging sites like Twitter is known to be an important source of real-time information on various events, including disaster events like floods, earthquakes, and terrorist attacks. In this track, our aim is to develop IR methodologies for extracting important information from microblogs posted during disasters.

A large set of microblogs (tweets) posted during a recent disaster event will be made available, along with a set of topics (in TREC format). Each 'topic' will identify a broad information need during a disaster, such as: what resources are needed by the population in the disaster-affected area, what resources are available, what resources are required/available in which geographical region, and so on. Specifically, each topic will contain a title, a brief description, and a more detailed narrative on what type of tweets will be considered relevant to the topic. The participants are required to develop methodologies for extracting tweets that are relevant to each topic with high precision (i.e., ideally, only relevant tweets should be identified) as well as high recall (i.e., ideally, all relevant tweets should be identified).
This is essentially an ad hoc search task, where the main challenges are:
(i) dealing with the noisy nature of microblogs, which are very short (at most 140 characters) and often written informally, using abbreviations, colloquial terms, etc., and
(ii) identifying specific keywords relevant to each broad topic. Note that each individual microblog contains only a few words, and may not contain most of the specific keywords even when the tweet is relevant to a topic.
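To illustrate the ad hoc search setup described above (this is not a prescribed baseline, just a minimal sketch), tweets can be treated as short documents and ranked against a topic's keywords by TF-IDF cosine similarity; the sample tweets and query below are invented:

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and replace non-alphanumerics with spaces; a real system
    # would also normalize hashtags, abbreviations, and colloquial spellings.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def rank(tweets, query, top_k=3):
    """Rank tweets against a topic's keywords by TF-IDF cosine similarity."""
    docs = [Counter(tokenize(t)) for t in tweets]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(counts):
        return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

    def cos(a, b):
        num = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(query)))
    scored = sorted(((cos(q, vec(d)), i) for i, d in enumerate(docs)),
                    reverse=True)
    return [(i, s) for s, i in scored[:top_k] if s > 0]

# Hypothetical tweets and topic keywords, for illustration only.
tweets = [
    "Urgent: drinking water needed in Gorkha district",
    "Medical supplies available at Kathmandu airport",
    "Thoughts and prayers for everyone affected",
]
print(rank(tweets, "water needed"))
```

Plain keyword matching like this illustrates challenge (ii): a relevant tweet may use none of the topic's keywords, which is why participants will likely need query expansion or other semantic matching on top of such a baseline.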


The data will contain:

  1. Around 50,000 microblogs (tweets) from Twitter that were posted during the Nepal earthquake in April 2015. Tweet ids, along with a script to download the tweets, will be provided to the participants.
  2. A set of 5-8 topics in TREC format, each containing a title, a brief description, and a more detailed narrative on what type of tweets will be considered relevant to the topic.
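For reference, a TREC-format topic typically has the shape sketched below; the topic number, title, and text here are invented placeholders, and the track's actual topics may differ in detail:

```xml
<top>
<num> Number: MB01 </num>
<title> Availability of drinking water </title>
<desc> Description:
Identify tweets that report where drinking water is available
in the disaster-affected area.
</desc>
<narr> Narrative:
A relevant tweet must state that drinking water is available at
some location. Tweets that only request water are not relevant.
</narr>
</top>
```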

Evaluation plan

Since the aim of this track is to extract a set of tweets that are relevant to each topic, set-based evaluation metrics like precision, recall, and F-score will be used. The gold standard, against which the set of tweets identified by the participants will be matched, will be generated by a "manual run" in which human volunteers (assessors) will be given the same set of tweets and topics, and asked to identify all possible relevant tweets using a search engine (Indri).
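The set-based metrics can be computed directly from a run's tweet-id set and the gold-standard tweet-id set for each topic; a minimal sketch (the tweet ids below are invented):

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F-score for one topic.

    retrieved: set of tweet ids returned by a participant's run
    relevant:  set of tweet ids in the gold standard
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives: overlap of the two sets
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical run: 4 tweets retrieved, 3 of them in a gold standard of 6.
p, r, f = precision_recall_f1({"t1", "t2", "t3", "t9"},
                              {"t1", "t2", "t3", "t4", "t5", "t6"})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.5 0.6
```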

While judging the participants' runs, we will, if necessary, arrange a second round of assessment to judge the relevance of tweets that were identified by participants but not found during the first round of human assessment.


Important dates

  • July 1, 2016: Data and topics released.
  • August 15, 2016: Run submission deadline.
  • September 15, 2016: Results declared.
  • October 15, 2016: Working notes due.


