SEPLN 2013 workshop on:
Sentiment Analysis at SEPLN
Sep 20, 2013
The rise of social media such as blogs and social networks, together with
the increasing amount of user-generated content in the form of reviews,
recommendations, ratings and other kinds of opinion, has led to an
emerging trend towards online reputation analysis. This analysis has
two technological aspects: sentiment analysis and text classification (or
First, sentiment analysis, i.e., the application of natural language
processing and text analytics to identify and extract subjective
information from texts, is the first step towards online reputation
analysis. It is becoming a promising topic in the field of marketing and
customer relationship management, as social media and their associated
word-of-mouth effect are turning out to be the most important source of
information for companies about their customers’ sentiments towards
their brands and products.
Then, automatic text classification is used to guess the topic of the text
among a predefined set of categories or classes, so that the reputation
level of the company can be assigned to different facets, axes or points
of view of analysis.
Sentiment analysis is a major technological challenge. The task is so hard
that even humans often disagree on the sentiment of a given text. The fact
that issues one individual finds acceptable or relevant may not be seen the
same way by others, along with multilingual aspects, cultural factors and
different contexts, makes it very hard to classify a text written in a
natural language as positive or negative. And the shorter the text is, for
example when analyzing Twitter messages or short comments on Facebook, the
harder the task becomes.
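
To make the difficulty with short texts concrete, here is a minimal sketch
of a lexicon-based polarity scorer. It is not the method of any TASS
participant: the word lists, the function name `polarity` and the extra
"neutral" fallback label are assumptions made only for illustration.

```python
# Toy lexicon: illustrative only, not taken from TASS or any real resource.
POSITIVE = {"bueno", "genial", "excelente", "feliz"}
NEGATIVE = {"malo", "horrible", "terrible", "triste"}


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    # Lowercase each whitespace-separated token and strip common punctuation.
    tokens = [t.strip(".,;:!?¡¿\"'()").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # No lexicon evidence either way (hypothetical fallback label).
    return "neutral"


print(polarity("El servicio es genial, muy bueno"))
print(polarity("q mal todo :("))
```

The second message is clearly negative to a human reader, but the
abbreviated wording and the emoticon match nothing in the toy lexicon, so
the scorer finds no evidence. This is the kind of failure that short,
informal messages tend to cause.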
On the other hand, text classification techniques, although studied for a
longer time, still need more research effort before complex models with
many categories can be built with less workload and with higher precision
and recall. In addition, these models should work well with
short texts and deal with specific text features that are present in social
media messages (such as spelling mistakes, abbreviations, SMS language,
Within this context, the aim of TASS is to provide a forum for discussion
and communication where the latest research work and developments in the
field of sentiment analysis in social media, specifically focused on the
Spanish language, can be shown and discussed by the scientific and business
communities. The main objective is to promote the application of existing
state-of-the-art algorithms and techniques and the design of new ones for
the implementation of complex systems able to perform sentiment analysis
and text classification on short text opinions extracted from social media
messages (specifically Twitter) published by a series of representative
The challenge task is intended to provide a benchmark forum for comparing
the latest approaches in these fields. In addition, with the creation and
release of the fully tagged corpus, we aim to provide a benchmark dataset
that enables researchers to compare their algorithms and systems.
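
As an illustration of how systems could be compared on a tagged corpus,
the sketch below computes per-class precision and recall from gold and
predicted labels. The official evaluation measures are not stated in this
text, so the function name `precision_recall` and the labels "P" and "N"
are assumptions used only for the example.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)  # true positives
    fp = sum(g != label and p == label for g, p in pairs)  # false positives
    fn = sum(g == label and p != label for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for five messages ("P" = positive, "N" = negative).
gold = ["P", "N", "P", "N", "P"]
pred = ["P", "P", "P", "N", "N"]

for label in ("P", "N"):
    p, r = precision_recall(gold, pred, label)
    print(label, round(p, 3), round(r, 3))
```

Reporting both values per class matters when classes are unbalanced, since
a system can reach high precision on a frequent class while missing most
messages of a rare one.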
Four tasks are proposed to participants, covering different aspects of
sentiment analysis and automatic text classification.
*** Task 1: Sentiment Analysis at global level ***
This task consists of performing automatic sentiment analysis to
determine the global polarity of each message in the test set of the
*** Task 2: Topic classification ***
The technological challenge of this task is to build a classifier to
automatically identify the topic of each message in the test set of the
*** Task 3: Sentiment Analysis at entity level ***
This task consists of performing automatic sentiment analysis, similar
to Task 1, but determining the polarity at entity level of each message in
the Politics corpus.
In this case, the polarity at entity level included in the training set of
the General corpus may be used by participants to train and validate the
*** Task 4: Political tendency identification ***
This task goes one step further: the objective is to estimate the
political tendency of each user in the test set of the General corpus,
choosing among four possible values: LEFT, RIGHT, CENTRE and UNDEFINED.
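
One simple baseline for this task, given here only as a sketch, is a
majority vote over per-message labels. The text does not say how the
tendency should be derived, so the existence of per-message labels, the
function name `user_tendency` and the tie-breaking rule are all
assumptions.

```python
from collections import Counter

# The four values named in the task description.
TENDENCIES = ("LEFT", "RIGHT", "CENTRE", "UNDEFINED")


def user_tendency(message_labels):
    """Aggregate hypothetical per-message labels into one user-level value."""
    # Count only informative votes; unknown or UNDEFINED labels are ignored.
    votes = Counter(
        label for label in message_labels
        if label in TENDENCIES and label != "UNDEFINED"
    )
    if not votes:
        return "UNDEFINED"
    ranked = votes.most_common()
    # Assumed rule: a tie between the two top values is left undecided.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "UNDEFINED"
    return ranked[0][0]


print(user_tendency(["LEFT", "LEFT", "RIGHT", "UNDEFINED"]))
print(user_tendency(["LEFT", "RIGHT"]))
```

Falling back to UNDEFINED on ties or missing evidence keeps the baseline
from forcing a tendency on users whose messages give no clear signal.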
Julio Villena-Román, Daedalus, Spain
Janine García-Morera, Daedalus, Spain
José Carlos González-Cristóbal, Technical University of Madrid, Spain
L. Alfonso Ureña-López, University of Jaén, Spain (SINAI-UJAEN)
Miguel Ángel García-Cumbreras, University of Jaén, Spain (SINAI-UJAEN)
María-Teresa Martín-Valdivia, University of Jaén, Spain (SINAI-UJAEN)
Eugenio Martínez-Cámara, University of Jaén, Spain (SINAI-UJAEN)
Program Committee (to be confirmed)
Alexandra Balahur, EC-Joint Research Centre, Italy
José Carlos Cortizo, European University of Madrid, Spain
Ana García-Serrano, UNED, Spain
José María Gómez-Hidalgo, Optenet, Spain
Julio Gonzalo-Arroyo, UNED, Spain
Carlos A. Iglesias-Fernández, Technical University of Madrid, Spain
Zornitsa Kozareva, Information Sciences Institute, USA
Sara Lana-Serrano, Technical University of Madrid, Spain
Bing Liu, University of Illinois at Chicago, USA
Paloma Martínez-Fernandez, Carlos III University of Madrid, Spain
Ruslan Mitkov, University of Wolverhampton, U.K.
Andrés Montoyo, University of Alicante, Spain
Rafael Muñoz, University of Alicante, Spain
Günter Neumann, DFKI, Germany
Paolo Rosso, Technical University of Valencia, Spain
Maite Taboada, Simon Fraser University, Canada
Mike Thelwall, University of Wolverhampton, U.K.
José Antonio Troyano, University of Seville, Spain
June 1st, 2013: Release of training and validation corpora.
June 15th, 2013: Release of test corpus.
July 1st, 2013: Deadline for registration for the tasks.
July 15th, 2013: Experiment submissions by participants.
August 2nd, 2013: Evaluation results.
August 28th, 2013: Submission of papers.
September 20th, 2013: Workshop.