Stance detection can be formulated in different ways. In the context of this task, we define stance detection to mean automatically determining from text whether the author is in favor of the given target, against the given target, or whether neither inference is likely. Consider the target-tweet pair:
Target: legalization of abortion
Tweet: A foetus has rights too! Make your voice heard.
Humans can deduce from the tweet that the speaker is likely against the target. The aim of the task is to test whether automatic systems can deduce the stance of the tweeter. To detect stance successfully, automatic systems often have to identify relevant bits of information that may not be present in the focus text. For example, a system may need to know that someone who actively supports foetus rights is likely against the right to abortion. We provide a domain corpus pertaining to each of the targets, from which systems can gather information to help with the detection of stance.
Automatically detecting stance has widespread applications in information retrieval, text summarization, and textual entailment. In fact, one can argue that stance detection can often bring complementary information to sentiment analysis, because we often care about the author’s evaluative outlook towards specific targets and propositions rather than simply about whether the speaker was angry or happy.
Twitter and other microblogging sites are popular platforms where people express stance implicitly or explicitly. Thus, here for the first time, we propose a shared task on detecting stance that focuses on the Twitter domain.
There are two tasks:
— Task A (supervised framework): This task will test stance towards five targets: «Atheism», «Climate Change is a Real Concern», «Feminist Movement», «Hillary Clinton», and «Legalization of Abortion». You are provided with about 2900 labeled training data instances for the five targets.
— Task B (weakly supervised framework): This task will test stance towards one target, «Donald Trump». You will not be provided with any training data for this target. You are provided with a large set of tweets associated with «Donald Trump» (the domain corpus), but it is not labeled for stance. You are encouraged to develop unsupervised systems for the targets in Task A so that you can measure progress by using the training data for Task A as a development set. However, Task B evaluation will only deal with «Donald Trump» instances.
You can provide submissions for either one of the tasks, or both tasks.
Classes: The possible stance labels are:
— FAVOR: We can infer from the tweet that the tweeter supports the target (e.g., directly or indirectly by supporting someone/something, by opposing or criticizing someone/something opposed to the target, or by echoing the stance of somebody else).
— AGAINST: We can infer from the tweet that the tweeter is against the target (e.g., directly or indirectly by opposing or criticizing someone/something, by supporting someone/something opposed to the target, or by echoing the stance of somebody else).
— NONE: none of the above.
Submission Format: The test data file will have the same format as the training file, except for the class label, which will be shown as «UNKNOWN» for all instances. Replace «UNKNOWN» with the predicted class to create the submission file. You may choose to leave the label for an instance as «UNKNOWN», for example if your classifier is unsure of the stance. This might lower recall, but it may still be better than predicting the wrong class (see evaluation metric).
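The option of leaving a label as «UNKNOWN» can be sketched as a simple confidence cut-off. This is only an illustration: the function name `choose_label`, the `scores` dictionary, and the threshold value of 0.6 are assumptions made for the example and are not part of the task definition.

```python
def choose_label(scores, threshold=0.6):
    """Return the top-scoring stance label, or "UNKNOWN" when unsure.

    scores    -- dict mapping "FAVOR", "AGAINST", and "NONE" to a
                 probability-like score (hypothetical classifier output)
    threshold -- hypothetical confidence cut-off; tune it on held-out data
    """
    # Pick the label with the highest score.
    label, score = max(scores.items(), key=lambda item: item[1])
    # Abstain when the best score falls below the cut-off.
    return label if score >= threshold else "UNKNOWN"


confident = choose_label({"FAVOR": 0.8, "AGAINST": 0.1, "NONE": 0.1})
unsure = choose_label({"FAVOR": 0.4, "AGAINST": 0.35, "NONE": 0.25})
print(confident, unsure)
```

A higher threshold leaves more instances as «UNKNOWN», which trades recall for precision; the right value depends on the classifier and is best chosen on a held-out portion of the training data.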
Evaluation: We will use the macro-average of F-score(FAVOR) and F-score(AGAINST) as the bottom-line evaluation metric. An evaluation script will be provided shortly so that you can:
— check the format of your submission file
— determine performance when gold labels are available (note that you can also use the script to determine performance on a held out portion of the training data to check your system’s progress)
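Until the official script is available, the metric described above can be approximated with a short sketch. It is not the official evaluation script. It assumes that "F-score" means the balanced F-score (harmonic mean of precision and recall) and that gold and predicted labels are given as two parallel lists; the function names are chosen for this example only.

```python
def f_score(gold, pred, label):
    """Balanced F-score for one class (assumed definition of "F-score")."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    n_pred = sum(1 for p in pred if p == label)   # instances predicted as label
    n_gold = sum(1 for g in gold if g == label)   # instances whose gold is label
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def stance_score(gold, pred):
    """Macro-average of F-score(FAVOR) and F-score(AGAINST)."""
    return (f_score(gold, pred, "FAVOR") + f_score(gold, pred, "AGAINST")) / 2


gold = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "NONE"]
pred = ["FAVOR", "UNKNOWN", "AGAINST", "FAVOR", "NONE"]
print(round(stance_score(gold, pred), 4))
```

In this example the second instance is left as «UNKNOWN», which costs recall for FAVOR but not precision, while the fourth instance is a wrong FAVOR prediction, which costs both FAVOR precision and AGAINST recall. NONE has no F-score term of its own in the average, but NONE instances still affect the score when they are mislabeled as FAVOR or AGAINST.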
RESOURCES THAT CAN BE USED
For Task A: You are free to use any available resources. You are also free to create new resources. For example, you are free to poll the Twitter API to collect more tweets pertaining to the targets. However, you will have to clearly outline all the resources you have used at submission. If you use any additional data that is manually labeled for stance towards the targets that are part of this task, or towards entities associated with these targets, then you will be ranked separately from submissions that do not use any stance-labeled data beyond what is provided in the trial and training sets.
For Task B: You are free to use any resources (available or new) as long as you do not use tweets or sentences that are manually labeled for stance. Some very minimal labeling is permitted. For example, manually labeling a handful of hashtags is okay. You will have to clearly outline all the resources you have used at submission.
If you have any questions about the resources that can be used, do not hesitate to ask on the mailing group.
— Training data ready: September 4, 2015
— Test data ready: December 15, 2015
— Evaluation start: January 10, 2016
— Evaluation end: January 31, 2016
— Paper submission due: February 28, 2016
— Paper reviews due: March 31, 2016
— Camera ready due: April 30, 2016
— SemEval workshop: Summer 2016
Over the last decade, there has been active research in modeling stance. However, most works focus on congressional debates (Thomas et al., 2006) or debates in online forums (Somasundaran and Wiebe, 2009; Murakami and Raymond, 2010; Anand et al., 2011; Walker et al., 2012; Hasan and Ng, 2013; Sridhar, Getoor, and Walker, 2014), domains in which gold labels can easily be obtained. Faulkner (2014) investigates the problem of detecting document-level argument stance in student essays. Twitter presents a new challenge to the research community since tweets are short, informal, and full of misspellings, shortenings, and slang. Rajadesingan and Liu (2014) aim to identify the stance of Twitter users from their tweets debating a controversial topic. The task we propose aims to detect stance from individual tweets, without relying on the conversational structure that is often present in online debates. Nonetheless, this task has clear overlap with related tasks such as argument mining, sentiment analysis, and textual entailment.
RELATION WITH SENTIMENT ANALYSIS
Stance detection is related to sentiment analysis, but the two have significant differences. In sentiment analysis, systems determine whether a piece of text is positive, negative, or neutral. In stance detection, however, systems are to determine the author’s favorability towards a given target. The target may or may not be explicitly mentioned in the text, and the text may express opinion or sentiment about some other entity. For example, consider the target and text pair shown below:
Target: Hillary Clinton
Tweet: Jeb Bush is the only sane candidate for 2016.
The tweet expresses a positive opinion towards Jeb Bush, but one can also infer from it that the tweeter is probably against Hillary Clinton. Note that even though it is possible to favor both Jeb Bush and Hillary Clinton, in this task we ask which stance is more probable.
We encourage participation of sentiment analysis systems that test the extent to which simple sentiment analysis will work for this task, as well as modified sentiment analysis systems focused on determining stance.
RELATION WITH TEXTUAL INFERENCE/ENTAILMENT
This task can be thought of as a textual inference or entailment task, where the goal is to determine whether favorability towards the target is entailed by the tweet. We encourage participation of such textual inference systems.
— Saif M. Mohammad, National Research Council Canada
— Svetlana Kiritchenko, National Research Council Canada
— Parinaz Sobhani, University of Ottawa
— Xiaodan Zhu, National Research Council Canada
— Colin Cherry, National Research Council Canada