Training data released
Semantic Textual Similarity (STS) measures the degree of equivalence
in the underlying semantics of paired snippets of text. Interpretable
STS (iSTS) adds an explanatory layer. Given the input (pairs of
sentences), participants first need to identify the chunks in each
sentence and then align the chunks across the two sentences,
indicating the relation and similarity score of each alignment.
For instance, given the following two sentences (drawn from a corpus
of news headlines):
12 killed in bus accident in Pakistan
10 killed in road accident in NW Pakistan
A participant system would split the sentences into chunks:
 [12] [killed] [in bus accident] [in Pakistan]
 [10] [killed] [in road accident] [in NW Pakistan]
And then provide the alignments between chunks, indicating the
relation and the similarity score of the alignment, as follows:
[12] <=> [10] : (SIMILAR 4)
[killed] <=> [killed] : (EQUIVALENT 5)
[in bus accident] <=> [in road accident] : (MORE-SPECIFIC 4)
[in Pakistan] <=> [in NW Pakistan] : (MORE-GENERAL 4)
Given such an alignment, an automatic system could explain why the two
sentences are very similar but not equivalent, for instance, phrasing
the differences as follows:
the first sentence mentions «12» instead of «10»,
«bus accident» is more specific than «road accident» and
«Pakistan» is more general than «NW Pakistan» in the second.
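The alignment-to-explanation step above can be sketched in a few lines of Python. This is a minimal illustration only: the tuple representation and the `explain` helper are our own, not the official iSTS submission format, and a real system would of course produce the alignments automatically rather than hard-code them.

```python
# Each alignment: (chunk in sentence 1, chunk in sentence 2, relation, score).
# Chunks, relations and scores are taken from the worked example above.
alignments = [
    ("12", "10", "SIMILAR", 4),
    ("killed", "killed", "EQUIVALENT", 5),
    ("in bus accident", "in road accident", "MORE-SPECIFIC", 4),
    ("in Pakistan", "in NW Pakistan", "MORE-GENERAL", 4),
]

def explain(alignments):
    """Phrase the non-equivalent alignments as human-readable differences."""
    phrases = []
    for c1, c2, relation, score in alignments:
        if relation == "EQUIVALENT":
            continue  # same meaning on both sides, nothing to explain
        if relation == "SIMILAR":
            phrases.append(f"the first sentence mentions «{c1}» instead of «{c2}»")
        elif relation == "MORE-SPECIFIC":
            phrases.append(f"«{c1}» is more specific than «{c2}»")
        elif relation == "MORE-GENERAL":
            phrases.append(f"«{c1}» is more general than «{c2}»")
    return ", and ".join(phrases) + "."

print(explain(alignments))
```

Running this prints a difference summary in the same spirit as the explanation above; only alignments whose relation departs from EQUIVALENT contribute a phrase.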
While giving such explanations comes naturally to people, constructing
algorithms and computational models that mimic human-level performance
represents a difficult natural language understanding (NLU) problem,
with applications in dialogue systems and interactive systems.
Please check the task website for more details on chunking, alignment,
relation labels and scores.
== Datasets ==
Two datasets are currently covered, comprising pairs of sentences from
news headlines and image captions. The pairs are a subset of the
datasets released in the STS tasks. Please check the iSTS train
dataset for details.
== New in 2016 ==
The 2015 STS task offered a pilot subtask on interpretable STS, which
showed that the task is feasible, with high inter-annotator agreement
and system scores well above baselines.
For 2016, the pilot subtask has been promoted to a standalone
task. The restriction to 1:1 alignments has been lifted, the
annotation guidelines have been updated, and new training data has
been released. Please check out
http://alt.qcri.org/semeval2016/task2/ for more details.
== Participants ==
If you are interested in participating, you should:
join the mailing list for updates at
check the guidelines and train data at
register at the semeval website:
Note that registration and mailing list management are independent;
please do both.
== Important dates ==
Train data ready: NOW!
Evaluation start: January 10, 2016
Evaluation end: January 31, 2016
Paper submission due: February 28, 2016 [TBC]
Paper reviews due: March 31, 2016 [TBC]
Camera ready due: April 30, 2016 [TBC]
SemEval workshop: Summer 2016
== Organizers ==
Eneko Agirre, Aitor Gonzalez-Agirre, Iñigo Lopez-Gazpio, Montse
Maritxalar, German Rigau and Larraitz Uria.
University of the Basque Country
== Reference ==
Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M.,
Gonzalez-Agirre, A., Guo, W., Lopez-Gazpio, I., Maritxalar, M.,
Mihalcea, R., Rigau, G., Uria, L., and Wiebe, J. (2015).
SemEval-2015 Task 2: Semantic textual similarity, English, Spanish
and pilot on interpretability. In Proceedings of the 9th
International Workshop on Semantic Evaluation (SemEval 2015),
June. [http://anthology.aclweb.org/S/S15/S15-2045.pdf]