Mission Statement: To foster the development of new and improved ways of
measuring the quality and understanding the properties of vector space
representations in NLP.
Time & Location: Berlin, Germany, August 12th 2016 (ACL 2016 workshop).
Models that learn real-valued vector representations of words, phrases,
sentences, and even documents are ubiquitous in today’s NLP landscape. These
representations are usually obtained by training a model on large amounts
of unlabeled data, and then employed in NLP tasks and downstream
applications. While such representations should ideally be evaluated
according to their value in these applications, doing so is laborious, and
it can be hard to rigorously isolate the effects of different
representations for comparison. There is therefore a need for evaluation
via simple and generalizable proxy tasks. To date, these proxy tasks have
been mainly focused on lexical similarity and relatedness, and do not
capture the full spectrum of interesting linguistic properties that are
useful for downstream applications. This workshop challenges its
participants to propose methods and/or design benchmarks for evaluating the
next generation of vector space representations, for presentation and
detailed discussion at the event. Following the workshop, the
highest-quality proposals will receive the support of the organizers and
participants, and some financial support, to help produce their proposed
resource to the highest standard.
We encourage researchers at all levels of experience to consider
contributing to the discussion at RepEval by making a short submission.
This can be either an *analysis* of existing benchmarks or a *proposal* of
new ones.
An analysis submission should analyze and discuss the strengths and
weaknesses of existing evaluation tasks, providing helpful insights for
designers of new tasks. Analysis papers will be reviewed, accepted, and
published *before* the proposal track’s camera-ready deadline, so that new
task proposals can benefit from these findings.
As part of their analysis, papers in this track might like to consider the
following questions:
What are the pros and cons of existing evaluations?
What are the limitations of task-independent representation or its
evaluation?
Given a specific downstream application, which existing evaluation (or
family of evaluations) is a good predictor of performance improvement?
Which properties are captured by existing evaluations? Which are not?
What methodological mistakes were made in the creation of existing
evaluations?
The analysis track is *not* limited to these topics. We believe that any
manuscript presenting a sound argument on representation evaluation would
be a great addition to the workshop.
A proposal submission should propose a novel method for evaluating
representations. It does not have to construct an actual dataset, but it
should describe a way (or several optional ways) of collecting one.
Proposals are expected to provide roughly 5-10 examples as a proof of
concept.
In addition, each proposal should explicitly mention:
Which type of representation it evaluates (e.g. word, sentence, document)
For which downstream application(s) it functions as a proxy
Among other important points, proposals should take the following into
account:
If the task captures some linguistic phenomenon via annotators, what
evidence is there that it is robustly observed in humans (e.g.,
How easy would it be for other researchers to accurately reproduce the
evaluation (not necessarily the dataset)?
Will the dataset be cost-effective to produce?
Is a specific family of models expected to perform particularly well (or
poorly) on the task? In other words, which types of models is this
evaluation targeted at?
How should the evaluation’s results be interpreted?
Submissions to both tracks should be 2-4 pages of content in ACL format,
with an unlimited number of pages for references. For the proposal track,
we encourage shorter content (2-3 pages), leaving more room for examples
and their visualization.
===Best Proposal Awards *Sponsored by Facebook AI Research*===
Two proposal-track papers will be selected by a special committee, and
awarded financial support for turning their idea into a large-scale
high-quality dataset via crowdsourcing or other annotation efforts. We hope
that the workshop community’s endorsement will also promote the use of
these new evaluations.
Submission: May 8th 2016
Notification: June 5th 2016
Camera-Ready (Analysis Track): June 12th 2016
Camera-Ready (Proposal Track): June 26th 2016*
Workshop Date: August 12th 2016
*This will give proposal-track authors enough time to go over any relevant
results that may arise from the analysis track, and cite them as motivation.