The objective of the Universal Anaphora Initiative is to enable further progress in the empirical study of anaphora by covering not just coreference, but all aspects of anaphoric interpretation, from identity-of-sense anaphora to bridging to discourse deixis (although not all anaphorically annotated corpora cover all of these phenomena); and not just English, but all languages.
Thanks to recent advances in NLP and representation learning, interest in coreference resolution and related tasks is at a high point. At the same time, large-scale resources are still available for only a few languages, and languages with multiple resources (notably English) do not employ a uniform standard, either in terms of annotation formats or in terms of guidelines. These deficits are crucial to address as we move towards broader coverage of languages and text types, as well as work on multilingual and reusable tools for different types of anaphora.
The Universal Dependencies Initiative offers a blueprint for a way forward: it has been very successful at progressively developing agreed-upon standards for the annotation and markup of syntactic dependency information across multiple languages and domains. The long-term objective of the UD-inspired Universal Anaphora Initiative is to do the same for anaphoric information.
The more modest objective of the first stage of the Universal Anaphora Initiative is to come up with an agreed-upon markup scheme that can be used to encode the information in the existing corpora, which will enable us to create a collection of corpora all encoded using the same scheme. But we hope that this exercise will prove a useful starting point for further discussion on the annotation schemes, as well.
A unified markup scheme should also allow us to develop an extension of the CoNLL scorer able to score not just identity anaphora, but also other aspects of anaphoric interpretation, such as the identification of non-referring expressions, as done in the 2018 CRAC Shared Task, as well as bridging reference and discourse deixis resolution.
We are fully aware that there is at the moment only partial agreement on the anaphoric phenomena that should be covered by such a scheme, and on the details of how they should be annotated. We will therefore adopt a strategy similar to that of the MATE proposal (Poesio et al., 1999): namely, identify those aspects of the proposal that are core and those that are optional.
More to follow soon!
Massimo Poesio, Florence Bruneseaux, and Laurent Romary. 1999.
The MATE meta-scheme for coreference in dialogues in multiple languages.
In Proc. of the ACL Workshop on Standards and Tools for Discourse Tagging, pages 65–74.