AEPC 2010: Workshop on Annotation and Exploitation of Parallel Corpora

Website: math.ut.ee/tlt9/aepc/

Category AEPC 2010

Deadline: September 26, 2010 | Date: December 02, 2010

Venue/Country: Tartu, Estonia

Updated: 2010-08-31 13:10:18 (GMT+9)

Call For Papers - CFP

Workshop on Annotation and Exploitation of Parallel Corpora (AEPC)

A short summary of the workshop is available on the AEPC flyer.

In recent years, parallel corpora have become ever more useful for data-driven Machine Translation, Word Sense Disambiguation, and Cross-language Information Retrieval. Most of the time, parallel corpora have been used as raw texts (i.e. without any linguistic annotation) or with independent linguistic annotation (i.e. linguistic annotation that was applied to either language side without resort to the other). We believe that the full potential of parallel corpora will be reached when they are aligned and annotated concurrently. Many research strands, such as the automatic creation of parallel treebanks and parallel parsing, point in this direction. In particular, the popularity of syntax-enhanced approaches to statistical machine translation and the rise of multilingual corpus linguistics indicate the relevance of this workshop at this point in time.

Various projects have been initiated to build aligned parallel treebanks [Cmejrek et al., 2005, Gustafson-Capková et al., 2007, Ahrenberg, 2007], and most of them are based on tedious manual labor [Lundborg et al., 2007, Samuelsson and Volk, 2007]. Recently, several attempts have been made to automate this process, mainly focused on creating syntax-oriented translation models [Wang et al., 2002, Gildea, 2003, Zhechev and Way, 2008, Lavie et al., 2008]. The main strategies are based on alignment through parsing and chunking [Spreyer et al., 2008], language pair-dependent alignment rules [Groves et al., 2004], and the use of previous word alignment to induce phrase correspondences [Zhechev and Way, 2008]. Discriminative approaches using supervised learning have been successfully applied as well [Tiedemann and Kotzé, 2009]. Using these techniques to scale up the size of available aligned treebanks opens up a wide range of new possibilities for the exploration of cross-lingual data with syntactic and semantic information.
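To make the idea of inducing phrase correspondences from an existing word alignment more concrete, here is a minimal Python sketch. It is not the method of any of the cited papers. It only illustrates the widely used consistency criterion: two tree nodes may be linked if no word alignment link leaves their pair of spans. The function names, the half-open span representation, and the toy English-German example are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # half-open token span [start, end) covered by a tree node
Link = Tuple[int, int]   # (source token index, target token index)


def is_consistent(src_span: Span, tgt_span: Span, alignment: List[Link]) -> bool:
    """Return True if no word link crosses the boundary of the span pair
    and at least one link lies inside it."""
    s_start, s_end = src_span
    t_start, t_end = tgt_span
    has_link = False
    for s, t in alignment:
        in_src = s_start <= s < s_end
        in_tgt = t_start <= t < t_end
        if in_src != in_tgt:
            # The link connects a word inside one span to a word outside the other.
            return False
        if in_src and in_tgt:
            has_link = True
    return has_link


def align_nodes(src_spans: List[Span], tgt_spans: List[Span],
                alignment: List[Link]) -> List[Tuple[Span, Span]]:
    """Pair up all source and target node spans that are consistent
    with the given word alignment."""
    return [(a, b) for a in src_spans for b in tgt_spans
            if is_consistent(a, b, alignment)]


# Hypothetical toy example: "the small house" / "das kleine Haus"
word_alignment = [(0, 0), (1, 1), (2, 2)]
source_nodes = [(0, 3), (1, 3)]   # whole phrase, "small house"
target_nodes = [(0, 3), (1, 3)]   # whole phrase, "kleine Haus"

node_links = align_nodes(source_nodes, target_nodes, word_alignment)
```

In practice a tree aligner would also have to rank competing candidates and handle unaligned words, which this sketch leaves out.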

The work on automatic tree alignment is closely related to synchronous parsing based on transduction grammars (as in [Melamed, 2003]) or based on bootstrapping from a small set of manually labeled seeds (as in [Kuhn and Jellinghaus, 2006]). The advertised advantage is that the parallel text helps in syntactic disambiguation as well as in fast and robust annotation. Multiparallel corpora are considered to be of higher value than bilingual corpora.

Automatic syntactic annotation depends on the availability of language technology modules (e.g. PoS taggers and parsers) in the respective language. Resource-poor languages might not have this technology infrastructure. Moreover, manual annotation is time-consuming. Therefore, [Hwa et al., 2005] and [Smith and Eisner, 2006] have proposed ways to transfer syntactic information in parallel corpora from one language to another, an approach termed annotation projection.
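As a rough illustration of annotation projection, the following Python sketch copies part-of-speech tags from an annotated source sentence to an unannotated target sentence along word alignment links. It is a deliberately simplified assumption, not the procedure of the cited papers: the function name, the tag set, the first-link-wins rule, and the example sentence pair are all hypothetical.

```python
from typing import List, Optional, Tuple


def project_tags(source_tags: List[str],
                 alignment: List[Tuple[int, int]],
                 target_length: int) -> List[Optional[str]]:
    """Project source-side tags onto target tokens via word alignment.

    alignment holds (source index, target index) pairs. Target tokens
    without any link stay None; if a target token has several links,
    the first one encountered wins (an arbitrary choice for this sketch).
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


# Hypothetical example: English source with tags, German target without.
source_tokens = ["I", "do", "not", "know"]
source_tags = ["PRON", "AUX", "PART", "VERB"]
target_tokens = ["ich", "weiß", "es", "nicht"]
# "do" and "es" have no counterpart in this toy alignment.
alignment = [(0, 0), (3, 1), (2, 3)]

target_tags = project_tags(source_tags, alignment, len(target_tokens))
```

The unaligned target token keeps an empty tag here, which shows why projected annotation is usually noisy and incomplete and needs filtering or further training before it can be used.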

As a follow-up to the work on projecting syntactic information across parallel corpora, the projection of semantic annotation was pioneered in recent work by [Padó and Lapata, 2009], who have worked on the transfer of frame-semantic annotation across parallel corpora. We believe that improved functional and semantic projection is a necessary step to speed up the tedious process of semantic annotation. This is confirmed in recent work by [Dorr et al., 2010].

There are few tools for corpus linguistics over parallel corpora, and even fewer for visualizing and searching annotated parallel corpora (an example is [Germann, 2007]). With the increasing interest in and availability of annotated parallel corpora, we see a growing demand for such tools.

With this workshop we aim to bring together researchers who work on annotating parallel corpora for various languages and purposes and researchers who explore such resources for various applications. The following research areas will be addressed:

Parallel Treebanks (manual or automatic creation)

Cross-language Word Alignment and Phrase-Structure Alignment

Parallel Grammars, Parallel Parsing

Grammar Induction

Parallel Semantic Annotation

Parallel Referent Resolution and Anaphora

Annotation Projection

Multi-parallel Corpora

Tools for Multilingual Corpus Linguistics

Exploitation of Parallel Corpora for Evaluation

Annotated Parallel Corpora for Machine Translation

Novel Applications of Annotated Parallel Corpora

AEPC Workshop Schedule

Deadline for paper submission: 26 September 2010

Notification of acceptance: 24 October 2010

Final version of paper for workshop proceedings: 15 November 2010

Workshop: 2 December 2010
