CLEF 2005 | Agenda

CLEF Agenda for 2005 

CLEF 2005 offers a series of evaluation tracks to test different aspects of information retrieval system development. The aim is to promote research into the design of user-friendly, multilingual, multimodal retrieval systems. Information on the test collections available for each track can be found in the instructions on How to Participate.

There are eight evaluation tracks in 2005:

Mono-, Bi- and Multilingual Document Retrieval on News Collections (Ad-Hoc)

The ad-hoc track will test system performance on a multilingual collection of news documents. There will be two main tasks this year testing bilingual (L1->L2) and monolingual non-English information retrieval systems. Monolingual and bilingual retrieval tasks are offered for Bulgarian, French, Hungarian and Portuguese collections. In the bilingual task, newcomers only (i.e. groups that have not previously participated in a CLEF cross-language task) can choose to search the English document collection using any topic language. A common set of topics (i.e. structured statements of information needs from which queries are extracted) will be prepared in Bulgarian, English, French, German, Hungarian, Italian, Portuguese, Spanish and Chinese. Topics in other languages can be supplied on demand.
The multilingual task is based on the CLEF 2003 multilingual-8 test collection and aims at measuring progress over time in multilingual (L1->Ln) retrieval system performance. For more details on these tasks see the Ad-Hoc website.
The track is coordinated jointly by ISTI-CNR, U.Padua and Dublin City U.

Mono- and Cross-Language Information Retrieval on Structured Scientific Data (Domain-Specific)

This track studies retrieval in a domain-specific context using the GIRT-4 German/English social science database (as a pseudo-parallel corpus with identical documents) and the Russian Social Science Corpus (RSSC). Multilingual controlled vocabularies (German-English, English-German, German-Russian, English-Russian) will be available. Monolingual and cross-language tasks will be offered. Topics will be available in English, German and Russian (other topic languages may be added). Participants can make use of the indexing terms inside the documents and/or the Social Science Thesaurus provided, not only as a means of translation but also for tuning the relevance decisions of their systems. This track is coordinated by IZ Bonn. Please see the Domain-Specific website for more information.

Interactive Cross-Language Information Retrieval (iCLEF)

The challenge is to build a system that will allow real people to find information that is written in languages that they have not mastered, and then measure how well representative users are able to use the system that has been built. This year the iCLEF track will focus on the problems of cross-language question answering and image retrieval from a user-inclusive perspective. Participating groups will adapt a shared user study design to test a hypothesis of their choice, comparing reference and contrastive systems. The track is coordinated by LSI-UNED and the QA/Image CLEF organizers, ITC-irst, CELCT Trento, and U. Sheffield. See the iCLEF website.

Multiple Language Question Answering (QA@CLEF)

Following the positive outcome of the 2003 and 2004 QA@CLEF evaluation campaigns, this evaluation exercise will test monolingual (non-English) and cross-language QA systems. Combinations between nine or more source languages (Bulgarian, Dutch, English, Finnish, French, German, Italian, Portuguese and Spanish) and nine target language collections (Bulgarian, Dutch, English, Finnish, French, German, Italian, Portuguese and Spanish) will be explored. The track will include a main task, where factoid and definition questions are given as input, and pilot tasks that will explore facets of multilingual QA on the WWW. The track is coordinated by ITC-irst and CELCT, Trento. Information for participants will be available at the QA@CLEF website.

Cross-Language Retrieval in Image Collections (ImageCLEF)

This track evaluates retrieval of images described by text captions based on queries in a different language; both text and image matching techniques are potentially exploitable. Four tasks are offered:

It is expected that content-based image retrieval will be used for the medical tasks and a base system will be made publicly accessible.

The tasks offer different and challenging retrieval problems for cross-language image retrieval. The first task is also envisaged as an entry level task for newcomers to CLEF and to CLIR.

Three test collections are available: St Andrews University historical photographic collection; the ImageCLEFmed collection made available by the University and University Hospitals Geneva in collaboration with Oregon Health and Science University (OHSU); the IRMA database of 10,000 medical images, copyright the IRMA group, Aachen University of Technology (RWTH), Germany (use is currently limited to the ImageCLEF competition). ImageCLEF is coordinated by Sheffield University, the University and U.Hospitals of Geneva, Oregon Health and Science U., Aachen RWTH and Victoria University. For more information see the ImageCLEF website.

Cross-Language Spoken Document Retrieval (CL-SR)

Mono- and cross-language retrieval will be assessed on the Malach collection of spontaneous conversational speech from the Shoah archives. The collection for 2005 is in English and will consist of approx. 750 hours of speech in topically coherent segments, each with about 5 keywords and a 3-sentence summary. A thesaurus with ca. 3,000 core concepts, 30,000 location-time pairs and is-a and part-whole relationships, an in-domain expansion collection, and a word lattice for part of the collection may also be available by arrangement. 25 topics will be prepared in 6 languages: Czech, English, French, German, Russian and Spanish. 40 existing topics can be used for training. There will be 5-level relevance judgements. The track will be coordinated by Dublin City U., Ireland, and U. Maryland, USA. See the CL-SR website for more information.

Multilingual Web Track (WebCLEF)

For multi/crosslingual retrieval the web is the natural and common setting. In the European context, many issues for which people turn to the web are essentially multilingual. These include law, economy, culture, education, leisure and travel. The WebCLEF document collection will consist of webpages from European governmental sites for at least 10 languages/countries. The collection will contain about 2M pages, with at least 50K documents for each major language. The documents will be in various formats: HTML, TXT, PDF, etc. For the first year, multilingual navigation tasks such as home page finding and named page finding will be assessed, modeling users that access governmental information in the EU. The topic development and relevance assessment process will be in the hands of participants. The track will be coordinated by the University of Amsterdam. See the WebCLEF website.

Cross-Language Geographical Retrieval (GeoCLEF)

The track provides a framework in which to evaluate GIR systems for search tasks involving both spatial and multilingual aspects. Given a statement describing a spatial user need (topic), the challenge is to find relevant documents from target collections of British English and German (or Spanish) news documents. 26 topics will be prepared in several languages including English, Spanish, Italian and German. They will be structured in the form <concept><spatial relation><region>, e.g. "find stories about disasters in Geneva". The track will run as a pilot experiment and is coordinated by UC Berkeley and U. Sheffield. See the GeoCLEF website.
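
As an illustration only, the <concept><spatial relation><region> structure of a GeoCLEF topic could be modelled as a simple three-field record. This is a minimal sketch: the class name, field names and the `to_statement` helper are hypothetical and are not part of any official GeoCLEF topic format or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTopic:
    """Hypothetical container for one topic (names are assumptions)."""
    concept: str           # what the user is looking for, e.g. "disasters"
    spatial_relation: str  # how the concept relates to the place, e.g. "in"
    region: str            # the geographic area, e.g. "Geneva"

    def to_statement(self) -> str:
        # Join the three parts into an English statement of the need.
        return f"find stories about {self.concept} {self.spatial_relation} {self.region}"


# The example topic given in the track description.
topic = GeoTopic(concept="disasters", spatial_relation="in", region="Geneva")
print(topic.to_statement())
```

Keeping the three parts separate, rather than storing only the full statement, would let a system translate or expand the concept and resolve the region independently of each other.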