CLEF 2006 | Ad-Hoc 2006
The ad-hoc track will test system performance on a multilingual collection of newspaper and news agency documents. The data download page, accessible from the Workspace for Registered Participants, indicates precisely which collections you need for each task.
The ad-hoc track tests mono- and cross-language textual document retrieval. As in last year's campaign, the 2006 track offers basic mono- and bilingual tasks plus an experimental multilingual task aimed at (but not restricted to) experienced participants. This is the “robust” task.
The goal is to retrieve relevant documents in Bulgarian, French, Hungarian and/or Portuguese collections using topics in the same language, and to submit results in a ranked list.
The 2006 bilingual task focuses on target collections for "consolidated" languages for which many experiments have already been made within CLEF (French and Portuguese) and "new" CLEF languages (Bulgarian and Hungarian, added in 2005). In CLEF we note that system performance tends to be best with target languages for which a strong test collection has been built over the years. The aim for the "consolidated" languages is thus to see if system performance can be further improved compared with previous years (using the monolingual results as a baseline), whereas with the "new" languages the aim is to strengthen the text collections and to see if the system performance achieved can be equivalent to that obtained with the "consolidated" languages. The 2006 ad-hoc bilingual track will accept runs for the following source -> target language pairs:
Any topic language -> Bulgarian target collection
Any topic language -> French target collection
Any topic language -> Hungarian target collection
Any topic language -> Portuguese target collection
The aim is to retrieve relevant documents from the chosen target collection and submit the results in a ranked list.
However, we ask groups that have participated in a cross-language ad-hoc task in previous years to submit at least one run for each target language.
In addition, this year we also offer a bilingual task aimed at encouraging system testing with non-European languages against the English or French target collections. Topics will be supplied in a variety of languages including Amharic, Chinese, Oromo, Hindi, Telugu and Indonesian. Other languages can be added on demand. The aim is to stimulate the development of resources to handle these languages in a cross-language context.
Finally, newcomers only (i.e. groups that have not previously participated in a CLEF cross-language task) can choose to search the English document collection using any topic language.
The new robust task emphasizes stable performance over all topics instead of high average performance in mono-, bilingual and multilingual IR. The robust task is essentially an ad-hoc task which makes use of test collections previously developed at CLEF. The evaluation methodology will consider the geometric average as well as the mean average precision of all topics. Geometric average has proven to be a stable measure for robustness. In the long term, we are interested in topic difficulty and failure analysis for hard topics. Our intentions are outlined in the points below:
Objectives
Maybe even categorization in 2007
- English: LA Times 94
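The robust-task evaluation described above considers the geometric average alongside the mean average precision over all topics. The following is a minimal sketch of why the two measures differ, not the official evaluation code: the function names, the sample per-topic scores, and the small floor value used to avoid taking the logarithm of zero are all assumptions made for illustration.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of per-topic average precision values."""
    return sum(ap_scores) / len(ap_scores)


def geometric_mean_average_precision(ap_scores, floor=1e-5):
    """Geometric mean of per-topic average precision values.

    The floor is an assumed safeguard: a topic with AP = 0 would
    otherwise make the logarithm undefined.
    """
    log_sum = sum(math.log(max(ap, floor)) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))


# Hypothetical per-topic AP scores for two systems with the same MAP.
stable = [0.30, 0.30, 0.30, 0.30]   # similar performance on every topic
uneven = [0.58, 0.58, 0.02, 0.02]   # strong on some topics, fails on others

for name, scores in (("stable", stable), ("uneven", uneven)):
    print(name,
          round(mean_average_precision(scores), 4),
          round(geometric_mean_average_precision(scores), 4))
```

Both hypothetical systems have the same arithmetic mean, but the geometric average of the uneven system is much lower because its poorly handled topics pull the product down. This is the sense in which the geometric average rewards stable performance over all topics.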
A common set of 50 topics will be used for both mono- and bilingual tasks and will be found in the Workspace for Registered Participants from 15 March. Topics have been prepared in Amharic, Bulgarian, Chinese, English, French, German, Greek, Hungarian, Indonesian, Italian, Portuguese, Russian, and Spanish, and in other languages on demand. Please contact carol.peters at isti.cnr.it if you are interested in other topic languages.
As stated above, the Multilingual task will use CLEF 2003 topics.
Guidelines
Detailed guidelines for participation in the 2006 Ad-hoc track, with information on data manipulation, query construction and results submission, will be available soon. A preliminary draft of these guidelines can be found in the Workspace for Registered Participants.
The track is coordinated jointly by ISTI-CNR, U.Padua, and Dublin City U.