A Brief History of Cross-Language Information Retrieval (CLIR)

Cross-language information retrieval (CLIR) really “kicked off” in the 1990s, especially once the Internet came into widespread use around the globe. Before the last decade of the 20th century, the overwhelming majority of researchers focused on monolingual information retrieval, which, despite major changes over the past two decades, remains the central topic of many studies in the field. The history of CLIR, however, dates back much further than the 1990s. Many of the challenges specific to CLIR were recognised as early as the 1960s. Until the 1990s, however, studies dealing with CLIR (as well as multilingual information retrieval, or MLIR) were almost exclusively confined to the field of library science.

The Influence of the World Wide Web and TREC Experiments

The history of CLIR research may date back to the 1960s; however, it probably would not be getting so much attention in today’s IR research community if it were not for the World Wide Web. Indeed, the more popular the Internet became, especially after the mid-1990s, the more IR studies addressed CLIR and, in particular, how to improve existing tools and methods to facilitate access to relevant information in multiple languages.

In 1997, the first major CLIR experiments were launched at the Text Retrieval Conference (TREC), a series of workshops on various IR questions that has been held since 1992. More extensive CLIR experiments followed at TREC in 1998 and covered multiple European languages. Besides English and Spanish, which had been used in earlier experiments, the 1998 TREC experiments also included French, Italian, German, Dutch and others. Each year, more languages were added to the experiments, including Russian, Finnish, Hungarian and Swedish.

CLIR Experiments Spread Outside Europe

At the end of the 1990s, CLIR experiments spread outside Europe. In 1999, the National Institute of Informatics (NII) of Japan held the first NII Testbeds and Community for Information access Research (NTCIR) workshop, which concentrated on Asian languages, primarily Japanese, Korean and Chinese. As for Chinese, Peking University has been carrying out its own IR experiments since 2004, while the Forum for Information Retrieval Evaluation (FIRE) has been researching CLIR methods and strategies for Indian languages since 2008.