Multilingual Information Retrieval

Since multilingual information retrieval (MLIR) is often confused with cross-language information retrieval (CLIR), we think it is crucial to first present the two concepts and explain their similarities and differences before moving on to questions specific to storing, searching and retrieving information in multiple languages.


Both MLIR and CLIR are sub-fields of information retrieval, that is, the searching, accessing and storing of information relevant to the user. The importance of information retrieval has grown tremendously since the advent of the Internet, and today the majority of people with Internet access use some form of it, most often web search engines such as Google, Bing and Yahoo. Most web users rely on so-called monolingual information retrieval, which means they ask questions and access information in a single language. But the rapid increase in documents in languages other than English has resulted in a growing number of web users using two or more languages to both ask questions and access relevant information.

If users ask questions in multiple languages and access relevant information in multiple languages, they are using multilingual information retrieval (MLIR). In contrast, those who ask questions in one language and access information in multiple languages are using so-called cross-language information retrieval (CLIR). To simplify, MLIR is multilingual in both the query and the information retrieved, while CLIR is multilingual only in the information it gives access to.
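The distinction can be made concrete with a small Python sketch. It is only an illustration of the definitions given in this article: the function name `retrieval_mode` and the language codes are hypothetical, and scenarios the definitions do not mention (for example, one query language and one different document language) are reported as not covered rather than guessed at.

```python
def retrieval_mode(query_languages, document_languages):
    """Classify a search scenario using the definitions in this article."""
    queries = set(query_languages)
    documents = set(document_languages)
    if not queries or not documents:
        raise ValueError("need at least one query language and one document language")

    # One language for both asking and accessing: monolingual retrieval.
    if len(queries) == 1 and documents == queries:
        return "monolingual"
    # One query language, documents in several languages: CLIR.
    if len(queries) == 1 and len(documents) > 1:
        return "CLIR"
    # Several query languages and several document languages: MLIR.
    if len(queries) > 1 and len(documents) > 1:
        return "MLIR"
    # Anything else is outside the definitions used here.
    return "not covered by these definitions"


print(retrieval_mode(["en"], ["en"]))
print(retrieval_mode(["en"], ["en", "de", "fr"]))
print(retrieval_mode(["en", "de"], ["en", "de", "fr"]))
```

Under these assumptions, a user who searches only in English but reads results in English, German and French falls under CLIR, while a user who also types queries in German falls under MLIR.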

Despite Rapid Advances, Many Challenges Remain Unsolved

The techniques and methods used to facilitate querying and accessing information in multiple languages have progressed tremendously in the past decade, mainly thanks to the Internet and the subsequent increase in the use of MLIR. Despite that, many challenges remain unsolved.

Since MLIR methods are largely based on those used in monolingual information retrieval and machine translation, the main challenges are closely related to the problems of monolingual information retrieval and especially to the problem of translation. This is because particular terms and concepts in one language are sometimes very difficult to translate into another language, let alone into several. Machine translation has made querying and retrieving information in multiple languages a lot easier, but unfortunately it often doesn’t provide a satisfactory translation. As a result, machine translation may fail to give access to relevant documents and information. In addition to working hard on improving existing machine translation programmes, researchers are therefore also working intensively on developing more reliable and accurate MLIR methods.
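A minimal Python sketch shows how translation and monolingual matching can be combined, and where translation goes wrong. Everything in it is an assumption made for illustration: the tiny dictionary `TOY_DICTIONARY` stands in for a real machine translation system, the three documents are invented, and simple term counting stands in for a real ranking method. It is not a description of any particular MLIR system.

```python
# Hypothetical toy data: a tiny dictionary standing in for machine
# translation, and a three-document multilingual collection.
TOY_DICTIONARY = {
    ("en", "de"): {"library": "bibliothek", "bank": "bank"},
    ("en", "fr"): {"library": "bibliothèque", "bank": "banque"},
}

DOCUMENTS = [
    {"id": "d1", "lang": "en", "text": "the library opens at nine"},
    {"id": "d2", "lang": "de", "text": "die bibliothek ist heute geschlossen"},
    {"id": "d3", "lang": "fr", "text": "la banque est près de la gare"},
]


def translate_query(terms, source, target):
    """Translate query terms word by word; untranslatable terms are dropped."""
    if source == target:
        return list(terms)
    table = TOY_DICTIONARY.get((source, target), {})
    return [table[t] for t in terms if t in table]


def search(query, query_lang):
    """Translate the query into each document's language, then match terms."""
    terms = query.lower().split()
    hits = []
    for doc in DOCUMENTS:
        translated = translate_query(terms, query_lang, doc["lang"])
        words = set(doc["text"].lower().split())
        # Monolingual step: count translated query terms found in the document.
        score = sum(1 for t in translated if t in words)
        if score > 0:
            hits.append((doc["id"], score))
    # Highest score first, ties broken by document id.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


print(search("library", "en"))     # English and German documents match
print(search("river bank", "en"))  # "bank" becomes the financial "banque"
print(search("timetable", "en"))   # no translation available, nothing found
```

The second query illustrates the translation problem described above: "river" has no entry in the toy dictionary and "bank" is rendered as the financial "banque", so the only hit is a document about a bank near a railway station. The third query shows the other failure, where a missing translation means that documents in other languages cannot be reached at all.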