Bilingual Information Retrieval

Bilingual information retrieval refers to accessing information in a language other than that of the query, for example retrieving documents in Spanish with a query written in English. This requires translation from the source language so that relevant information can be found. But while this may seem a relatively straightforward process, bilingual information retrieval still has many challenges to solve and questions to answer.

The Problem of Direct Translation

In theory, bilingual information retrieval requires only a good bilingual dictionary or machine translation programme. Both traditional dictionaries and machine translation programmes are readily available for a growing number of language pairs, and many offer multiple translations for a given word in order to help the user select the most suitable word or expression for accessing relevant information. But there are two main problems with this strategy.

First, not even the most exhaustive dictionaries include the entire vocabulary of any given language. This means that particular words, usually the names of places, organisations and individuals, are simply missing. It is sometimes possible to guess the meaning from the context, but this usually requires a high level of proficiency in the source language. Second, one needs to decide which, if any, of the proposed words is the right one. A user without an in-depth understanding of the language pair thus has two options.

The user can search with all the proposed words, or with just the first word offered by the dictionary or machine translation programme. In both cases, however, the query may not lead the user to the desired information. While using all proposed translations increases the likelihood of finding relevant information, it also increases the risk of errors, because most of the proposed words will be incorrect in a given context. The second option, using only the first proposed translation, makes sense because most dictionaries and translation programmes list the most frequently used translation first. But not all dictionaries are organised in this manner. Also, this strategy does not address the question of context: the most frequent translation of a word is not necessarily the right one for a particular query.
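
The two query strategies described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy English-to-Spanish dictionary, the function name translate_query and the "all"/"first" strategy labels are hypothetical and do not come from any particular retrieval system. The sketch also shows what happens to a word that the dictionary does not contain.

```python
# Hypothetical toy English-to-Spanish dictionary (illustration only).
# Translations are listed with the assumed most frequent sense first.
TOY_DICTIONARY = {
    "bank": ["banco", "orilla", "ribera"],
    "river": ["río"],
    "loan": ["préstamo"],
}


def translate_query(query, dictionary, strategy="all"):
    """Translate a query word by word.

    strategy="all"   keeps every proposed translation of each word.
    strategy="first" keeps only the first proposed translation.
    Words missing from the dictionary are passed through unchanged
    and also returned separately as out-of-vocabulary terms.
    """
    if strategy not in ("all", "first"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    translated, missing = [], []
    for word in query.lower().split():
        candidates = dictionary.get(word)
        if not candidates:
            # Out-of-vocabulary word, e.g. a place name: keep it as-is.
            translated.append(word)
            missing.append(word)
        elif strategy == "first":
            translated.append(candidates[0])
        else:
            translated.extend(candidates)
    return translated, missing


# Strategy 1: all proposals. The right sense ("orilla") is included,
# but so is the financial sense ("banco"), which adds noise.
print(translate_query("river bank", TOY_DICTIONARY, "all"))
# (['río', 'banco', 'orilla', 'ribera'], [])

# Strategy 2: first proposal only. The most frequent sense is chosen,
# which is the wrong one in this context.
print(translate_query("river bank", TOY_DICTIONARY, "first"))
# (['río', 'banco'], [])

# A proper name that the dictionary does not cover.
print(translate_query("Madrid bank", TOY_DICTIONARY, "first"))
# (['madrid', 'banco'], ['madrid'])
```

For the query "river bank", neither strategy gives a clean result: the first floods the translated query with senses that do not fit, while the second silently picks the financial sense of "bank". This is the context problem in miniature.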

Bilingual Information Retrieval Requires a Multiple-Level Approach to Overcome the Issues of Direct Translation

For the reasons mentioned above, the simple word-by-word translation offered by traditional dictionaries and modern machine translation programmes is not enough for bilingual information retrieval. In order for the user to be able to access information in a language other than that of the query, a multiple-level approach is required, one that deals not only with the translation of individual words but also addresses the structure, semantics and complexity of both the source and target languages.