Cross-Language Information Retrieval

Cross-language information retrieval (CLIR) is, as its name suggests, a field that deals with retrieving information from documents written in a language or languages other than the one used in the query. If the user submits a query in English, for example, CLIR techniques enable him or her to retrieve relevant information from documents in languages other than English. In addition to the problems of information retrieval (IR) itself, this also raises the question of translation: of the query, of the source of information, or of both.

The World Wide Web and Cross-Language Information Retrieval

The World Wide Web has revolutionised access to information, making just about everything available within a few seconds. Besides providing easy access to relevant information in a wealth of media, it has also facilitated information retrieval in other languages. But what if the user doesn’t speak the language of the document containing the relevant information? This is where CLIR steps in. To help the user retrieve information when the language of the query differs from that of the document, however, it is first necessary to determine what the user sees as relevant information.

Computer vs Human Interpretation of Relevance

The main goal of information retrieval is to help the user access relevant information. For example, a user searching for information about the Olympic Games should be given access to documents containing this information, whether as written or spoken word, video, images or other types of media. However, the computer doesn’t know whether the user is interested in the most recent Olympic Games or is looking for general information about the competition. If the event is taking place, or is about to take place, at the time of the query, the computer will assume that the user is trying to access information about that particular event. In other cases, the user will need to be more specific in the description or keywords of the query. This is because computers currently rely on the user’s description or keywords to “decide” what he or she may find relevant, and to facilitate access to documents containing potentially relevant information.
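The reliance on keywords can be illustrated with a minimal sketch. It assumes a deliberately simple relevance measure (counting shared terms), which is not how any particular search engine works; the function names and the three sample documents are hypothetical and exist only for illustration.

```python
def keyword_score(query, document):
    """Count how many distinct query terms also appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)


def rank(query, docs):
    """Return the names of matching documents, best score first.

    Ties are broken alphabetically by name so the order is deterministic.
    """
    scored = [(keyword_score(query, text), name) for name, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


# Hypothetical sample collection (an assumption for this sketch).
documents = {
    "history": "the olympic games began in ancient greece",
    "schedule": "schedule for the upcoming olympic games",
    "weather": "rain expected over the weekend",
}

# A vague query scores both Olympic documents equally, so the computer
# cannot tell which one the user actually wants.
print(rank("olympic games", documents))

# A more specific query separates them.
print(rank("upcoming olympic games", documents))
```

With the vague query, both Olympic documents receive the same score; only the extra keyword in the second query lets the ranking reflect what the user is really looking for.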

Traditional Information Retrieval Issues Plus the Language Factor

CLIR has virtually the same goal as its “parent” discipline, information retrieval. But besides the traditional issues encountered in information retrieval, most notably the problem of computer vs human interpretation of what is relevant, CLIR also has to address the issue of language. To enable the user to access relevant information when the document is not in the language of the query, CLIR has to deal with the question of translation technique(s), applied either to the query or to the document.
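One of the two options mentioned above, translating the query, can be sketched as follows. This is an illustration under stated assumptions rather than a description of any real CLIR system: the tiny English-to-Spanish dictionary, the sample documents and the function names are all hypothetical, and word-by-word dictionary lookup stands in for whatever translation technique a real system would use.

```python
# Hypothetical toy bilingual dictionary (an assumption for this sketch).
# Each English term maps to a list of possible Spanish translations.
EN_TO_ES = {
    "olympic": ["olímpicos"],
    "games": ["juegos"],
    "history": ["historia"],
}

# Hypothetical Spanish-language document collection.
SPANISH_DOCS = {
    "doc1": "historia de los juegos olímpicos",
    "doc2": "recetas de cocina tradicional",
}


def translate_query(query, dictionary):
    """Translate a query term by term; unknown terms are kept unchanged."""
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


def retrieve(query, docs, dictionary):
    """Translate the query, then rank documents by shared terms."""
    terms = set(translate_query(query, dictionary))
    hits = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append((overlap, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


# An English query retrieves a Spanish document.
print(retrieve("Olympic Games history", SPANISH_DOCS, EN_TO_ES))
```

The alternative, translating the documents into the language of the query, would move the dictionary lookup from `translate_query` to the document side and leave the matching step unchanged.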