Keyword Relevance

In academic information retrieval, the word "relevance" has been used in system evaluation for over forty years, going back to the Cranfield Experiments of the early 1960s. In the relatively new commercial search realm (web search engine companies, search engine optimizers, and the press), the incorrect form "relevancy" is increasingly used in place of "relevance". One can often tell which community an information retrieval practitioner comes from by which form of the word he or she uses. Wikipedia's search facility once exhibited an example of the incorrect "relevancy".

Algorithms for relevance

In the simplest case, relevance can be calculated by counting how many times a query term appears in a document (term frequency). This count can be combined with a measure of how discriminative the query term is across the searched collection (inverse document frequency); the combined weighting is known as term frequency-inverse document frequency (TF-IDF).
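The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `tokenize` and `tf_idf_scores`, the whitespace tokenizer, and the use of raw counts with a natural-log inverse document frequency are all assumptions made for the example. Practical systems use many variants of both factors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document against the query with a basic TF-IDF sum.

    tf  = raw count of the term in the document
    idf = log(N / df), where N is the number of documents and
          df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term appears nowhere in the collection
            idf = math.log(n_docs / df[term])
            score += tf[term] * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "the quick brown fox",
    ]
    for doc, score in zip(docs, tf_idf_scores("cat mat", docs)):
        print(f"{score:.3f}  {doc}")
```

Note that under this formulation a term that occurs in every document (here, "the") receives an inverse document frequency of zero and so contributes nothing to the score, which is the "discriminative" effect the paragraph describes.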

Since search engines and other businesses rely upon the accuracy of their results, many additional, more complex algorithms have been developed to estimate result relevance. Many of these algorithms, particularly those used by search engines, are hidden from the public, because a user who knows the details of a search algorithm can artificially boost the ranking of his or her own content.

Relevance calculation is often misinterpreted by the press. For example, it has often been said that when Google burst onto the scene it was miles ahead of its competitors because it, unlike anyone else, ranked web pages by relevance. This is not true, since every search engine ranks by relevance. Google had simply come up with a fairly new way of estimating relevance, namely PageRank. Even search engines that use only TF-IDF rank by relevance.

Information Retrieval

Information retrieval (IR) is the science of searching for information in documents, for documents themselves, for metadata that describe documents, or within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet, the World Wide Web, or intranets. The material sought may be text, sound, images, or data. Data retrieval, document retrieval, information retrieval, and text retrieval are commonly confused, but each has its own body of literature, theory, praxis, and technologies. Like most nascent fields, IR is interdisciplinary, drawing on computer science, mathematics, library science, information science, cognitive psychology, linguistics, statistics, and physics.

Automated IR systems are used to reduce information overload. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. IR systems are built around two notions: the query and the object. A query is a formal statement of an information need that the user puts to an IR system. An object is an entity that stores information in a database. User queries are matched against the objects stored in the database; a document is therefore a data object. Often the documents themselves are not stored directly in the IR system but are instead represented in the system by document surrogates.
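The relationship between queries, objects, and document surrogates can be illustrated with a small Python sketch. Everything here is hypothetical: the `Surrogate` class, its fields (`doc_id`, `title`, `keywords`), and the `match` function are assumptions chosen for the example, and matching on any shared keyword is only one of many possible matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surrogate:
    """A stand-in for a document that is stored elsewhere (hypothetical fields)."""
    doc_id: str
    title: str
    keywords: frozenset


def match(query, surrogates):
    """Return the ids of surrogates sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [s.doc_id for s in surrogates if terms & s.keywords]


if __name__ == "__main__":
    catalog = [
        Surrogate("b1", "Introduction to Retrieval", frozenset({"retrieval", "indexing"})),
        Surrogate("b2", "Library Cataloguing", frozenset({"library", "catalogue"})),
    ]
    # The system searches the surrogates; the full documents are fetched by id later.
    print(match("retrieval systems", catalog))
```

The point of the design is that the system only needs the surrogates to answer a query; the documents themselves can live in a separate store and be looked up by identifier once a match is found.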

Web search engines such as Google, Live.com, or Yahoo search are the most visible IR applications.

Reference: Wikipedia
This text is available under the terms of the GNU Free Documentation License.
