Search Engine Indexing

Some search engines index more web pages, or index them more frequently, than others. As a result, each search engine's collection of indexed pages, and the structure it uses to organize that collection, is unique.

Popular engines focus on full-text indexing of online, natural-language documents, though other media types such as video, audio, and graphics are also searchable. Meta search engines reuse the indices of other services and do not store a local index, whereas cache-based search engines permanently store both the index and the corpus. Unlike full-text indices, partial-text services restrict the depth indexed in order to reduce index size. Larger services typically perform indexing at predetermined intervals because of the time and processing costs required, whereas agent-based search engines index in real time.
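The depth restriction used by partial-text services can be sketched as truncating each document before it is indexed. This is an illustrative sketch only; the 100-word cutoff and the function name are assumptions, and real services choose their own depth limits.

```python
def partial_terms(text, max_terms=100):
    """Keep only the first max_terms words of a document for indexing.

    The 100-word default is illustrative; a real partial-text service
    would pick its own depth limit to balance index size and recall.
    """
    return text.lower().split()[:max_terms]

# A 500-word document is cut down to its first 100 terms,
# shrinking the index at the cost of missing later terms.
long_doc = "word " * 500
print(len(partial_terms(long_doc)))  # → 100
```

Queries for terms that appear only beyond the cutoff will miss such a document, which is the recall cost of the smaller index.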

The goal of storing an index is to optimize the speed and performance of finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would take considerable time and computing power. For example, an index of 1000 documents can be queried within milliseconds, whereas a sequential scan of every word in 1000 documents could take hours. No search engine user would be willing to wait several hours for results. The trade-off for the time saved during retrieval is the additional storage required for the index and the considerable time needed to keep it up to date.
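The speed difference comes from looking a term up directly rather than scanning every document. A minimal sketch of this idea, using an inverted index (a common structure for full-text search; the document set and function names here are illustrative assumptions):

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index mapping each term to the set of
    document IDs that contain it. This work is paid once, up front."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def query(index, term):
    """Answer a query with a single dictionary lookup,
    instead of re-reading every document in the corpus."""
    return index.get(term.lower(), set())

docs = {
    1: "search engines build an index",
    2: "an index speeds up retrieval",
    3: "scanning every document is slow",
}
index = build_index(docs)
print(sorted(query(index, "index")))  # → [1, 2]
```

The lookup cost no longer depends on the total size of the corpus, which is why indexed queries return in milliseconds; the costs have moved to the storage the index occupies and the work of rebuilding it when documents change.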

References: Wikipedia
This text is available under the terms of the GNU Free Documentation License.
