Web Analytics

There are two main technological approaches to collecting web analytics data. The first method, logfile analysis, reads the logfiles in which the web server records all its transactions. The second method, page tagging, uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser.

Web Analytics Definitions

  • Hit - A request for a file from the web server. Available only in log analysis. The number of hits received by a website is frequently cited to assert its popularity, but this number is extremely misleading and dramatically over-estimates popularity. A single web-page typically consists of multiple (often dozens) of discreet files, each of which is counted as a hit as the page is downloaded, so the number of hits is really an arbitrary number more reflective of the complexity of individual pages on the website than the website's actual popularity. The total number of visitors or page views provides a more realisitic and accurate assesment of popularity.
  • Page View - A request for a file whose type is defined as a page in log analysis. An occurrence of the script being run in page tagging. In log analysis, a single page view may generate multiple hits as all the resources required to view the page (images, .js and .css files) are also requested from the web server.
  • Visit / Session - A series of requests from the same uniquely identified client with a set timeout. A visit is expected to contain multiple hits (in log analysis) and page views.
  • Visitor / Unique Visitor - The uniquely identified client generating requests on the web server (log analysis) or viewing pages (page tagging). A visitor can make multiple visits.
  • Repeat Visitor - A visitor that has made at least one previous visit. The period between the last and current visit is called visitor recency and is measured in days.
  • New Visitor - A visitor that has not made any previous visits.
  • Impression - An impression is each time an advertisement loads on a users screen. Anytime you see a banner that is an impression.

Web Server Logfile Analysis Method

Web servers have always recorded all their transactions in a logfile. It was soon realised that these logfiles could be read by a program to provide data on the popularity of the website.

In the early 1990s, web site statistics consisted primarily of counting the number of client requests made to the web server. This was a reasonable method initially, since each web site often consisted of a single HTML file. However, with the introduction of images in HTML, and web sites that spanned multiple HTML files, this count became less useful.

Two units of measure were introduced in the mid 1990s to gauge more accurately the amount of human activity on web servers. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. The page views and visits are still commonly displayed metrics, but are now considered rather unsophisticated measurements.

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.

The extensive use of web caches also presented a problem for logfile analysis. If a person revisits a page, the second request will often be retrieved from the browser's cache, and so no request will be received by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.

Page tagging Analysis Method

Concerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging.

In the mid 1990s, Web counters were commonly seen - these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed by a web analytics company, and extensive statistics generated. This can be done remotely, by the web analytics company.

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits.

References: Wikipedia
This text is available under the terms of the GNU Free Documentation License.

x