Chapter 2: Getting to Know Web Analytics
Kate Marek

Abstract

This chapter of Using Web Analytics in the Library provides guidelines for selecting and implementing web analytics tools in a library context. The author examines different variables that determine which tool might be appropriate and offers suggestions for determining which tools meet different needs.


One of the first things you will need to do as you consider implementing web analytics is to select the appropriate tool for your library. In this report, I will focus specifically on Google Analytics (GA) to illustrate various aspects of web analytics. But before we get into the basics of any one program, it is useful to have some foundational knowledge of how the tools work, what programs are out there and how to choose one, and what the most used standard metrics are.


Web Tracking Basics: Data Collection Mechanisms

It's ALL clickstream data…

When we hear about collecting a Web user's clickstream data, the term can refer to several different mechanisms used to track and store users’ activities while on the Web. A variety of tools capture user information; the two main approaches are log file data capture and page tagging.

Log File Data

Web log files were the original method of capturing and storing information about visitors to individual websites. Typically a request for your website comes to your server, and the server creates an electronic file entry in the log for that request. Web logs capture information such as the page name, IP address and browser of the visitor, and date and time stamps.1 Web servers collect data and create logs as part of their regular activity, independent of the user's browser, which makes the data readily available.
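To make the contents of a log entry concrete, here is a minimal sketch that pulls the fields mentioned above (page name, IP address, browser, and date and time stamp) out of a single log line. It assumes the server writes the widely used Combined Log Format; your own server may be configured differently. The sample line, the IP address, and the function name parse_log_line are invented for illustration.

```python
import re
from datetime import datetime

# Pattern for one line in Combined Log Format (an assumption about the
# server configuration; other log formats need a different pattern).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the bracketed time stamp into a timezone-aware datetime.
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return entry


# Hypothetical log entry for a request to mylibrary.org.
sample = (
    '192.0.2.10 - - [14/Mar/2011:10:15:32 -0500] '
    '"GET /hours.html HTTP/1.1" 200 5120 '
    '"http://mylibrary.org/" "Mozilla/5.0 (Windows NT 6.1)"'
)

entry = parse_log_line(sample)
print(entry["page"], entry["ip"], entry["browser"], entry["timestamp"].isoformat())
```

Note that every field here describes the request itself. Nothing in the entry says who the visitor is or what the visitor was trying to accomplish.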

Relying on this method of data collection for user analysis has several disadvantages. For one thing, the data may be hard to access. If your library contracts with an external web hosting service for your website, you must work with the service to access server log file information. If your server is controlled locally, log files will probably be maintained by the information technology (IT) department rather than the website design department. Analysis and use of this information would require close collaboration with IT. In addition, log files are primarily intended for the capture of technical information. While this information is useful in the overall analysis of information technology resources, web logs are not the most effective way to capture and analyze website visitor behaviors.

Ultimately, server log information can help you evaluate traffic numbers with regard to your server load and capacity, but it tells you very little about your users or the effectiveness of your site in relation to your goals.

Page Tagging

The other common method of web analytics used today is often referred to as page tagging. This method of collecting data involves inserting tags, or lines of JavaScript code provided by an analytics program, into the source code of the pages of a website (for example, mylibrary.org).

The tag code collects data from the visitor's browser and sends it to the analytics program's remote host computer, where reports are available to the mylibrary.org owners.
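The data flow just described can be sketched in a few lines. Real tags are JavaScript supplied by the analytics vendor and run in the visitor's browser; the Python below only imitates what such a tag does, which is to pack values read from the browser into a single request sent to the vendor's host. The collector address, the parameter names, and the function name build_beacon_url are all hypothetical and do not correspond to any particular product.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint; each vendor defines its own.
COLLECTOR_URL = "https://analytics.example.com/collect.gif"


def build_beacon_url(page_url, page_title, referrer, screen_size, visitor_id):
    """Pack values a tag would read from the browser into one request URL."""
    params = {
        "page": page_url,
        "title": page_title,
        "ref": referrer,
        "screen": screen_size,
        "vid": visitor_id,  # assumed to come from a cookie set for the site
    }
    return COLLECTOR_URL + "?" + urlencode(params)


# What the tag might send for one page view on mylibrary.org.
beacon = build_beacon_url(
    page_url="http://mylibrary.org/hours.html",
    page_title="Library Hours",
    referrer="http://www.google.com/search?q=library+hours",
    screen_size="1280x800",
    visitor_id="v-42",
)

# What the remote host would see when it decodes the request.
received = parse_qs(urlsplit(beacon).query)
print(received["page"][0], received["title"][0], received["vid"][0])
```

Because the request goes straight from the browser to the vendor's host, the library's own server plays no part in collecting the data.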

Page tagging has become quite popular, as both implementation and management of the analytics tool are much easier than with the log file method. Many of the analytics tools available today are a service of a third-party vendor, which frees the mylibrary.org owners from having to develop an internal technical analytics infrastructure.

Advantages of Page Tagging as an Analytics Tool

In his book Web Analytics: An Hour a Day, Avinash Kaushik points out a number of advantages to using page tagging:2

  • Tagging is extremely easy to implement. Once you sign up with an analytics program vendor, you add a few lines of code to the <head> section of each HTML page. The analytics program immediately begins to capture vast amounts of data about your visitors.
  • A tagging solution is possible for those without access to their servers. Analytics data gathered through JavaScript tags is made available through the third-party analytics program host, and thus you do not need to rely on server access.
  • As a webmaster or a design team, you have significant local control over data review and analysis. While typically a tremendous amount of data is available, you can easily select the data that is most relevant to your own specific goals and outcomes.
  • Similarly, you can create administrative reports that clearly highlight data relevant to your library's priorities.
  • Innovation in web analytics is focused on page tagging. Using this data collection mechanism keeps you closer to ongoing upgrades and revisions in the field.

There are, however, some concerns with page tagging as an analytics tool. The tagging process involves the use of cookies, and some web users turn off both cookies and JavaScript. These visitors will be invisible to you. In addition, as mentioned in chapter 1, the use of cookies raises some privacy issues for our customers who do not turn off cookies in their Web use. These issues should be clearly discussed both within the library and with our website visitors.

Cookie Basics

All cookies are small chunks of data that are sent from a website you visit to your hard drive so that your computer can provide information to your browser as you use the Internet. There are various distinctions to keep in mind: session cookies versus persistent cookies, and first-party cookies versus third-party cookies.

Session Cookies and Persistent Cookies

A session cookie is stored only as long as your visit lasts within a particular website, stores your browsing history for as long as you stay at that website, and is erased when you close your browser. A persistent cookie, on the other hand, stays stored on your hard drive as long as the file is programmed to stay there, or until you manually remove it.3
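In practice, the difference between the two kinds of cookie comes down to whether the website attaches a lifetime when it sets the cookie. The sketch below uses Python's standard http.cookies module to build one of each. The cookie names, values, and two-year lifetime are invented for illustration, and is_persistent is a hypothetical helper, not part of any analytics product.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Session cookie: no Expires or Max-Age attribute, so the browser
# discards it when the browser is closed.
cookies["session_id"] = "abc123"
cookies["session_id"]["path"] = "/"

# Persistent cookie: an explicit lifetime (here two years, in seconds)
# tells the browser to keep it on disk until then or until it is removed.
cookies["visitor_id"] = "v-42"
cookies["visitor_id"]["path"] = "/"
cookies["visitor_id"]["max-age"] = 60 * 60 * 24 * 365 * 2


def is_persistent(morsel):
    """A cookie is persistent if it carries a Max-Age or Expires attribute."""
    return bool(morsel["max-age"] or morsel["expires"])


for name, morsel in cookies.items():
    kind = "persistent" if is_persistent(morsel) else "session"
    print(name, "->", kind, "|", morsel.OutputString())
```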

First-Party and Third-Party Cookies

Most web analytics programs, including Google Analytics (GA), use only first-party cookies. First-party cookies are set by the host website itself and are used by that host website to store a user's data so that person can return to the site without starting all over as a new customer. In addition, the first-party cookies used by GA allow GA to analyze things such as which keywords and referring sites bring visitors to your site.4

Third-party cookies are set from a domain outside the one shown on the user's address bar, usually without the knowledge of the user, and are typically used by advertisers to collect and store an individual's browsing habits. These cookies are much more likely to be turned off by web users, as they are considered more invasive and thus more objectionable.
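The distinction rests on a comparison between the domain that sets the cookie and the domain in the address bar. The following is a deliberately simplified sketch of that comparison; real browsers apply more elaborate rules, and the function name is_first_party and the example domains are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def is_first_party(page_url, cookie_domain):
    """Return True if the cookie's domain matches the site in the address bar.

    Simplified: the cookie counts as first-party when its domain equals the
    page's host name or is a parent domain of it.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    cookie_domain = cookie_domain.lstrip(".").lower()
    return page_host == cookie_domain or page_host.endswith("." + cookie_domain)


page = "http://www.mylibrary.org/catalog"

# Cookie set for the library's own domain: first-party.
print(is_first_party(page, "mylibrary.org"))

# Cookie set by an outside advertising domain: third-party.
print(is_first_party(page, "ads.example.com"))
```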

Other Web Data Capture Methods

Additional methods of data capture include web beacons and packet sniffing. Web beacons are 1 × 1 pixel GIF images placed in webpages, usually hosted by a third-party server. Packet sniffing captures all of the user's data, including passwords. Both methods are typically used with commercial websites so that advertisers can track the effectiveness of their ads through number of views, user behavior, and so forth.

To a great extent these methods have fallen out of practice, due mostly to privacy concerns and the fact that a vast number of users (some estimates as high as 40 percent) turn off third-party cookies. Many fewer people turn off first-party cookies, as they are much less invasive, they collect only anonymous data, and it is very difficult to browse the Web if you turn them off.

Choosing a Program

Shopping for a web analytics program is quite similar to shopping in general: you thoughtfully consider what is important to you, and then you examine the options within your budget. Analytics considerations include the cost, the level of sophistication for data collection and analysis, the reporting options, and the program's overall accessibility. You may also consider advanced options such as segmentation of users, depth of analysis for the individual metrics, and host-level support. GA is an excellent choice for most libraries, as it offers tremendous data collection, ease of use and customization, and support. And there is no cost for its implementation.

GA, as a free service, has been so successful that the commercial services have had to continually refine and specialize their products. Examples of commercial products include Omniture, WebTrends, ClickTracks, and CoreMetrics. These programs can cost thousands of dollars per year, but usually include a range of services along with the analytics program and may be good alternatives for the largest library systems. For a thorough discussion of what to look for in a commercial analytics vendor, see Avinash Kaushik's blog post “Web Analytics Tool Selection: Three Questions to Ask Yourself” from January 30, 2007, and chapter 2 in his book Web Analytics 2.0.

Web Analytics Tool Selection: Three Questions to Ask Yourself

www.kaushik.net/avinash/2007/01/web-analytics-toolselection-three-questions-to-ask-yourself.html

At the other end of the spectrum are open source alternatives. Piwik and OWA, or Open Web Analytics, are GPL-licensed, downloadable systems that use page tagging. AWStats is a log file–based GPL-licensed product. There are quite a few comparison sites available via a simple web search for “open source web analytics.”

Piwik

http://piwik.org

Open Web Analytics

www.openwebanalytics.com

AWStats

http://awstats.sourceforge.net

Although there are various reasons for an individual organization's ultimate choice of a web analytics program, for the purposes of clarity I will use a single program for illustration and description throughout the rest of this report. While not specifically endorsing GA or rejecting other programs, I will use GA as a frame for library-specific descriptions of how to use website user data to analyze your website goals and customer satisfaction.


GA Basics

Like the page tagging analytics programs described earlier in this chapter, GA begins with the insertion of JavaScript code tags in your webpages. The GA program is available to anyone who has a basic Google account. Once you are a registered Google account holder, you can very easily sign up for the analytics program. Details of the step-by-step process are outlined in chapter 3.

But what will the program measure? How will you use the various metrics to help you understand more about your website customers, and whether your website is successful? This section will outline some of the basic analytics terms as defined by the Web Analytics Association, along with some comments regarding their relevance to library usage.

Definitions from the Web Analytics Association are noted with the (WAA) designation, and are taken as posted in Occam's Razor.5

Here are some useful terms in addition to those defined by the WAA:

For more information, see a complete list of terms in the Google Analytics Glossary.

Google Analytics Glossary

www.google.com/support/googleanalytics/bin/topic.py?topic=11285


Possible Actions

Looking at these metrics, you can begin to see some possible actions libraries could take as a result of an analysis of their website analytics data. Here are just a few examples:

Chapter 3 will provide specific information about setting up GA and will begin to show how you can identify these metrics from the GA graphs and reports.


Resources

All About Cookies.org. “About Cookies: Are All Cookies the Same?” www.allaboutcookies.org/cookies/cookies-the-same.html. Accessed March 14, 2011.

Clifton, Brian. Advanced Web Metrics with Google Analytics, 2nd ed. Indianapolis, IN: Wiley Publishing, 2010.

Google Analytics. “Glossary.” www.google.com/support/googleanalytics/bin/topic.py?topic=11285. Accessed March 31, 2011.

Kaushik, Avinash. Web Analytics: An Hour a Day. Indianapolis, IN: Wiley Publishing, 2007.

Kaushik, Avinash. “Web Analytics Standards: 26 New Metrics Definitions.” Occam's Razor. Aug. 23, 2007. www.kaushik.net/avinash/2007/08/web-analytics-standards-26-new-metrics-definitions.html. Accessed March 31, 2011.

Tonkin, Sebastian. “Top Ten Myths About Google Analytics.” Google Analytics blog. May 28, 2009. http://analytics.blogspot.com/2009/05/top-ten-myths-about-google-analytics.html. Accessed March 31, 2011.

Web Analytics Association. “Standards Committee Deliverables.” www.webanalyticsassociation.org/?page=standards. Accessed March 31, 2011.


Notes
1. Avinash Kaushik, Web Analytics: An Hour a Day (Indianapolis, IN: Wiley Publishing, 2007), 26.
2. Kaushik, Web Analytics, 32.
3. “About Cookies: Are All Cookies the Same?” www.allaboutcookies.org/cookies/cookies-the-same.html, All About Cookies.org website (accessed March 14, 2011).
4. Sebastian Tonkin, “Top Ten Myths about Google Analytics,” May 28, 2009, Google Analytics blog, http://analytics.blogspot.com/2009/05/top-ten-myths-about-google-analytics.html (accessed March 13, 2011).
5. Jason Burby, Angie Brown, and WAA Standards Committee, Web Analytics Definitions (Washington DC: WAA, Aug. 16, 2007), as quoted in Avinash Kaushik, “Web Analytics Standards: 26 New Metrics Definitions.” Occam’s Razor, Aug. 23, 2007, www.kaushik.net/avinash/2007/08/web-analytics-standards-26-new-metrics-definitions.html (accessed March 18, 2011). See also Web Analytics Association, “Standards Committee Deliverables,” www.webanalyticsassociation.org/?page=standards.
6. Avinash Kaushik, “Standard Metrics Revisited: #2: Top Exit Pages,” Occam’s Razor, Dec. 27, 2006, www.kaushik.net/avinash/2006/12/standard-metrics-revisited-top-exit-pages.html (accessed March 18, 2011).
7. Brian Clifton, Advanced Web Metrics with Google Analytics, 2nd ed. (Indianapolis, IN: Wiley Publishing, 2010), 6.
8. Kaushik, Web Analytics, 10.
9. Clifton, Advanced Web Metrics, 11.
