Using Server Statistics at CSB/SJU


CSB/SJU web server statistics are available only to people with CSB/SJU computer accounts.

There are some desirable things that Web Counters and Server Logs cannot provide, such as the proportion of your visitors using a particular browser, or the number of people who have visited a certain page. Nevertheless, server statistics can be helpful to web page owners.

I will try to explain what the various statistics mean by walking you through the two server log analysis services used at CSB/SJU.


Please note: On September 15, 1998, most of the web pages on the www.csbsju.edu server were moved to www2.csbsju.edu. I have fixed my links so that they go to the correct pages. I have also updated the information on all of these pages. Statistics are not available for the new www.csbsju.edu server.


Historical Summary Statistics

Index

The file name takes the form index.01.30.html. I recommend viewing it in Lynx. A graphical browser shows graphs with colored bars, but the file is often over 400K, and because it is all one big table, you won't see anything until the entire page loads.

If you wish, you can see an example of the beginning and end of the AccessWatch index, from 12 December 1998.

Daily Access Statistics

A summary of the total number of accesses, unique hosts, average number of pages viewed per access, total hits, kilobytes transferred, and errors. In this context, "accesses" means a computer/IP address beginning a session by making a page request, "hosts" are the originating IP addresses, and "hits" are individual page requests. Accesses are also broken down into on-campus or remote hosts. The day used in the example was a Saturday towards the end of the fall semester.
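To make the three terms concrete, here is a minimal Python sketch that counts accesses, hosts, and hits from a list of page requests. It is not how AccessWatch itself works: the 30-minute idle gap used to decide when a new session begins, the function name, and the sample host names are all assumptions made for illustration.

```python
# Assumed idle timeout (seconds) after which a host's next request starts a
# new session. This value is a guess, not taken from AccessWatch.
SESSION_GAP = 30 * 60

def summarize(requests):
    """Summarize (host, seconds_since_midnight) page requests for one day."""
    last_seen = {}   # host -> time of its most recent request
    accesses = 0     # sessions begun
    hits = 0         # individual page requests
    for host, t in sorted(requests, key=lambda r: r[1]):
        hits += 1
        if host not in last_seen or t - last_seen[host] > SESSION_GAP:
            accesses += 1
        last_seen[host] = t
    return {
        "accesses": accesses,
        "hosts": len(last_seen),
        "hits": hits,
        "pages_per_access": hits / accesses if accesses else 0.0,
    }

# Hypothetical requests: host "a" visits twice, two hours apart.
sample = [
    ("a.example.edu", 0),
    ("a.example.edu", 60),
    ("b.example.com", 120),
    ("a.example.edu", 7200),
]
summary = summarize(sample)
```

With this sample, one host returning after a long pause counts as a second access but still as a single host.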

Summary Statistics

Not very interesting. Total access statistics in graphical form.

Hourly Statistics

Of interest only to server admins. Graph of highest, lowest, and average number of accesses per hour.

Page Demand

Most interesting section for web page authors. File name, followed by accesses (as a number, and as percent of total hits). Somewhat confusingly, "access" in this context refers to what was called "hits" above, i.e., the number of requests for a particular page. File names are linked to their web pages. Goes from most-requested to pages that got one hit that day. Please notice that this statistics script has not been programmed to know that www2.csbsju.edu/foobar/ is the same thing as www2.csbsju.edu/foobar/index.html, so if you have an index.html file, it may appear twice on this list.
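If you copy the numbers out of the Page Demand list, you can add the two entries back together yourself. The following is a small sketch, assuming that index.html is the file served for a directory URL, as in the foobar example above; the function names and the counts are made up.

```python
from collections import Counter

def normalize_path(path: str) -> str:
    """Treat /dir/index.html and /dir/ as the same page."""
    if path.endswith("/index.html"):
        return path[: -len("index.html")]
    return path

def merge_page_counts(counts: dict[str, int]) -> Counter:
    """Add together the counts of entries that name the same page."""
    merged = Counter()
    for path, n in counts.items():
        merged[normalize_path(path)] += n
    return merged

# Hypothetical counts copied from a Page Demand list.
raw_counts = {
    "/foobar/": 12,
    "/foobar/index.html": 5,
    "/foobar/other.html": 3,
}
merged = merge_page_counts(raw_counts)
```

Here the two foobar entries combine into a single total of 17 requests, while other.html is left alone.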

Accesses by Domain

After the pages which were requested only once, a list of the top ten originating domains (*.edu, *.com, etc.), followed by a paragraph which tells which other top-level domains also visited that day.

Most Frequent Accesses by Host

The top ten originating IP addresses, in terms of the number of pages they requested.

Details

File name takes the form details.01.30.html. Again, file size is often very large--600K is not unusual. Use Lynx if you're not on the CSB/SJU network and don't have a fast connection. Unlike the Index, however, you should be able to see the beginning of the file as it loads.

The Details file is a list of who looked at what page when. It is sorted first by top-level domain: *.at = Austria, *.au = Australia, *.be = Belgium, etc. Within each top-level domain, it is sorted by host name, so that erols.com comes before execpc.com. If there is more than one visitor from a particular host, the individual IP addresses are sorted alphabetically; for example, maple.computing.csbsju.edu precedes tiny.computing.csbsju.edu. For each individual computer/IP address, the Details file shows the hour (in 24-hour notation), minute, and second of each request, and the file name(s) requested.
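The sort order described above can be sketched as a Python sort key. This is only an illustration, not AccessWatch's own code: the tuple layout, the host prefixes such as dialup1 and ppp2, and the choice to order each machine's requests by time are assumptions.

```python
def details_sort_key(entry):
    """Order by top-level domain, then domain, then full machine name."""
    hostname, time_of_day, _path = entry
    labels = hostname.lower().split(".")
    tld = labels[-1]                  # e.g. "edu"
    domain = ".".join(labels[-2:])    # e.g. "csbsju.edu"
    # Assumption: within one machine, requests are listed in time order.
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    return (tld, domain, hostname.lower(), time_of_day)

# Hypothetical entries: (machine name, time, requested file).
entries = [
    ("tiny.computing.csbsju.edu", "09:15:02", "/x.html"),
    ("ppp2.execpc.com", "13:00:00", "/y.html"),
    ("maple.computing.csbsju.edu", "22:41:10", "/x.html"),
    ("dialup1.erols.com", "08:00:00", "/z.html"),
    ("host.example.at", "10:00:00", "/x.html"),
]
ordered = sorted(entries, key=details_sort_key)
```

A visitor whose address has no host name (a bare numeric IP) would need separate handling, since its last label is not a top-level domain.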

Details files are huge, but worth reading from time to time. This is the only statistics file that tells you which pages the search engine spiders crawled, and the only statistics file that allows you to track visitors' progress through your pages. If you have recently moved a page, do visitors seem to find the new page OK and get on with their lives, or do they act in a confused manner? Do visitors seem to be able to find what they want quickly? On the other end of the scale, if someone hits the same page over the course of a few hours, you should be proud, as they obviously found it to be a valuable resource.

If you wish, you can see a small sample of an AccessWatch details file.


Detailed Statistics

Summary information

Top Statistics

Top ten pages, domains, countries, dates, and users.

Visitor information

Visitor Information

Top 50 domains (host names), then US accesses sorted by top-level domain, and finally top 50 countries.

User Information

Pretty much useless. Supposedly which "authenticated users" are logging on most often, but, for example, all Bennies count as one user.

Page information

Page Summaries

The most interesting section for web page authors and owners. This file itself is a list of pages, from most-requested to pages that got only one request that month. Generally this file is about 300K, so if you have a reasonably fast connection, it is OK to use a graphical browser. If you like, you can see the beginning of the MkStats Page Summaries for November 1998.

The "hit parade" aspect of Page Summaries is good for bragging rights, but what is even more interesting and useful for page authors are the links. In this case, the links go not to the named web page itself but to more detailed information about that particular web page.

See an example of MkStats statistics for an individual page. The example I selected is a moderately popular page, with an average of 16 or 17 hits per day. It was found by search engines, linked from other pages, and bookmarked on at least one local computer.

For the individual page, MkStats gives a link to the page itself at the top, centered. Then there is a brief summary of how many hits the page received that month, how many bytes were transferred, and how many hits it got from local bookmark files or Usenet news articles. The hits for the last 14 days are presented as a Lynx-friendly graph. Then MkStats tells you some of the words that people used in search engines to find your page. And finally, there is a linked list of other pages that sent their visitors to you. Sometimes these are search engine forms, but most often this means that the pages listed have a link to your page.

Page Access Summary

A text-only list of all pages requested, sorted three ways. At the top is the alphabetical list. Numbers precede letters, and capital letters precede lower-case; files in a top-level directory precede subdirectories (which are sorted the same way). If you want to get a snapshot of how your pages did, search for your user name or top-level directory. Each line gives the number of accesses, the percentage of total accesses on the server, the number of bytes transferred, the percentage of server transfer, and finally the file name. The bold type (underlined if you're in Lynx) indicates totals for that directory, not counting subdirectories. You can see the beginning of the MkStats Page Access Summary for November 1998.
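The alphabetical ordering can be approximated with the Python sort key below. It is a sketch of the rules as described here, not MkStats' actual code, and the file names are invented. Plain ASCII comparison already puts digits before capitals and capitals before lower-case letters; the key only adds the rule that files come before subdirectories.

```python
def page_sort_key(path):
    """Files in a directory sort before that directory's subdirectories."""
    *dirs, filename = path.lstrip("/").split("/")
    # Tag directories with 1 and the final file name with 0, so that at any
    # level a file compares lower than a subdirectory.
    return [(1, d) for d in dirs] + [(0, filename)]

# Hypothetical page names.
paths = [
    "/zebra.html",
    "/Apple.html",
    "/9lives.html",
    "/docs/a.html",
    "/apple.html",
    "/docs/sub/a.html",
    "/docs/Z.html",
]
ordered_paths = sorted(paths, key=page_sort_key)
```

In this sample the four top-level files come first (9lives, Apple, apple, zebra), followed by the files in /docs/, and only then /docs/sub/.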

Linked from the top of the page, you can also see pages sorted by number of accesses, from most-requested to pages requested once that month. The third sorting, also linked from the top, ranks pages by how many bytes were downloaded by visitors.

Times and dates

Time Summary

Of interest chiefly to server admins. Lynx-friendly graphs compare page accesses for: last 14 days, averaged by day of the week; daily (each day that month--same as Daily Chart); hourly average; each week that month. Small chart at bottom shows for each day of the month: number of page accesses, number of hits (includes images and CGI scripts), and number of bytes transferred.

Daily Chart

Lynx-friendly graph comparing total page accesses for each day of the month.

Graphical Daily Chart

Same as the Daily Chart, but for graphical browsers only.

Referring sites

Referer Log Summary

Visit in Lynx. This file hovers around half a meg, and the file to which it links is 3M.

At the very top is a paragraph summarizing the number of hits from external links, local bookmarks, Usenet, and search engines. Also gives the number of page accesses coming from Yahoo, Lycos, InfoSeek, Alta Vista, and Webcrawler. Tells how many times images or scripts were loaded.

If you follow the links from file names here, you'll get some info on which pages are linked to yours and which search terms people used to find you, but the Page Information is a much friendlier way to investigate that.

If you suspect that someone is stealing your graphics, search for "images/cgi". Then if you see an unusually high number next to the file name of your image, follow the link, and you will find which pages are linked to your image.
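If you have the referring pages for an image as a list, a few lines of Python can pick out the ones that are not on the campus servers. This is a rough sketch: the set of "own" hosts, the image path, and the sample records are assumptions, and MkStats does not provide its data in this form.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: requests referred from these hosts are legitimate local use.
OWN_HOSTS = {"www.csbsju.edu", "www2.csbsju.edu", "www.users.csbsju.edu"}

def external_referrers(records, image_path):
    """Count off-site pages that caused image_path to be loaded."""
    counts = Counter()
    for path, referrer in records:
        if path != image_path or not referrer:
            continue
        host = urlparse(referrer).hostname
        if host and host not in OWN_HOSTS:
            counts[referrer] += 1
    return counts

# Hypothetical (requested file, referring page) pairs.
records = [
    ("/foobar/logo.gif", "http://www2.csbsju.edu/foobar/"),
    ("/foobar/logo.gif", "http://other.example.com/page.html"),
    ("/foobar/logo.gif", "http://other.example.com/page.html"),
    ("/foobar/photo.gif", "http://elsewhere.example.org/"),
]
suspects = external_referrers(records, "/foobar/logo.gif")
```

An off-site page that shows up repeatedly is worth a visit to see whether it displays your image directly from this server.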

Browsers used

Browser Summary

Take this with a grain of salt. Many browsers claim to be Netscape, and this analysis package does not sort them out and tell you that this user_agent string is really Opera, or that one is actually MSIE. Additionally, you should divide all numbers for graphical browsers by 4, since this browser page records "hits" rather than page accesses. In other words, for a page with 3 images, a graphical browser will chalk up 4 hits, but Lynx will only get one hit. (I arrived at the number 4 by comparing the chart at the bottom of the Time Summary page, this Browser Summary page, and the index page for the December 1997 stats.)
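The divide-by-4 adjustment can be worked through as a short calculation. The hit counts below are invented for illustration; only the factor of 4 (one page plus about three images) comes from the estimate above.

```python
# Estimate from the text: a graphical browser records about 4 hits per page.
HITS_PER_PAGE_GRAPHICAL = 4

def estimated_page_accesses(hits, graphical):
    """Convert raw hits into an estimate of pages actually viewed."""
    return hits / HITS_PER_PAGE_GRAPHICAL if graphical else hits

# Hypothetical Browser Summary figures: name -> (hits, is graphical browser).
raw = {
    "Netscape": (8000, True),
    "MSIE": (2000, True),
    "Lynx": (500, False),
}
adjusted = {name: estimated_page_accesses(h, g) for name, (h, g) in raw.items()}
total = sum(adjusted.values())
shares = {name: round(100 * v / total, 1) for name, v in adjusted.items()}
```

In this made-up case Lynx accounts for under 5 percent of raw hits (500 of 10,500) but roughly 17 percent of estimated page accesses (500 of 3,000), which is why the unadjusted numbers understate text browsers.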

Errors

Error Summary

Very important for web authors and page owners to check. A log showing the number of times the server reported an error. Except for "timed out waiting", always tells which page caused the server error. Usually there is an explanation, and often the referring page is given as well.

"send aborted" is the most common error message, and means the visitor hit the stop button. If this happens infrequently in comparison to the number of times your page was requested, this is nothing to worry about. If it happens on your page a lot, you had better carefully reconsider how long it takes your page to load.

If the reason for the error is "file does not exist", there are a few possibilities. "No file matching URL" means that there is a bad link from another page--check the referring page which follows the colon. Another possibility is typos by visitors, or they may be just plain guessing what a file might be called. Sometimes "file does not exist" or "no file matching URL" is caused by the web author's typos. Or it may be that visitors have tried to access a file that has moved, in which case you might consider a forwarding message to the new location.

If you wish, you can see an example of the beginning of the MkStats Error Summary file.

Miscellaneous information

Status Codes

Lists and explains HTTP server responses. (403) = Forbidden; (404) = File not found; etc. Tells you how many times that month each response was given to a visitor's page request.
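The same kind of tally is easy to reproduce with Python's standard library, which knows the standard name for each response code. This is a sketch with made-up sample codes; note that the library's wording can differ slightly from the report's (it calls 404 "Not Found").

```python
from collections import Counter
from http import HTTPStatus

def status_report(codes):
    """Return (code, standard phrase, count) rows, sorted by code."""
    counts = Counter(codes)
    # HTTPStatus(code) raises ValueError for codes it does not recognize.
    return [(code, HTTPStatus(code).phrase, n) for code, n in sorted(counts.items())]

# Hypothetical response codes from one month's requests.
sample_codes = [200, 200, 404, 200, 403, 404, 304]
report = status_report(sample_codes)
```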





Comments to: eknuth@unix.csbsju.edu



Using Server Stats at CSB/SJU / Rev. 7 March 1999 / © Copyright 1999, Elizabeth T. Knuth / URL: http://www.users.csbsju.edu/~eknuth/obcomp/stats/index.html