David Henderson originally posted this explanation of proxy servers to the Usenet newsgroup comp.infosystems.www.authoring.html as part of a discussion about server log analysis. I have edited out most of the headers and a quotation of another participant, but have left David's easy-to-understand explanation intact. He clearly shows why you cannot rely on server logs to determine the popularity of your web pages or which browsers your visitors are using. I am using this material with permission of the author.
From: David Henderson <davidh@imsa.edu>
Date: 1998/01/04
Message-ID: <68ou8s$25d$1@java.imsa.edu>
Newsgroups: comp.infosystems.www.authoring.html
Here is a simple diagram of how browsers, proxies, and servers handle requests for web pages. There is a lot of detail I'm leaving out for the sake of simplicity (and my poor fingers!), but this should convey the gist of it.
For this example, I will assume that the proxy can only remember the last page it loaded. Normal proxies will remember a large number of documents, images, etc.
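The one-page memory assumed here can be sketched in a few lines of Python. This is only an illustration: the class name `TinyCache` and its `capacity` parameter are hypothetical, and a real proxy also weighs things like expiry times and document sizes when deciding what to keep.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical cache that keeps only the most recently loaded pages."""

    def __init__(self, capacity=1):
        self.capacity = capacity      # 1 = "remembers only the last page it loaded"
        self.pages = OrderedDict()    # page name -> page body, oldest load first

    def get(self, name):
        """Return the stored body, or None if the page is not remembered."""
        return self.pages.get(name)

    def put(self, name, body):
        """Store a page, forgetting the oldest loads once over capacity."""
        self.pages[name] = body
        self.pages.move_to_end(name)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)


cache = TinyCache(capacity=1)
cache.put("A", "contents of A")
cache.put("B", "contents of B")   # loading B pushes A out
print(cache.get("A"))             # None
print(cache.get("B"))             # contents of B
```

With a larger `capacity`, both pages would stay in memory, which is closer to how a normal proxy behaves.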
1. User 1 on Netscape wants page A, and goes through a proxy.
Browser ---> Proxy "Give me A. I'm using Netscape 3.14."
Proxy (I don't have page A, I'll go get it.)
Proxy ---> Server "Give me A. I'm using Netscape 3.14."
Server (Chalk one up for Netscape 3.14.)
Proxy <--- Server "Here's A."
Browser <--- Proxy "Here's A."
2. User 2 on Explorer wants page A, and goes through a proxy.
Browser ---> Proxy "Give me A. I'm using Explorer 4.321."
Proxy (I've already got page A.)
Browser <--- Proxy "Here's A."
The server has no record of this access since the Proxy doesn't even contact the server.
3. User 3 on Opera wants page B, and goes through a proxy.
Browser ---> Proxy "Give me B. I'm using Opera 1812."
Proxy (I don't have page B, I'll go get it.)
Proxy ---> Server "Give me B. I'm using Opera 1812."
Server (Chalk one up for Opera 1812.)
Proxy <--- Server "Here's B."
Browser <--- Proxy "Here's B."
4. User 1 on Netscape wants page B, and goes through a proxy.
Browser ---> Proxy "Give me B. I'm using Netscape 3.14."
Proxy (I've already got page B.)
Browser <--- Proxy "Here's B."
The server has no record of this access since the Proxy doesn't even contact the server.
5. User 2 on Explorer wants page B, and goes through a proxy.
Browser ---> Proxy "Give me B. I'm using Explorer 4.321."
Proxy (I've already got page B.)
Browser <--- Proxy "Here's B."
The server has no record of this access since the Proxy doesn't even contact the server.
6. User 3 on Opera wants page A, and goes through a proxy.
Browser ---> Proxy "Give me A. I'm using Opera 1812."
Proxy (I only remember page B now, so I don't have page A. I'll go get it.)
Proxy ---> Server "Give me A. I'm using Opera 1812."
Server (Chalk one up for Opera 1812.)
Proxy <--- Server "Here's A."
Browser <--- Proxy "Here's A."
So now three people, using three different browsers, have each read two documents, A and B. But the logs on the server will only show three accesses:
Netscape 3.14    page A
Opera 1812       page B
Opera 1812       page A
A person running statistics on these logs won't see a single access from Explorer 4.321, and will conclude that A is more popular than B, even though each page was actually read three times.
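The six requests above can be replayed as a short Python simulation to check the tally. This is a sketch under the same simplifying assumption (a proxy that remembers only the last page it loaded). The function name `replay` and the variable names are made up for illustration, and the browser strings are the fictional ones from the example.

```python
from collections import Counter

# (browser, page) pairs for the six requests, in order.
REQUESTS = [
    ("Netscape 3.14", "A"),
    ("Explorer 4.321", "A"),
    ("Opera 1812", "B"),
    ("Netscape 3.14", "B"),
    ("Explorer 4.321", "B"),
    ("Opera 1812", "A"),
]


def replay(requests):
    """Return the server's log: only the requests the proxy had to forward."""
    server_log = []
    cached_page = None          # the proxy remembers only the last page it loaded
    for browser, page in requests:
        if page == cached_page:
            continue            # served from the proxy; the server never hears of it
        server_log.append((browser, page))   # forwarded, so the server logs it
        cached_page = page
    return server_log


log = replay(REQUESTS)
for browser, page in log:
    print(browser, "page", page)

print(Counter(page for _, page in log))        # hits per page, as the server sees them
print(Counter(browser for browser, _ in log))  # hits per browser, as the server sees them
```

The replayed log holds three lines out of six requests: page A appears twice, page B once, and Explorer 4.321 not at all.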
As I said above, this is a simple description of how it works, but I believe it shows how server logs can misrepresent the number of people reading a page.
Comments, additions, or corrections are welcome.
David Henderson
--
Looking for Star Trek news? Visit Psi Phi's DS9 and Voyager Archives!
Available on-line at the URL: http://www.bradley.edu/campusorg/psiphi/
--<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>--
Address: davidh@imsa.edu, davidh@bradley.edu, or davidh@cyberdesic.com
Comments to: eknuth@unix.csbsju.edu
Simple Explanation of Proxy Servers / Rev. 11 December 1998 / © Copyright 1998, David Henderson and Elizabeth T. Knuth / URL: http://www.users.csbsju.edu/~eknuth/obcomp/proxy.html