Simplified Explanation of Proxy Servers


David Henderson originally posted this explanation of proxy servers to the Usenet newsgroup comp.infosystems.www.authoring.html as part of a discussion about server log analysis. I have edited out most of the headers and a quotation of another participant, but have left David's easy-to-understand explanation intact. He clearly shows why you cannot rely on server logs to determine the popularity of your web pages or which browsers your visitors are using. I am using this material with permission of the author.


From:         David Henderson <davidh@imsa.edu>
Date:         1998/01/04
Message-ID:   <68ou8s$25d$1@java.imsa.edu>
Newsgroups:   comp.infosystems.www.authoring.html

Here is a simple diagram of how proxies, servers, and browsers access web pages. There is a lot of detail I'm leaving out for the sake of simplicity (and my poor fingers!), but this should get the gist across.

For this example, I will assume that the proxy can only remember the last page it loaded. Normal proxies will remember a large number of documents, images, etc.

1. User 1 on Netscape wants page A, and goes through a proxy.

   Browser ---> Proxy             "Give me A.  I'm using Netscape 3.14."
                Proxy             (I don't have page A, I'll go get it.)
                Proxy ---> Server "Give me A.  I'm using Netscape 3.14."
                           Server (Chalk one up for Netscape 3.14.)
                Proxy <--- Server "Here's A."
   Browser <--- Proxy             "Here's A."

2. User 2 on Explorer wants page A, and goes through a proxy.

   Browser ---> Proxy             "Give me A.  I'm using Explorer 4.321."
                Proxy             (I've already got page A.)
   Browser <--- Proxy             "Here's A."

The server has no record of this access since the Proxy doesn't even contact the server.

3. User 3 on Opera wants page B, and goes through a proxy.

   Browser ---> Proxy             "Give me B.  I'm using Opera 1812."
                Proxy             (I don't have page B, I'll go get it.)
                Proxy ---> Server "Give me B.  I'm using Opera 1812."
                           Server (Chalk one up for Opera 1812.)
                Proxy <--- Server "Here's B."
   Browser <--- Proxy             "Here's B."

4. User 1 on Netscape wants page B, and goes through a proxy.

   Browser ---> Proxy             "Give me B.  I'm using Netscape 3.14."
                Proxy             (I've already got page B.)
   Browser <--- Proxy             "Here's B."

The server has no record of this access since the Proxy doesn't even contact the server.

5. User 2 on Explorer wants page B, and goes through a proxy.

   Browser ---> Proxy             "Give me B.  I'm using Explorer 4.321."
                Proxy             (I've already got page B.)
   Browser <--- Proxy             "Here's B."

The server has no record of this access since the Proxy doesn't even contact the server.

6. User 3 on Opera wants page A, and goes through a proxy.

   Browser ---> Proxy             "Give me A.  I'm using Opera 1812."
                Proxy             (I don't have page A, I'll go get it.)
                Proxy ---> Server "Give me A.  I'm using Opera 1812."
                           Server (Chalk one up for Opera 1812.)
                Proxy <--- Server "Here's A."
   Browser <--- Proxy             "Here's A."

So now three people, using three different browsers, have each read two documents, A and B. But the logs on the server will only show three accesses:

  Netscape 3.14  page A
  Opera 1812     page B
  Opera 1812     page A

A person running statistics on these logs won't see a single access from Explorer 4.321, and will conclude that A is more popular than B, even though each page was actually read three times.

As I said above, this is a simple description of how it works, but I believe it shows how server logs can misrepresent the number of people reading a page.

Comments, additions, or corrections are welcome.

David Henderson
--
Looking for Star Trek news?  Visit Psi Phi's DS9 and Voyager Archives!
Available on-line at the URL: http://www.bradley.edu/campusorg/psiphi/
--<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>----<*>--
Address: davidh@imsa.edu, davidh@bradley.edu, or davidh@cyberdesic.com
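
The six requests in David's example can also be replayed in a few lines of Python. This is only an illustrative sketch and not part of the original post: the function name simulate, the capacity parameter, and the least-recently-used eviction rule are assumptions made here. With capacity=1 the cache behaves like the one-page proxy David describes, and, as in his diagram, the proxy is assumed to pass the browser's identification along unchanged whenever it does contact the server.

```python
from collections import Counter, OrderedDict


def simulate(requests, capacity=1):
    """Replay (browser, page) requests through a caching proxy.

    Returns the (browser, page) entries the origin server would log.
    The capacity parameter and LRU eviction are illustrative assumptions.
    """
    cache = OrderedDict()  # cached pages, least recently used first
    server_log = []
    for browser, page in requests:
        if page in cache:
            # Cache hit: the proxy answers by itself, the server sees nothing.
            cache.move_to_end(page)
            continue
        # Cache miss: the proxy forwards the request, so the server logs it.
        server_log.append((browser, page))
        cache[page] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # forget the least recently used page
    return server_log


# The six requests from the walk-through, in order.
requests = [
    ("Netscape 3.14", "A"),
    ("Explorer 4.321", "A"),
    ("Opera 1812", "B"),
    ("Netscape 3.14", "B"),
    ("Explorer 4.321", "B"),
    ("Opera 1812", "A"),
]

server_log = simulate(requests)  # one-page proxy, as in the post
pages_logged = Counter(page for _, page in server_log)
pages_actual = Counter(page for _, page in requests)
browsers_logged = Counter(browser for browser, _ in server_log)

for browser, page in server_log:
    print(f"{browser:15} page {page}")
print("logged per page:", dict(pages_logged))
print("actual per page:", dict(pages_actual))
```

Run with the default capacity of 1, the simulated server log contains the same three entries listed in the post, and Explorer 4.321 never appears in it. Calling simulate(requests, capacity=2) models a proxy that can hold both pages; the server then logs even fewer requests, which suggests that a realistic proxy with a large cache hides more traffic from the server, not less.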

Comments to: eknuth@unix.csbsju.edu


Simple Explanation of Proxy Servers / Rev. 11 December 1998 / © Copyright 1998, David Henderson and Elizabeth T. Knuth / URL: http://www.users.csbsju.edu/~eknuth/obcomp/proxy.html