IN THIS SECTION:
Introduction | What is Logged? | Special Cases | Definition of Terms
Introduction
Your account comes with http-analyze preinstalled and configured.
http-analyze is a log analyzer for web servers: it analyzes a web
server's logfile and creates a comprehensive summary report from the
information found there. It has been optimized to process large
logfiles as fast as possible.
In simpler terms, http-analyze is a powerful traffic analyzer that
quickly and efficiently delivers statistics on the traffic your web
pages have generated. Its user-friendly graphical user interface (GUI)
produces your traffic reports at the click of a mouse button.
Below we explain in more detail how this software works with your web
site and define the terms used in the reports you'll receive.
A web server is a program running on a networked machine that waits
for connections from the outside world and serves documents in
response to requests from browsers.
To communicate, the server and the browser use an asynchronous
communication method called the Hypertext Transfer Protocol (HTTP).
It works as follows:
1. The user starts the browser and types in a URL.
2. The browser connects to the given host and requests the specified
   document.
3. The web server handles the request and sends out a response:
   - if the document exists, the web server delivers it;
   - if it does not exist or if access is not permitted, the web
     server sends back an error message instead.
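To make steps 2 and 3 concrete, here is a minimal Python sketch of the exchange. It is a toy model, not how http-analyze or any real web server is implemented: `build_request` and `handle_request` are hypothetical helpers, the documents live in an in-memory dictionary, and only the GET method is handled.

```python
from typing import Dict, FrozenSet, Tuple


def build_request(path: str, host: str) -> str:
    """Step 2: the browser's request for one document (HTTP/1.0 style)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def handle_request(
    raw: str,
    documents: Dict[str, bytes],
    forbidden: FrozenSet[str] = frozenset(),
) -> Tuple[int, bytes]:
    """Step 3: a toy server answers with (status code, body)."""
    request_line = raw.split("\r\n", 1)[0]
    method, path, _protocol = request_line.split()
    if method != "GET":
        return 501, b"Not Implemented"  # this sketch only knows GET
    if path in forbidden:
        return 403, b"Forbidden"  # access not permitted: error message
    if path in documents:
        return 200, documents[path]  # document exists: deliver it
    return 404, b"Not Found"  # document does not exist: error message
```

Each call to `handle_request` stands for one response, and therefore for one line in the server's logfile.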
The document delivered in response to the request may contain inline
objects. Inline objects are simply URLs pointing to other resources:
documents, images, applets, video/audio streams, or any other
addressable object.
The browser then requests all inline objects of the current page from
the server, using steps 2 and 3 above, before it can display the
content of that page.
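The following Python sketch shows how a browser might find the inline objects in a page, using the standard library's `html.parser`. The set of tags and attributes it inspects is a non-exhaustive assumption and `inline_objects` is a hypothetical helper; real browsers handle many more cases (stylesheets, `<object>`, and so on).

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class InlineObjectFinder(HTMLParser):
    # Assumed, non-exhaustive mapping: tag -> attribute holding an inline object's URL
    INLINE_ATTRS = {"img": "src", "embed": "src", "script": "src",
                    "frame": "src", "iframe": "src"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        wanted = self.INLINE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative URLs against the page the browser loaded.
                self.urls.append(urljoin(self.base_url, value))


def inline_objects(html: str, base_url: str) -> List[str]:
    """Return the URLs the browser would request in addition to the page itself."""
    finder = InlineObjectFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls
```

For a page with two `<img>` tags, this returns two URLs; together with the page itself, the server sees three requests.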
This communication method is called asynchronous because the browser
sends out many requests for inline documents at once, over different
communication channels, without waiting for the server's response to
one request before sending the next.
Since the browser's requests are often handled by different server
processes or by different threads of a server process, there is no
predictable relationship between the logfile entries for a document
and the entries for its inline objects.
For example, the order in which the server logs the successful
transmission of the document itself and of the inline images it
contains is not predictable; it depends on the type of documents and
objects, server speed, system and network load, and many other
parameters.
What is logged?
Each and every response from the server, whether it indicates success,
an error, or even a timeout (i.e. no response), is logged in the
server's logfile. Since the server was hit by a request, such a
response is called a Hit. Consequently, the total number of hits must
equal the total number of lines in the logfile minus the number of
corrupt and empty lines. A typical logfile entry in the Common Logfile
Format looks like this:
hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839
The hostname field contains the fully qualified domain name (FQDN) of
the site accessing your server (see »Special Cases« below). The next
two fields usually contain a minus sign ('-') to indicate that they
are empty. The date is surrounded by square brackets ('[' and ']').
The next field, enclosed in double quotes, contains the request: the
request method ('GET', for example), the name of the requested
document (URL), and the protocol specification ('HTTP/1.0').
The following field contains the server's response code ('200' stands
for 'OK', while '404' means 'Document not found', for example). The
last field contains the size of the document. Some servers actually
log the number of bytes transferred, while others log the size of the
document; this makes a difference if the user interrupts the transfer
before the document has been transmitted completely.
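The field layout just described can be checked with a short Python sketch. It is an illustration, not http-analyze's own parser: the regular expression, the `Entry` type, and the helper names are assumptions, month names are parsed with `%b` (which assumes the default C locale), and a size of `-` is read as 0.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# host ident user [date] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


class Entry(NamedTuple):
    host: str
    date: datetime
    method: str
    url: str
    protocol: str
    status: int
    size: int  # bytes as logged by the server; '-' is read as 0


def parse_clf(line: str) -> Optional[Entry]:
    """Parse one Common Logfile Format line; return None if it is corrupt or empty."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    parts = match["request"].split()
    if len(parts) != 3:
        return None  # this sketch expects "METHOD URL PROTOCOL"
    method, url, protocol = parts
    try:
        date = datetime.strptime(match["date"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    size = 0 if match["size"] == "-" else int(match["size"])
    return Entry(match["host"], date, method, url, protocol, int(match["status"]), size)


def count_hits(lines: Iterable[str]) -> int:
    """Hits = all lines minus the corrupt and empty ones."""
    return sum(1 for line in lines if parse_clf(line) is not None)
```

Applied to the sample entry above, `parse_clf` yields status 200, size 4839, and URL `/index.html`; empty or malformed lines return `None` and are not counted as hits.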
There are two other logfile formats, the Combined and the Extended
Logfile Format. These formats add the user agent (browser type) and
the referrer URL to the logfile entry; the referrer is the page
containing the link the user followed, if the request was generated
by following a link. They append these two fields to the Common
Logfile Format (CLF) in one of two usual ways:
CLF Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html
CLF "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"
Note that in the second form, the user agent and the referrer URL are
surrounded by double quotes, which makes them ambiguous in certain
cases, such as erroneous referrer URLs that themselves contain double
quotes. Therefore, the first form should be preferred if possible.
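A minimal Python sketch of how the two extra fields might be split off in either form. The regular expressions and `parse_extra_fields` are assumptions for illustration only; in the unquoted form it takes the last whitespace-separated token as the referrer (the user agent may contain spaces), and in the quoted form it gives up on entries with stray double quotes, which is exactly the ambiguity described above.

```python
import re
from typing import Optional, Tuple

# The seven CLF fields; whatever follows them is the extra Combined/Extended data.
CLF_PREFIX = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)(?=\s|$)')
# Second (quoted) form: "referrer" "user agent"
QUOTED_FORM = re.compile(r'^"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$')


def parse_extra_fields(line: str) -> Optional[Tuple[str, str]]:
    """Return (user_agent, referrer) for a Combined/Extended entry, or None."""
    line = line.strip()
    prefix = CLF_PREFIX.match(line)
    if prefix is None:
        return None  # not even a valid CLF entry
    extra = line[prefix.end():].strip()
    if not extra:
        return None  # plain CLF entry without extra fields
    quoted = QUOTED_FORM.match(extra)
    if quoted:
        return quoted["agent"], quoted["referrer"]
    if extra.startswith('"'):
        return None  # quoted form with stray double quotes: ambiguous
    # First (unquoted) form: user agent first, referrer as the last token.
    agent, sep, referrer = extra.rpartition(" ")
    if not sep:
        return None  # a single token: cannot tell agent from referrer
    return agent, referrer
```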
The fields described above are the only information the server
records in the logfile. Much more information may be transferred from
the browser to the server, but although this additional information
is available to CGI scripts running on your server, it is not logged
in the logfile. Therefore, http-analyze can only show you a summary of
the information in the logfile: nothing more, nothing less.
Special Cases
Caching in the browser:
As soon as a page has been saved in a browser's disk cache, the
browser might send out conditional requests for documents or inline
objects. Such a conditional request asks the web server to send a
document/object only if it has been modified since the last time the
page was requested (if the page is still in the browser's cache).
This reduces network traffic somewhat, since documents must be
transferred only if they have changed. When such a conditional request
arrives, the server responds with a Code 304 (Not Modified) status to
indicate that the document hasn't changed, or with a Code 200 (OK)
status and the new document if it has changed in the meantime. Since
the browser may be configured (and usually is, by default) to send
such conditional requests only once per session and otherwise to use
the cached copy unconditionally, you may not even see a Code 304
response if this user visits your site again in the same session.
Conditional requests are then sent out only after the user ends the
browser session and later restarts the browser.
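A conditional request carries an `If-Modified-Since` header. The server's decision can be sketched in Python as follows; this is a simplified illustration (real servers also honor other validators such as `ETag` with `If-None-Match`), and `conditional_status` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the given date, else 200.

    `last_modified` must be timezone-aware. HTTP dates have one-second
    resolution, so microseconds are dropped before comparing.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates come back naive
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the cached copy is still valid
    return 200  # modified in the meantime: send the new version


page_mtime = datetime(1998, 2, 1, 9, 10, tzinfo=timezone.utc)
cached_at = format_datetime(datetime(1998, 2, 1, 12, 0, tzinfo=timezone.utc), usegmt=True)
print(conditional_status(page_mtime, cached_at))  # 304
print(conditional_status(page_mtime, None))       # 200
```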
Caching in a proxy server:
Organizations with a large number of users, such as companies,
universities, or online providers, often use a so-called proxy server,
mainly for two reasons:
- Such organizations often have a firewall to protect their internal
  network against intruders. This means that their network is
  logically separated from the rest of the Internet and that they have
  to use a proxy server, which can communicate with both the inside
  and the outside of their local network.
- To reduce network load, the proxy server acts as a local copy
  machine: as soon as a page is loaded into a browser through the
  proxy server, the proxy saves a copy of this page in its disk cache,
  much like a browser does in the scenario above. This way, documents
  requested often by users in the same local network need to be
  transferred to the proxy only once; the proxy then answers future
  requests for the same page from its local cache instead of
  connecting to the original web server.
Both forms of caching make it technically impossible to count visitors
or to track their path through your web site. All you see in your
server's logfile are a few initial hits from the proxy or browser and
possibly some Code 304 responses resulting from conditional requests
sent out by the proxy or browser, depending on its preference
settings.
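The effect on your logfile can be illustrated with a deliberately simplified Python model. `OriginServer` and `CachingProxy` are hypothetical classes: the proxy never expires entries and never revalidates them, which real proxies do. Three different users request the same page, yet your server logs a single hit, and that hit names the proxy, not any of the users.

```python
from typing import Dict, List, Tuple


class OriginServer:
    """Your web server: it logs one hit per request that actually reaches it."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.log: List[Tuple[str, str]] = []  # (client host, URL)

    def get(self, client: str, url: str) -> str:
        self.log.append((client, url))
        return self.pages[url]


class CachingProxy:
    """Simplified proxy cache: no expiry, no conditional revalidation."""

    def __init__(self, hostname: str, origin: OriginServer) -> None:
        self.hostname = hostname
        self.origin = origin
        self.cache: Dict[str, str] = {}

    def get(self, user: str, url: str) -> str:
        # `user` is deliberately not forwarded: the origin only sees the proxy.
        if url not in self.cache:
            self.cache[url] = self.origin.get(self.hostname, url)
        return self.cache[url]


origin = OriginServer({"/index.html": "<html>...</html>"})
proxy = CachingProxy("proxy.example.org", origin)
for user in ("alice", "bob", "carol"):
    proxy.get(user, "/index.html")
print(origin.log)  # [('proxy.example.org', '/index.html')]
```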
Definition of Terms
The statistics report contains, among other things, the following
information:
- the number of hits, 304s, files, pageviews, sessions, and data sent
  (in KB)
- the amount of data requested, transferred, and saved by cache (in KB)
- the number of unique URLs, sites, and sessions per month
- the number of all response codes other than 200 (OK)
- the average hits per weekday and for the last week
- the maximum/average hits per day and per hour
- the number of hits, files, 304s, sites, and data sent by day
- the top 5 days, 24 hours, 5 minutes, and 5 seconds of the summary
  period
- the top 30 most commonly accessed URLs (hits, 304s, data sent)
- the 10 least frequently accessed URLs (hits, 304s, data sent)
- the top 30 client domains accessing your server most often
- the top 30 browser types
- the top 30 referrer hosts
- the overview/detailed list of all files requested
- the overview/detailed list of all sites by domain and reverse domain
- the overview/detailed list of all browser types
- the overview/detailed list of all referrer URLs
The following summarizes the meaning of all terms in the statistics
report that are not self-explanatory:
Hits:
A hit is any response from the server to a request sent by a browser.
This includes every response from the server, not only text files or
documents. If, for example, an HTML page has two embedded images, the
server generates three hits when this page is requested: one hit for
the HTML page itself and two hits for the two inline images.
Files:
If the user requests a document and the server successfully sends back
a file for this request, the server logs a Code 200 (OK) response.
Every such response is counted as a file. Again, "file" here means any
kind of file, not just documents.
Code 304:
A Code 304 (Not Modified) response is generated by the server if a
document hasn't been updated since the last time the user requested
it, so there was no need to actually send the file. This happens if
the browser (or a caching proxy server between the browser and your
web server) still has an up-to-date copy of the page in its local
storage (cache) and can therefore display the page without requesting
the actual content. This technique reduces network traffic, but it
also makes the statistics reports inaccurate regarding the number of
visitors, because the browser or proxy usually sends only one such
conditional request per user session if it still holds an up-to-date
copy of the file. However, the ratio between files and 304s reflects
the efficiency of the overall caching mechanisms, at least for those
hits that made their way to the server.
Pageviews:
Pageviews are all files that either have a text file suffix (.html,
.text) or are directory index files. This number allows you to
estimate the number of "real" documents transmitted by your server. If
the pageview suffixes are defined correctly, the analyzer rates text
files (documents) as pageviews. Pageviews do not include images, CGI
scripts, Java applets, or any other objects, except files ending with
one of the predefined pageview suffixes, such as .html or .text.
Other responses¹:
There are many more responses than only Code 200 (OK) and Code 304
(Not Modified), especially in the HTTP/1.1 protocol specification. For
example, the server could generate a Code 302 (Found, called "Moved
Temporarily" in HTTP/1.0) response if a page has moved, a Code 401
(Unauthorized) response if access to the document requires
authentication that was not supplied, or a Code 404 (Not Found)
response if the requested page does not exist on this server.
KBytes transferred:
This is the amount of data sent during the whole summary period, as
reported by the server. Note that some servers log the size of a
document instead of the actual number of bytes transferred. In most
cases these are the same, but if a user interrupts the transmission by
pressing the browser's stop button before the page has been received
completely, some servers (for example, all Netscape web servers) do
not log the amount of data actually transferred but the amount that
would have been transferred had the user loaded the page completely.
KBytes requested¹:
This is the amount of data requested during the whole summary period.
http-analyze computes this number by adding KBytes transferred and
KBytes saved by cache (see below).
KBytes saved by cache¹:
The amount of data saved by various caching mechanisms, such as those
in proxy servers or browsers. This value is computed by multiplying
the number of Code 304 (Not Modified) responses for each file by the
size of that file. Note: Because http-analyze can determine the size
of a file only if the file has been requested at least once in the
same summary period, the values for KBytes saved by cache and KBytes
requested are only approximations of the real values.
Unique URLs:
The number of all different, valid URLs requested in a given summary
period, that is, the number of different files requested at least once
in that period.
Unique sites:
This is the number of unique hosts accessing the server during a given
time window. The time window is hardwired to the length of the current
month. This means that a host accessing your server very often is
counted only once during the whole month. Only the number of unique
hosts per month is listed in the statistics report.
Sessions:
Similar to unique sites, this is the number of unique hosts accessing
the server during a given time window. This time window is one day by
default for backward compatibility, but it can be changed with the
option -u or the Session directive in the configuration file. For
example, if the time window is two hours, all accesses from a certain
host within 2 hours of that host's first access are lumped together
into one session. Any later access more than 2 hours after the first
access is counted as a new session. This gives you an estimate of how
many sessions are started from different sites to access your server.
¹ Shown only on the total summary page.
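The counting rules for hits, files, 304s, pageviews, and the three KBytes figures can be summarized in a short Python sketch. It is not http-analyze's code: it works on already-parsed `(url, status, size)` tuples, and the pageview suffixes, the treatment of URLs ending in `/` as directory index files, and counting a URL as "valid" when it received a 200 or 304 response are all assumptions for illustration. Sizes are kept in bytes rather than KB.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Assumed pageview suffixes; directory index files are taken to be URLs ending in "/".
PAGEVIEW_SUFFIXES = (".html", ".text")


@dataclass
class Summary:
    hits: int = 0          # every response
    files: int = 0         # Code 200 (OK) responses
    not_modified: int = 0  # Code 304 (Not Modified) responses
    pageviews: int = 0     # files that are text documents or directory indexes
    bytes_sent: int = 0    # "KBytes transferred", in bytes
    bytes_saved: int = 0   # "KBytes saved by cache", in bytes (an approximation)
    unique_urls: int = 0

    @property
    def bytes_requested(self) -> int:
        # "KBytes requested" = transferred + saved by cache
        return self.bytes_sent + self.bytes_saved


def is_pageview(url: str) -> bool:
    return url.endswith("/") or url.endswith(PAGEVIEW_SUFFIXES)


def summarize(entries: Iterable[Tuple[str, int, int]]) -> Summary:
    summary = Summary()
    file_sizes: Dict[str, int] = {}  # size of each file seen with a 200 response
    hits_304: Dict[str, int] = {}    # number of 304 responses per URL
    valid_urls: Set[str] = set()
    for url, status, size in entries:
        summary.hits += 1
        summary.bytes_sent += size
        if status == 200:
            summary.files += 1
            file_sizes[url] = size
            valid_urls.add(url)
            if is_pageview(url):
                summary.pageviews += 1
        elif status == 304:
            summary.not_modified += 1
            hits_304[url] = hits_304.get(url, 0) + 1
            valid_urls.add(url)
    # A 304 for a file never sent with a 200 in this period adds 0 bytes,
    # which is why the cache figures are only approximations.
    summary.bytes_saved = sum(n * file_sizes.get(url, 0) for url, n in hits_304.items())
    summary.unique_urls = len(valid_urls)
    return summary
```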
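Unique sites and sessions differ only in the time window used. Here is a Python sketch of both counts, assuming accesses arrive as `(host, timestamp)` pairs. The function names are hypothetical, `window` stands in for what the -u option or the Session directive configures, and treating an access exactly one window after the session start as a new session is an assumption about the boundary.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def count_sessions(accesses: Iterable[Tuple[str, datetime]],
                   window: timedelta = timedelta(days=1)) -> int:
    """A host's accesses within `window` of its session's first access form one session."""
    session_start: Dict[str, datetime] = {}  # host -> first access of its current session
    sessions = 0
    for host, when in sorted(accesses, key=lambda access: access[1]):
        start = session_start.get(host)
        if start is None or when - start >= window:
            session_start[host] = when  # this access opens a new session
            sessions += 1
    return sessions


def unique_sites_per_month(accesses: Iterable[Tuple[str, datetime]]) -> Dict[Tuple[int, int], int]:
    """Count each host at most once per calendar month."""
    hosts_by_month: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for host, when in accesses:
        hosts_by_month[(when.year, when.month)].add(host)
    return {month: len(hosts) for month, hosts in hosts_by_month.items()}
```

With a two-hour window, a host that accesses the server at 10:00, 11:59, and 12:01 is counted as two sessions, because the last access falls more than two hours after the first.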