
Who Are You? - Apache

In this second article in a three-part series, you will learn how to customize the log files generated by the Apache Web server, and much more.

  1. Getting Started with Apache 2.0 Part II
  2. The Apache Log Files
  3. Who Are You?
  4. One Server, One Hundred Websites
By: Harish Kamath
March 21, 2005




In the previous section, I showed you how to configure the error log file generated by the Apache Web server. While this helps a developer during the development and maintenance phases of a project, it may not be very useful to a Web master, who wants to analyze the traffic on his website and the nature of its visitors, not errors. Fortunately, the Apache "access" log file serves exactly that purpose.

Now it's time to review the directives that govern these "access" log file(s):

CustomLog logs/access_log common

The syntax of the "CustomLog" directive is similar to that of the "ErrorLog" directive: you have to specify the name and the path (absolute or relative) of the log file. The only difference is the presence of the "common" keyword at the end of the line - this represents the "nickname" for the log entry format that you would like to use.

Yes, you can define your own custom format (as well as "nicknames") using the "LogFormat" directive. There are four pre-defined formats listed in the default configuration file:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

In general, the syntax of the "LogFormat" directive looks something like this:

LogFormat "format string" nickname

While the syntax describing the format appears complex at first glance, it will make sense once you understand what each symbol in the format string stands for. Consider the following "LogFormat" entry:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

The keyword "common" represents its nickname and is used in the "CustomLog" directive to refer to this format; we've already seen that above.
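To make the link between "LogFormat" and "CustomLog" concrete, here is a minimal Python sketch (not how Apache itself reads its configuration) that collects the four default "LogFormat" lines into a nickname-to-format table and then resolves the nickname used by the "CustomLog" line. It assumes one directive per line, as in the default file; "load_formats" and "resolve_custom_log" are hypothetical helper names.

```python
import shlex

# The default LogFormat lines plus the CustomLog line shown above.
CONFIG = r'''
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog logs/access_log common
'''

def load_formats(config):
    """Map each LogFormat nickname to its format string (hypothetical helper)."""
    formats = {}
    for line in config.splitlines():
        # shlex honours the double quotes and turns \" into a literal quote.
        tokens = shlex.split(line)
        # Only the "LogFormat format nickname" form is handled here.
        if len(tokens) == 3 and tokens[0] == "LogFormat":
            _, fmt, nickname = tokens
            formats[nickname] = fmt
    return formats

def resolve_custom_log(config):
    """Return (log file, format string) pairs for every CustomLog line."""
    formats = load_formats(config)
    resolved = []
    for line in config.splitlines():
        tokens = shlex.split(line)
        if len(tokens) >= 3 and tokens[0] == "CustomLog":
            path, fmt_or_nickname = tokens[1], tokens[2]
            # The second argument is either a nickname or an explicit format.
            resolved.append((path, formats.get(fmt_or_nickname, fmt_or_nickname)))
    return resolved

print(resolve_custom_log(CONFIG))
# [('logs/access_log', '%h %l %u %t "%r" %>s %b')]
```

Note that the backslash-escaped quotes exist only in the configuration file; once the directive is read, the format string contains plain quote characters.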

Now, let me concentrate on the format string itself, where each symbol has a very specific purpose. To make things easier, here is a sample entry from my local "access_log" file:

- root [18/Dec/2003:12:52:43 +0530] "GET /phpmyadmin/db_details_structure.php?lang=en-iso-8859-1&server=1&db=industry HTTP/1.1" 200 28406

Next, let me map each symbol from the format string to the actual entry in the above log file snippet:

  • The "%h" (value in log file: ) represents the IP address of the client machine (or its host name, if the "HostnameLookups" directive discussed below is turned on).

  • The "%l" (value in log file: - ) is replaced by the RFC 1413 identity of the client. However, the official documentation states that this value is "highly unreliable and should almost never be used except on tightly controlled internal networks." Note that the Web server writes a hyphen (the "-" symbol) whenever it could not retrieve a value for a particular parameter.

  • The "%u" (value in log file: root) indicates the username of the user accessing the Web server via HTTP authentication. Often, no value is recorded (a hyphen appears instead), as most visitors are anonymous to the Web server.

  • The "%t" (value in log file: 18/Dec/2003:12:52:43 +0530) represents the date and time of the request. Note that you can customize the format of the time stamp with the "%{format}t" form, where format uses the syntax of the strftime() C function.

  • The "%r" (value in log file: GET /phpmyadmin/db_details_structure.php?lang=en-iso-8859-1&server=1&db=industry HTTP/1.1) is replaced by the first line of the request sent by the client machine: the request method (such as GET or POST), the requested URL, and the protocol. For a GET request, as seen above, the URL also includes all the parameters sent in the query string.

  • The "%>s" (value in log file: 200) represents the final HTTP status code returned to the client and is very useful to programmers and Web masters. Some of the common values listed in this column are 200 (indicating a successful response), 404 (indicating that the requested file was not found) and 500 (indicating an internal server error, for example a failure while executing the requested script).

    You can view a list of all HTTP status codes at the following URL: http://www.w3.org/Protocols/rfc2616/rfc2616.txt

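  • The "%b" symbol represents the size of the response in bytes, excluding the HTTP headers; when no body is sent, a hyphen is logged instead of 0. Totalled over time, this gives an indication of the bandwidth used by the website.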
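The mapping above translates directly into a parser for lines written with the "common" format. The Python sketch below matches one group per symbol with a regular expression, turns the "-" placeholder into None, and converts the "%t" value with datetime.strptime(). It is an illustration under stated assumptions: "parse_common" is a hypothetical helper, the address 192.0.2.10 is a made-up documentation address (the sample entry above does not show one), and the request line is assumed to contain no double quotes.

```python
import re
from datetime import datetime

# One named group per symbol in "%h %l %u %t \"%r\" %>s %b".
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def dash_to_none(value):
    # Apache writes "-" when it could not determine a value.
    return None if value == "-" else value

def parse_common(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    parts = fields["request"].split(" ")
    if len(parts) == 3:
        method, url, protocol = parts
    else:  # malformed or unusual request line
        method, url, protocol = None, fields["request"], None
    return {
        "host": fields["host"],
        "ident": dash_to_none(fields["ident"]),
        "user": dash_to_none(fields["user"]),
        # The default %t time stamp corresponds to this strftime pattern.
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "url": url,
        "protocol": protocol,
        "status": int(fields["status"]),
        # %b logs "-" instead of 0 when no body was sent.
        "bytes": 0 if fields["bytes"] == "-" else int(fields["bytes"]),
    }

# Hypothetical IP address; the rest is the sample entry from the article.
sample = ('192.0.2.10 - root [18/Dec/2003:12:52:43 +0530] '
          '"GET /phpmyadmin/db_details_structure.php?lang=en-iso-8859-1'
          '&server=1&db=industry HTTP/1.1" 200 28406')
entry = parse_common(sample)
print(entry["user"], entry["status"], entry["bytes"])  # root 200 28406
```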

Note that if you wish to insert quotes in the log files, you have to escape them in the log format string. For example, the \"%r\" syntax encloses the request line within quotes in the log file.
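Going the other way, the following Python sketch expands a format string much as described above: each "%" symbol is replaced by a value from a request record, a hyphen stands in for anything missing, and the quotes that had to be escaped in the configuration file simply appear as literal quotes. It is not Apache's implementation: "render" and the record layout (one key per symbol letter, plus a "headers" dictionary) are hypothetical, and only the symbols discussed in this article are meaningful.

```python
import re

# Matches %h, %>s, %{Referer}i ...: optional ">", optional {name}, one letter.
SYMBOL = re.compile(r"%(>?)(?:\{([^}]*)\})?([a-zA-Z])")

def render(fmt, record):
    """Expand a LogFormat-style string from a hypothetical request record."""
    def substitute(match):
        # This sketch treats %s and %>s alike (no internal redirects here).
        _, name, letter = match.groups()
        if letter == "i":
            # %{Name}i: the named request header, looked up case-insensitively.
            headers = {k.lower(): v for k, v in record.get("headers", {}).items()}
            value = headers.get(name.lower()) if name else None
        else:
            value = record.get(letter)
        if letter == "b" and value == 0:
            return "-"  # %b writes "-" rather than 0 for an empty body
        return "-" if value is None or value == "" else str(value)
    return SYMBOL.sub(substitute, fmt)

record = {
    "h": "192.0.2.10",                     # hypothetical client address
    "l": None,                             # no RFC 1413 identity
    "u": "root",
    "t": "[18/Dec/2003:12:52:43 +0530]",   # %t output includes the brackets
    "r": "GET /index.html HTTP/1.1",
    "s": 200,
    "b": 28406,
    "headers": {"Referer": "http://www.example.com/", "User-Agent": "Mozilla/5.0"},
}
print(render('%h %l %u %t "%r" %>s %b', record))
# 192.0.2.10 - root [18/Dec/2003:12:52:43 +0530] "GET /index.html HTTP/1.1" 200 28406
```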

And that's not all - there are many more symbols that you can use in your format string. Here are some important ones:

  • The "%A" symbol will display the local IP address.

  • The "%B" symbol also displays the number of bytes sent to the client, excluding the HTTP headers, but logs 0 instead of a hyphen when nothing is sent. This makes the column easier to total when you want an accurate picture of the bandwidth used by the different elements of your website, such as images, style sheets, and so forth.

  • The "%{VARNAME}e" symbol will display the contents of the "VARNAME" environment variable.

  • The "%f" symbol will display the filename requested by the client.

  • The "%H" will indicate the request protocol.

  • The "%m" symbol will represent the request method.

  • The "%T" symbol will be replaced by the time taken by the server to serve the request, in seconds.
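As an illustration of "%m", "%B" and "%T", suppose a log had been written with a custom format such as LogFormat "%m %>s %B %T" timing, where the nickname "timing" and the sample lines below are made up for this example. A short Python sketch can then report, per request method, the number of requests, the bytes sent and the average time taken. Because "%T" is logged in whole seconds, fast requests show up as 0.

```python
from collections import defaultdict

def summarize(lines):
    """Per-method request count, total %B bytes and mean %T seconds."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0, "seconds": 0})
    for line in lines:
        # Fields follow the hypothetical "%m %>s %B %T" format.
        method, _status, size, seconds = line.split()
        entry = stats[method]
        entry["requests"] += 1
        entry["bytes"] += int(size)        # %B logs 0, never "-"
        entry["seconds"] += int(seconds)   # %T is in whole seconds
    return {
        method: {
            "requests": s["requests"],
            "bytes": s["bytes"],
            "avg_seconds": s["seconds"] / s["requests"],
        }
        for method, s in stats.items()
    }

# Hypothetical log lines written with "%m %>s %B %T".
sample = [
    "GET 200 28406 0",
    "GET 404 0 0",
    "POST 200 512 3",
    "GET 200 1024 2",
]
print(summarize(sample)["POST"])  # {'requests': 1, 'bytes': 512, 'avg_seconds': 3.0}
```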

There are two more important symbols that I would like to highlight before I move on to the next directive:

  • The "%{User-agent}i" symbol records the contents of the User-Agent request header, which identifies the browser (or other client) accessing the website. This can help you decide which browsers you should support, based on the visitors who actually access the website.

  • The "%{Referer}i" symbol records the contents of the Referer request header, i.e. the page that referred the visitor to the current page. Once again, this is an ideal mechanism for studying how visitors find their way to your website.
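Because the "combined" format appends both of these headers as the last two quoted fields, a few lines of Python are enough to see which browsers and referring pages appear most often. The sketch below assumes well-formed "combined" lines with no double quotes inside the header values; "tally" is a hypothetical helper, and the sample lines, addresses and URLs are invented for illustration.

```python
import re
from collections import Counter

# The last two quoted fields of a "combined" line: Referer, then User-Agent.
TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')

def tally(lines):
    """Count referring pages and user agents in combined-format lines."""
    referers, agents = Counter(), Counter()
    for line in lines:
        match = TAIL.search(line)
        if match is None:
            continue  # not a combined-format line
        if match["referer"] != "-":  # "-" means no Referer header was sent
            referers[match["referer"]] += 1
        agents[match["agent"]] += 1
    return referers, agents

# Invented sample lines in the "combined" format.
sample = [
    '192.0.2.10 - - [18/Dec/2003:12:52:43 +0530] "GET / HTTP/1.1" 200 512 '
    '"http://www.example.com/links.html" "Mozilla/5.0"',
    '192.0.2.11 - - [18/Dec/2003:12:53:10 +0530] "GET /about.html HTTP/1.1" 200 256 '
    '"-" "Opera/7.23"',
    '192.0.2.12 - - [18/Dec/2003:12:54:02 +0530] "GET / HTTP/1.1" 200 512 '
    '"http://www.example.com/links.html" "Mozilla/5.0"',
]
referers, agents = tally(sample)
print(referers.most_common(1))  # [('http://www.example.com/links.html', 2)]
```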

Finally, there is one more directive that deserves a mention: the "HostnameLookups" directive tells the Web server to attempt to map the IP address of each client to its host name.

HostnameLookups Off

By default, this directive is turned "Off." However, if you turn this directive "On," the log file will contain human-readable domain names (such as "http://www.kcsonline.biz") instead of machine-friendly IP addresses (such as ""). There is one caveat to keep in mind: the Web server has to perform an additional DNS lookup for every request in order to obtain the host name, which can slow down request handling and severely affect performance.
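The cost described above comes from doing one reverse DNS lookup per request. A common alternative is to leave "HostnameLookups Off" and resolve addresses afterwards, once per distinct address; Apache ships a "logresolve" utility for this purpose. The Python sketch below only illustrates the caching idea: socket.gethostbyaddr() is the standard-library reverse lookup, while "CachingResolver" and the fake lookup table used in the example are hypothetical.

```python
import socket

class CachingResolver:
    """Resolve each distinct IP address at most once (illustrative helper)."""

    def __init__(self, lookup=socket.gethostbyaddr):
        self.lookup = lookup   # returns (hostname, aliases, addresses)
        self.cache = {}
        self.queries = 0       # number of real lookups performed

    def resolve(self, ip):
        if ip not in self.cache:
            self.queries += 1
            try:
                self.cache[ip] = self.lookup(ip)[0]
            except OSError:
                # socket.herror and socket.gaierror are OSError subclasses:
                # no PTR record or no network, so keep the address as-is.
                self.cache[ip] = ip
        return self.cache[ip]

def fake_lookup(ip):
    # Hypothetical stand-in for DNS so the example runs offline.
    table = {"192.0.2.10": "host-a.example.com"}
    if ip in table:
        return (table[ip], [], [ip])
    raise socket.herror(1, "Unknown host")

resolver = CachingResolver(fake_lookup)
hosts = [resolver.resolve(ip) for ip in
         ["192.0.2.10", "192.0.2.10", "198.51.100.7", "192.0.2.10"]]
print(hosts, resolver.queries)
# ['host-a.example.com', 'host-a.example.com', '198.51.100.7', 'host-a.example.com'] 2
```

Four log entries cost only two lookups here; on a busy site, where the same visitors make many requests, the saving is far larger.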

Before I conclude this section, a short note on analyzing Apache log files: thanks to the popularity of this Open Source Web server, a multitude of products can help you analyze the log files it generates. At one end of the spectrum, you have HTTP-Analyze (http://www.http-analyze.org/), available free of charge to personal users; at the other end, you have sophisticated (read: expensive) tools such as Web Trends (http://www.webtrends.com/). The choice is yours!
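Before reaching for one of these tools, a rough summary is easy to produce by hand. The Python sketch below groups requests from a "common"-format log by status class (2xx, 3xx, 4xx, 5xx) and totals the "%b" byte counts, treating a hyphen as zero. It assumes that the last two fields of every line are "%>s" and "%b", as in the "common" format; "status_summary" and the sample lines are hypothetical.

```python
from collections import Counter

def status_summary(lines):
    """Count requests per status class and total the %b byte counts."""
    classes, total_bytes = Counter(), 0
    for line in lines:
        parts = line.strip().rsplit(" ", 2)
        if len(parts) != 3:
            continue
        status, size = parts[1], parts[2]
        if len(status) != 3 or not status.isdigit() or not (size.isdigit() or size == "-"):
            continue  # not a line ending in "%>s %b"
        classes[status[0] + "xx"] += 1
        total_bytes += 0 if size == "-" else int(size)  # "-" means no body
    return classes, total_bytes

# Invented sample lines in the "common" format.
sample = [
    '192.0.2.10 - - [18/Dec/2003:12:52:43 +0530] "GET / HTTP/1.1" 200 512',
    '192.0.2.10 - - [18/Dec/2003:12:52:44 +0530] "GET /missing.html HTTP/1.1" 404 209',
    '192.0.2.11 - - [18/Dec/2003:12:53:01 +0530] "GET /logo.gif HTTP/1.1" 304 -',
    '192.0.2.12 - - [18/Dec/2003:12:54:10 +0530] "GET /cgi-bin/test.cgi HTTP/1.1" 500 612',
]
classes, total = status_summary(sample)
print(dict(classes), total)  # {'2xx': 1, '4xx': 1, '3xx': 1, '5xx': 1} 1333

# With a real log file, relative to the server root:
# with open("logs/access_log") as log:
#     print(status_summary(log))
```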
