Getting Started with Apache 2.0 Part III

In this third and final installment of our “Getting Started with Apache 2.0” series, you will learn how to configure the Apache server as a “proxy server” for your local network, how to make use of “URL Re-writing,” and much more.

Welcome to the third and concluding part of the “Getting Started with Apache 2.0” series.

Before outlining the topics that I’ll cover in the current article, let me quickly review what you learned in the last one. It started with a quick overview of the “main server” configuration section of the “httpd.conf” file. This was followed by an explanation of the log files generated by the Apache Web server and how to customize their entries. The article concluded with a section on how to set up “Virtual Hosts” – a feature that gives you the ability to run multiple websites on a single Apache Web server.

Today, I’ll show you how to configure the Apache server as a “proxy server” for your local network. Next, I’ll explain how you can configure “user-specific” directories on your Web server. Then I’ll talk about “URL Re-writing,” a powerful feature that allows you to “re-write” requests to the Web server in real time. Finally, I’ll wrap up this series with introductions to some offbeat Apache modules with which you can experiment.

Let’s get moving, shall we?

Apache As A Proxy Server

So far, I’ve only concentrated on the “Web server” capabilities of Apache. However, you’ll be surprised to learn that your favorite software package can also be configured to run as a “proxy” server on your local network.

For the uninitiated, a “proxy” server – or a “forward proxy server,” to be more specific – is a mechanism (a combination of hardware and/or software) that allows computers (a.k.a. clients) connected to a common network to access the World Wide Web, among other things, using a single connection to the Internet.

In contrast, a “reverse proxy” server allows an Apache instance to map requests from Internet users to a local “namespace” without the need to configure the clients specifically. Network administrators typically configure reverse proxies to provide access to servers placed behind firewalls, to implement load balancing, to enable caching and so on.

Today I will only show you how to set up a “forward proxy server”, but there is no reason to worry – you can learn more about “reverse proxies” by visiting the link provided at the end of this section.

Apache’s proxy features are provided by the “mod_proxy” module. By default, this module is not enabled, so you’ll have to recompile Apache using the following command:

./configure --prefix=/usr/local/apache --enable-proxy

Under the hood, the “proxy” features of Apache are driven by three different modules that work in tandem with the “mod_proxy” module: they are “mod_proxy_http”, “mod_proxy_ftp” and “mod_proxy_connect.” As the names suggest, the first allows Apache to serve HTTP proxy requests, the second serves FTP requests and the third module allows the server to service SSL requests using the CONNECT HTTP method.

You’ll notice that I’ve listed only one module, mod_proxy, while compiling Apache. The reason is simple: all three modules are automatically enabled by the “--enable-proxy” option.

Alternatively, if you have compiled DSO support using the “--enable-so” option, as you would have done if you compiled Apache to work with PHP 5.0 (as outlined in the first part of this series), you can activate this module at run time using the “LoadModule” Apache directive, provided that the module itself has been built as a shared object. This lets you avoid the tedious task of re-compiling the whole server.

Once you’ve enabled the “mod_proxy” module, the focus shifts back to the ubiquitous “httpd.conf” configuration file. Add the following lines to the configuration file to set up Apache as a “forward proxy”:

#
# Configure the proxy module of Apache
#
<IfModule mod_proxy.c>

 ProxyRequests On

 <Proxy *>
  Order Deny,Allow
  Deny from all
  Allow from 192.168.100.0/255.255.255.0
 </Proxy>

</IfModule>

Restart the Apache Web server in order to allow the directives to take effect. Of course, you’ll also need to configure the computers on the network to use the above machine as their proxy when connecting to the Internet; the IP address of the machine running the proxy server, together with the port on which Apache listens, should be sufficient.
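Most clients are pointed at the proxy through their network settings, but the same thing can be done from a script. The following Python sketch shows one way to send requests through the proxy; the address “192.168.100.1” and port 80 are assumptions made for illustration, and the function names are hypothetical, so substitute the details of your own proxy machine.

```python
import urllib.request

# Hypothetical address and port of the machine running the Apache proxy.
PROXY_URL = "http://192.168.100.1:80"


def make_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and FTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "ftp": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL through the proxy; this needs a reachable proxy to succeed."""
    with make_proxy_opener().open(url, timeout=timeout) as response:
        return response.read()
```

Calling “fetch("http://httpd.apache.org/")” from a machine in the allowed network should then go out through Apache rather than directly.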

Now, let me review the directives one by one. The “ProxyRequests” directive has to be set to “On” in order to enable Apache to function as a “forward” proxy server. The “Proxy” directive allows you to secure the proxy server by preventing unauthorized access, a recommended practice to prevent misuse. Note the use of the wildcard character (*) with the “Proxy” directive in order to match all “proxied” content. Furthermore, this directive can also enclose the “Order,” “Allow” and “Deny” directives in order to control access to the proxy server. Here, I would like to highlight the use of the “192.168.100.0/255.255.255.0” network/netmask combination, which permits access only to a specific group of computers on the network.
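If the interplay of “Order,” “Deny” and “Allow” feels abstract, the following Python sketch models the access rule from the listing. It only illustrates the logic and is not how Apache implements it; the function name “proxy_access_allowed” is hypothetical. With “Order Deny,Allow”, the “Deny” rules are checked first and a matching “Allow” rule then overrides them, so “Deny from all” combined with a single “Allow” network admits that network only.

```python
import ipaddress

# Network taken from the "Allow from" line in the listing.
ALLOWED_NETWORK = ipaddress.ip_network("192.168.100.0/255.255.255.0")


def proxy_access_allowed(client_ip: str) -> bool:
    """Model "Order Deny,Allow" with "Deny from all" and one "Allow from" network."""
    denied = True  # "Deny from all" matches every client
    allowed = ipaddress.ip_address(client_ip) in ALLOWED_NETWORK
    # With Order Deny,Allow, a matching Allow overrides a matching Deny.
    return allowed or not denied


if __name__ == "__main__":
    for ip in ("192.168.100.25", "192.168.101.25", "10.0.0.1"):
        print(ip, proxy_access_allowed(ip))
```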

You can also use the “ProxyBlock” directive to block access to particular sites, or to sites whose names contain specific words. Take a look at the next listing:

#
# Configure the proxy module of Apache
#
<IfModule mod_proxy.c>

 ProxyRequests On

 <Proxy *>
  Order Deny,Allow
  Deny from all
  Allow from 192.168.100.0/255.255.255.0
 </Proxy>

 # Block "hustler.com"
 # and any site whose name contains the word "xxx"
 ProxyBlock hustler.com xxx

</IfModule>

As mentioned above, the “ProxyBlock” directive allows you to specify a list of hosts, domains and words that will be blocked by the proxy server. The above settings prevent HTTP and FTP access to the domain “hustler.com” as well as to any site whose name contains the pattern “xxx”.

Note that the proxy module will attempt to resolve all hosts specified in the “ProxyBlock” directive at startup. This exercise could result in a slight delay when Apache starts.
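To make the matching behaviour concrete, here is a rough Python sketch of the check. It assumes, following the mod_proxy documentation, that each “ProxyBlock” entry is compared against the name of the requested site rather than the full URL; the function name “is_blocked” is hypothetical, and the sketch leaves out the address comparison that Apache performs for hosts it managed to resolve at startup.

```python
from urllib.parse import urlsplit

# Words and domains from the "ProxyBlock" line in the listing.
BLOCKED = ("hustler.com", "xxx")


def is_blocked(url: str) -> bool:
    """Return True if the host name of the URL contains a blocked entry."""
    host = (urlsplit(url).hostname or "").lower()
    return any(entry in host for entry in BLOCKED)


if __name__ == "__main__":
    for url in ("http://www.hustler.com/", "http://xxx.example.org/",
                "http://www.example.com/xxx/page.html"):
        print(url, is_blocked(url))
```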

And before you proceed to the next section, don’t forget to review the documentation at http://httpd.apache.org/docs-2.0/mod/mod_proxy.html for information on how to set up a “reverse” proxy.

Personalized Websites

I’ve already shown you how to host several websites on a single instance of Apache using “Virtual Hosts.” However, this mechanism can be tedious to maintain if a network administrator wishes to allow each user (on the network) to host his/her own website on the Web server. Fortunately, Apache is equipped with the “mod_userdir” module, which allows visitors to access user-specific web sites using a pre-defined syntax.

Consider the following scenario: you have a Web server that can be accessed by pointing a Web browser to “http://www.myfirm.com/”. Now, you would like to provide each user in the office – say Tom, Dick and Harry – with their own custom web sites, accessible using the following URLs – “http://www.myfirm.com/~tom”, “http://www.myfirm.com/~dick” and “http://www.myfirm.com/~harry” respectively.

No sweat – all you need to do is configure the “mod_userdir” module (compiled in by default) with the help of the “UserDir” directive. Take a look:

UserDir /home/*/www

The above directive will translate every request for a user-specific directory to the appropriate location. For example, a request for the file “http://www.myfirm.com/~harry/photos/index.html” will result in an attempt to retrieve the file located at “/home/harry/www/photos/index.html”, where “/home/harry/” is the default user folder for Harry.

If you’re paranoid about security, you have the option of enabling this feature for specific users only. For example, what if you would like to allow only Dick and Harry to host their websites on the server while preventing access to all other user-specific websites (including that of Tom)? Just implement the following changes:

UserDir disabled
UserDir enabled dick harry
UserDir /home/*/www

For starters, the “disabled” keyword disables access to the “per-user” directory of all users. Subsequently, the “enabled” keyword allows you to list the users for whom you would like to enable this feature.
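The combined effect of the three “UserDir” lines can be modelled in a few lines of Python. This is only a sketch of the translation described above, not Apache’s own code, and the function name “translate” is hypothetical.

```python
from typing import Optional

USERDIR_PATTERN = "/home/*/www"    # from "UserDir /home/*/www"
ENABLED_USERS = {"dick", "harry"}  # from "UserDir enabled dick harry"


def translate(url_path: str) -> Optional[str]:
    """Map a request path such as /~harry/photos/index.html to a file path.

    Returns None when the path is not a per-user request or when the
    user is not on the enabled list.
    """
    if not url_path.startswith("/~"):
        return None
    user, _, rest = url_path[2:].partition("/")
    if user not in ENABLED_USERS:
        return None  # "UserDir disabled" applies to everyone else
    base = USERDIR_PATTERN.replace("*", user)
    return f"{base}/{rest}" if rest else base


if __name__ == "__main__":
    print(translate("/~harry/photos/index.html"))
    print(translate("/~tom/index.html"))
```

Here Harry’s request is mapped into his “www” folder, while a request for Tom’s pages yields no per-user mapping at all.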

Note that there are several security concerns that you need to address if you enable this feature. Therefore, it would be wise to review the following tutorial (part of the official Apache documentation) before you proceed any further: http://httpd.apache.org/docs-2.0/howto/public_html.html.

URL Rewriting

The following comment by Brian Behlendorf, a member of the Apache Group, sums up the power and complexity of the next module that I’m going to talk about: “The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail.” It’s time to say hello to the “mod_rewrite” module!

Originally invented by Ralf S. Engelschall in 1996, the “URL Rewriting” module allows you to “rewrite” a URL using a “rule-based” engine. To be more specific, each rule in the configuration file can be preceded by a list of conditions. The “rewrite” engine reads each combination one by one, and if all the conditions attached to a rule are satisfied, the engine applies that rule. Note that the rules and conditions can consist of server and environment variables, timestamps, complex regular expressions and much more.

At the outset, I’ll admit that the examples listed below are similar to the ones listed in the “URL Rewriting Guide” by Ralf S. Engelschall (the inventor of the module) and hosted at http://httpd.apache.org/docs-2.0/misc/rewriteguide.html.
 
With that little caveat out of the way, I would like to highlight that the “mod_rewrite” module is not enabled by default. You’ll have to compile it into the Web server using the “--enable-rewrite” compilation option or load it at run-time as a DSO module. Next, you must enable the “rewrite” engine by adding the following directive to the Apache configuration file:

RewriteEngine On

Now, consider the example of a rogue website that displays image files hosted on your Web server. Not only does the bandwidth utilized (for displaying the image off your Web server) count towards your monthly quota, but you also must grapple with a violation of your intellectual property. No can do!

How can “mod_rewrite” help you out? Add the following lines to your configuration file and re-start the server:

# Images may not be included from external sites
RewriteCond %{HTTP_REFERER}     !^$
RewriteCond %{HTTP_REFERER}     !^http://www.mysite.com/.*$ [NC]
RewriteRule .*\.jpg$   -        [F]

Now, any request for an image file (with a “.jpg” extension) that arrives with a referrer pointing to another website should result in a “403 – Forbidden Request” response to the client. Of course, requests from your own website, accessible at “http://www.mysite.com”, will continue to be served without any hassle.

It’s time to decipher the magic behind these new directives. I’ve already indicated that you need to specify a set of conditions (using the “RewriteCond” directive) and rules (using the “RewriteRule” directive) for each requirement.

In the above example, the two “RewriteCond” directive statements test the value stored in the “HTTP_REFERER” server variable against two regex patterns – one matches an empty string and the other matches the URL of your website. Each pattern is prefixed with “!”, which negates the match. The “[NC]” special flag at the end of the second statement informs the engine to carry out a case-insensitive regex pattern match.

The bottom line: for any request that has a non-blank referrer and does not come from a page on your own website, the associated “rule” is enforced, which brings us to the “RewriteRule” directive.

Once again, a simple regular expression, which matches the request strings that end with “.jpg”, does the job. Note the use of the “[F]” flag, which informs the Web server to send a “403 – Forbidden Request” response to the client for all such requests.
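For readers who prefer code to prose, here is a minimal Python model of the decision that these three directives make together. It is a sketch for illustration only, with a hypothetical function name; it uses the referrer pattern from the listing and an image pattern in which the dot before “jpg” is escaped so that it matches a literal dot.

```python
import re

# Second RewriteCond pattern; re.IGNORECASE plays the role of [NC].
OWN_SITE = re.compile(r"^http://www.mysite.com/.*$", re.IGNORECASE)
# RewriteRule pattern for requests that end in ".jpg".
IMAGE = re.compile(r".*\.jpg$")


def is_forbidden(request_path: str, referer: str) -> bool:
    """True when the request would be answered with 403 Forbidden."""
    cond_not_blank = referer != ""                       # !^$
    cond_not_own_site = OWN_SITE.match(referer) is None  # !^http://www.mysite.com/.*$
    rule_matches = IMAGE.match(request_path) is not None
    # Both conditions must hold for the rule to be applied.
    return rule_matches and cond_not_blank and cond_not_own_site


if __name__ == "__main__":
    print(is_forbidden("/images/logo.jpg", "http://rogue.example.org/page.html"))
    print(is_forbidden("/images/logo.jpg", "http://www.mysite.com/index.html"))
```

Note that a blank referrer is let through on purpose: many browsers and proxies do not send one, and blocking those requests would lock out legitimate visitors.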

You can also use “URL Rewriting” to serve different versions of a page depending on the local time on the Web server. In the example demonstrated below, for all requests to the file “weather.html”, the server will display “weather.day.html” between 5 a.m. and 8 p.m. and “weather.night.html” between 8 p.m. and 5 a.m. Note that the patterns have no leading slash, so they are meant for a per-directory context such as a “<Directory>” block or a “.htaccess” file.

RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0500
RewriteCond   %{TIME_HOUR}%{TIME_MIN} <2000
RewriteRule   ^weather\.html$  weather.day.html
RewriteRule   ^weather\.html$  weather.night.html
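The logic of this listing can be sketched in Python as follows. The function name “rewrite_weather” is hypothetical and the code is only a model of the rules, but it shows two details that are easy to miss: the “>” and “<” tests compare strings rather than numbers, and the second rule acts as the fallback whenever the conditions on the first rule fail.

```python
def rewrite_weather(path: str, hour: int, minute: int) -> str:
    """Pick the day or night page for a request to weather.html."""
    if path != "weather.html":
        return path  # neither rule matches, so the path is left alone
    hhmm = f"{hour:02d}{minute:02d}"  # %{TIME_HOUR}%{TIME_MIN}, e.g. "0730"
    # ">0500" and "<2000" are lexicographic string comparisons.
    if "0500" < hhmm < "2000":
        return "weather.day.html"
    # The conditions failed, so the second, unconditional rule applies.
    return "weather.night.html"


if __name__ == "__main__":
    print(rewrite_weather("weather.html", 12, 30))
    print(rewrite_weather("weather.html", 23, 15))
```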

To be frank, I’ve only covered the tip of the iceberg – the “URL Rewriting” module is much, much more resourceful than indicated by the examples listed above. I’ll leave it to your sense of adventure and discovery to unearth its true potential!

Et Cetera

In this penultimate section of this article, I’ll introduce some offbeat Apache Web server modules.

Let me start with the “mod_usertrack” module that allows you to track the navigation of the user across the website by setting a “cookie” on the client. Note that you have to compile this module into Apache by specifying the “--enable-usertrack” option at compilation time.

Now, let me outline the directives that govern the behavior of this module:

CookieTracking On
CookieName MySite
CookieExpires "1 month"
CookieStyle RFC2965
CookieDomain .mysite.com

For starters, the “CookieTracking” directive must be set to “On” in order to activate the module, because compiling the “mod_usertrack” module into the binary does not activate this feature. The following directives control the creation of cookies on the client:

  • CookieName: allows you to specify the name of the cookie. The default value is “Apache”.

  • CookieExpires: sets how long the cookie remains valid before it expires.

  • CookieStyle: the format of the cookie header field. Valid values include “Netscape,” “RFC2109” and “RFC2965.”

  • CookieDomain: the domain associated with the cookie. If this directive is absent from the configuration file, no domain is associated with the cookie. Note that the value must always begin with a “dot”.

Next, you have the “mod_speling” module – yes, I’ve typed the name of the module correctly here – that allows the Apache Web server to correct spelling mistakes (only one per request) in the URL. Once again, the “--enable-speling” option ensures that this module is compiled statically into Apache.

There is only one important directive that you need to keep in mind:

# Spell Check module – turned on
CheckSpelling On

Note that the server will only attempt to locate files whose names differ from the request by a single misspelling. If more than one file matches this criterion, the following output is displayed in the browser:

Multiple Choices

The document name you requested (/tes.html) could not be found on this server. However, we found documents with names similar to the one you requested.
Available documents:
· /test.html (character missing)
· /tesr.html (character missing)
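The “single misspelling” rule can be illustrated with a short Python sketch. It treats one inserted, missing or wrong character, or two adjacent characters swapped, as one misspelling, which is how the module documentation describes it. The function names are hypothetical, the sketch ignores the case-insensitive comparison that the module also performs, and it is not the module’s actual algorithm.

```python
from typing import Iterable, List


def one_misspelling(requested: str, candidate: str) -> bool:
    """True if candidate differs from requested by exactly one misspelling."""
    a, b = requested, candidate
    if a == b:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one wrong character
        # two adjacent characters swapped
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Dropping one character from the longer name must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def suggestions(requested: str, files: Iterable[str]) -> List[str]:
    """Return the file names that are one misspelling away from the request."""
    return [name for name in files if one_misspelling(requested, name)]


if __name__ == "__main__":
    print(suggestions("tes.html", ["test.html", "tesr.html", "index.html"]))
```

With the two files from the sample output present, both are one character away from “tes.html”, which is why the server offers a choice instead of picking one.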

Finally, there is the “mod_info” module that allows you to learn more about the configuration of the Web server. Once again, you’ll have to include the “mod_info” module in the Apache executable by specifying the “--enable-info” option to the “configure” command.

Next, you must update the “httpd.conf” file with the following entries:

<Location /server-info>
 SetHandler server-info
 Order Deny,Allow
 Deny from all
 Allow from 192.168.0.1
</Location>

Restart the Web server and try to access the following URL: “http://www.mysite.com/server-info”.

Note that, as a security precaution, access is restricted to the computer whose IP address is “192.168.0.1”.

A counterpart of the “mod_info” module is the “mod_status” module – compiled into Apache by default. You can learn more about this module at the following URL: http://httpd.apache.org/docs-2.0/mod/mod_status.html.

In fact, you can read up on all the modules explained during the course of this three-part series by pointing your browser to the following URL: http://httpd.apache.org/docs-2.0/mod/.

Au Revoir

That’s about it for this series on “Getting Started with Apache 2.0.” If you’re interested in learning more about the latest (and most stable) incarnation of Apache, take a look at the following links:

The Apache Group Website: http://www.apache.org

The Apache Project Website: http://httpd.apache.org

Apache HTTP Server 2.0 Online Documentation: http://httpd.apache.org/docs-2.0/

New features with Apache 2.0: http://httpd.apache.org/docs-2.0/new_features_2_0.html

Listing of binary distributions: http://apache.gr-linux.com/httpd/binaries/

Apache 2.0: The Internals of the New, Improved – http://www.linuxjournal.com/article/4559

An Amble Through Apache Configuration - http://www.onlamp.com/pub/a/apache/2000/03/02/configuring_apache.html

Till next time, have fun!

Note: All examples in this article have been tested on Linux/i586 with Apache 2.0.52, MySQL 3.23 and PHP 5.0.3. Examples are illustrative only, and are definitely NOT meant for a production environment.
