In addition to its normal duties, Apache is also capable of operating as a proxy, either as a dedicated proxy server or in combination with serving normal web sites from the local server.
Proxies are intermediate servers that stand between a client and a remote server and make requests to the remote server on behalf of the client. The objective is twofold. First, a caching proxy can keep a copy of a suitable document so that the next time a client asks for it, the proxy can deliver it from the cache without contacting the remote server. Second, a proxy allows clients and servers to be logically isolated from each other, so security measures can be placed between them to ensure that no unauthorized transactions take place.
In this section, we concentrate on Apache's proxy-related features, before going on to discuss caching and more developed examples in the next section.

Installing and Enabling Proxy Services
Apache's proxy functionality is encapsulated in mod_proxy, an optional module supplied as standard with Apache. This primarily implements HTTP/1.0-style proxying but has recently gained some HTTP/1.1 features such as support for Via headers. To enable it, either recompile the server with the module linked in statically, or compile it as a dynamic module and include it in the server configuration as described in Chapter 3. Note that the dynamic proxy module is called libproxy.so, not mod_proxy.so.
Once installed, proxy operation is simply enabled by specifying the ProxyRequests directive:
ProxyRequests on
We can also switch off proxy services again with:
ProxyRequests off
This directive can go only in the server-level configuration or, more commonly, in a virtual host. However, we can configure proxy behavior based on the requested URL using a Directory container.

Normal Proxy Operation
Proxies on a network can work in two directions, forward and reverse, and may also operate in both modes at once.
A forward proxy relays requests from clients on the local network and caches pages from other sites on the Internet, reducing the amount of data transferred on external links; this is a popular application for companies that need to make efficient use of their external bandwidth.
A reverse proxy relays requests from clients outside the local network and caches pages from the local web sites, reducing the load on the servers.
When a client is configured to use a proxy to fetch a remote HTTP or FTP URL, it contacts the proxy, giving the complete URL, including the protocol and remote domain name. The proxy server then checks to see if it is allowed to relay this request and, if so, fetches the remote URL on behalf of the client and returns it. If the proxy is set up to cache documents and the document is cacheable, it also stores it for future requests.
A proxy server with dual network interfaces makes a very effective firewall; external clients connect to one port, and internal clients to the other. The proxy relays requests in and out according to its configuration and deals with all connection requests. Since there are no direct connections between the internal network and the rest of the world, security is much improved.

Configuring Apache as a Proxy
In order for Apache to function as a proxy, the only required directive is ProxyRequests, which enables Apache for both forward and reverse proxying. It makes no distinction about whether the client or the remote server is internal or external, since Apache is not aware of the network topology.
Once proxying is enabled, requests for URLs that the Apache server is locally responsible for are served as normal, but requests for URLs on hosts that do not match any of the hosts that are running on that server cause Apache to attempt to retrieve the URL itself as a client and pass the response back to the client.
Rather bizarrely, we can test that a proxy server is working by using it to proxy a web site served by the same Apache server. Because the server will serve its own content directly, we have to put the proxy on a different port number, say 8080:
# dynamic servers load modules here...
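A minimal sketch of a configuration along these lines might look as follows. Only the host name, the two port numbers, and the log file names main_log and proxy_log come from the text; the IP address, user and group names, and paths are placeholders assumed for illustration.

```apache
# Hypothetical sketch: address, user/group and paths are assumptions
User httpd
Group httpd
Port 80
Listen 80
Listen 8080
ServerName www.alpha-complex.com
DocumentRoot /home/www/alpha-complex
TransferLog logs/main_log

# The proxy lives in a virtual host on port 8080
<VirtualHost 204.148.170.3:8080>
ProxyRequests on
TransferLog logs/proxy_log
</VirtualHost>
```

Keeping separate transfer logs for the main server and the virtual host is what makes it possible to see which of the two handled each request.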
If we test this configuration without telling the client to use the proxy and ask for http://www.alpha-complex.com/, we get the standard home page as expected and a line in the access log main_log that looks like this:
127.0.0.1 - - [27/Aug/1999:17:09:30 +0100] "GET / HTTP/1.0" 200 103
If we now configure the client to use www.alpha-complex.com, port 8080 as a proxy server, we get the same line in main_log:
127.0.0.1 - - [27/Aug/1999:17:50:21 +0100] "GET / HTTP/1.0" 200 103
followed almost immediately by a line in the proxy log:
127.0.0.1 - - [27/Aug/1999:17:50:21 +0100] "GET http://www.alpha-complex.com/" 200 103
What has happened here is that the proxy has received the request on port 8080, stripped out the domain name, and issued a forward HTTP request to that domain on port 80, the default port for HTTP requests. The main server gets this request and responds to it, returning the index page to the proxy, which then returns it to the client.
From this it might appear that enabling proxy functionality in a virtual host overrides the default behavior which would be to serve the page directly, since the virtual host inherits the DocumentRoot directive from the main server. If the ProxyRequests directive were not present this is what we would expect to happen. However, the truth is a little more involved. If we ask for the URL http://www.alpha-complex.com:8080/, we get the index page, served directly by the virtual host, without the proxy. If we look in the proxy_log file we see:
"GET http://www.alpha-complex.com:8080/" 200 103
But no corresponding line in main_log, indicating that the proxy server actually served the page directly. Why is this? Simple, if we remember how Apache matches URLs to virtual hosts. The virtual host inherits the settings of the main server, so the actual configuration of the proxy looks like this:
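As a rough sketch, the effective configuration of the virtual host once inheritance is taken into account is along the following lines. The IP address, user and group names, and document root are placeholders assumed for illustration; only the port, host name, and log file name come from the text.

```apache
# Hypothetical sketch of the virtual host with inherited settings shown
<VirtualHost 204.148.170.3:8080>
# inherited from the main server
ServerName www.alpha-complex.com
DocumentRoot /home/www/alpha-complex
User httpd
Group httpd
# set in the virtual host itself
ProxyRequests on
TransferLog logs/proxy_log
</VirtualHost>
```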
The Listen directives are not inherited, since they are not valid in containers. The User and Group directives are only inherited if suEXEC is in use. Otherwise, they have no effect.
When we configured our client to use the proxy and asked for the URL without a port number, the virtual host received the request but was unable to satisfy it, because the default http port is 80, not 8080. It therefore could not satisfy the request itself and had to use the proxy functionality to make a request for http://www.alpha-complex.com on port 80. This request is picked up by the server but no longer matches the virtual host on port 8080, and so is received by the main server, which satisfies the request. The response is then sent out by Apache in the guise of the main server, back to itself in the guise of the virtual host, which then returns the page to the client.
However, when we asked for the index page on port 8080, the virtual host could satisfy that request because it can receive requests made for port 8080. It has a valid DocumentRoot directive, so it serves the page directly to the client without forwarding the request itself.
Note that if we put a ProxyRequests on directive into the server-level configuration, every virtual host becomes a proxy server and will happily serve proxy requests for any URL it can't satisfy itself. This is interesting, but not necessarily useful behavior. To make a proxy available only when and how we want it, we can customize the scope and operation of the proxy with both Directory and VirtualHost containers.

URL Matching with Directory Containers
As mentioned previously, when a client is configured to use a server as a proxy, it sends the server a URL request including the protocol and domain name (or IP address) of the document it desires.
Apache defines a special variant of the Directory container to allow proxy servers to be configured conditionally based on the URL using the prefix proxy: in the directory specification. Just as with normal Directory containers, the actual URL can be wildcarded, so the simplest container can match all possible proxy requests with:
<Directory proxy:*>
... directives for proxy requests only ...
</Directory>
With this directive present, ordinary URL requests will be served by the main site, whereas proxy requests will be served according to the configuration inside the Directory container. This allows us to insert host or user authentication schemes that only apply when the server is used as a proxy, as opposed to a normal web server.
We can also be more specific. The proxy module by default proxies HTTP, FTP, and Secure HTTP (SSL) connections, which correspond to the protocol identifiers http:, ftp:, and https:. We can therefore define protocol specific directory containers on the lines of:
<Directory proxy:http:*>
... proxy directives for http ...
</Directory>
<Directory proxy:ftp:*>
... proxy directives for ftp ...
</Directory>
We can extend the URL in the container as far as we like to match specific hosts or wildcarded URLs:
<Directory proxy:*/www.alpha-complex.com/*>
... proxy directives for www.alpha-complex.com ...
</Directory>
When a client makes a request by any protocol to www.alpha-complex.com, the directives in this container are applied to the request; we can put proxy cache directives here, allow and deny directives to control access, and so on. Here's a complete virtual host definition with host-based access control:
<Directory proxy:*>
# limit use of this proxy to hosts on the local network
deny from all
allow from 204.148.170
</Directory>
We've added a CacheRoot directive to implement a cache. We'd normally want to specify a few more directives than this, as we will see in the next section, but this will work. We've also added a directory container allowing the use of this proxy by hosts on the local network only; this makes the proxy available for forward proxying but barred from performing reverse proxying, so external sites cannot use it as a proxy for www.alpha-complex.com.

Blocking Sites via the Proxy
It is frequently desirable to prevent a proxy from relaying requests to certain remote servers; this is especially true for proxies that are primarily designed to cache pages for rapid access. We can block access to sites with the ProxyBlock directive; for example:
ProxyBlock www.badsite.com baddomain.dom badword
This directive causes the proxy to refuse to retrieve URLs from hosts with names that contain any of these text elements. In addition, when Apache starts it tries out each parameter in the list with DNS to see if it resolves to an IP address; if so, the IP address is also blocked.
Note that ProxyBlock is not the directive to use to override a ProxyRemote directive so that a server satisfies requests for hosts it serves itself rather than forwarding them to the remote proxy; for that, use NoProxy.

Localizing Remote URLs and Hiding Servers from View
Rather than simply passing on URLs for destinations that are not resolvable locally, a server can also map the contents of a remote site into a local URL using the ProxyPass directive. Unlike the other directives of mod_proxy, this works even for hosts that are not proxy servers and does not require that ProxyRequests has been set to on.
For example, suppose we had three internal servers www.alpha-complex.com, users.alpha-complex.com, and secure.alpha-complex.com. Instead of allowing access to all three, we could map the users and secure web sites so they appear to be part of the main web site by adding these two directives to the configuration for www.alpha-complex.com:
ProxyPass /users/ http://users.alpha-complex.com/
ProxyPass /secure/ http://secure.alpha-complex.com/secure-part/
As mentioned above, we don't need to specify ProxyRequests on for this to work.
We can also create what looks like a real web site but is in fact just a proxy, by mapping the URL /. This allows us to hide a real web site behind a proxy firewall without external users being aware of any unusual activity:
ProxyPass / http://realwww.intranet.alpha-complex.com/
In order for this subterfuge to work, we also have to take care of redirections that the internal server realwww.intranet.alpha-complex.com might send in response to the client request.
Without intervention, this may pass the real name of the internal server to the client, causing the proxy to be bypassed or the request to simply fail in the case of a firewall. Fortunately, we can use ProxyPassReverse, which rewrites the Location: header of a redirection received from the internal host so it matches the proxy rather than the internal server. The rewritten response then goes to the client, which is none the wiser.
ProxyPassReverse takes exactly the same arguments as the ProxyPass directive it parallels:
ProxyPass / http://realwww.intranet.alpha-complex.com/
ProxyPassReverse / http://realwww.intranet.alpha-complex.com/
In general, wherever we put a ProxyPass directive, we probably want to put a ProxyPassReverse directive, too.
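Applied to the users and secure mappings shown earlier, this pairing can be sketched as follows; it uses only the host names and paths already introduced and is an illustration rather than a complete configuration.

```apache
# Map two internal servers into the main site and
# rewrite Location: headers in their redirects to match
ProxyPass        /users/  http://users.alpha-complex.com/
ProxyPassReverse /users/  http://users.alpha-complex.com/
ProxyPass        /secure/ http://secure.alpha-complex.com/secure-part/
ProxyPassReverse /secure/ http://secure.alpha-complex.com/secure-part/
```

Keeping each ProxyPassReverse line next to its ProxyPass twin makes it harder for the two to drift apart when a mapping changes.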
This feature is intended primarily for reverse proxies where external clients are asking for documents on local servers. It is unlikely to be useful for forward proxying scenarios.

Redirecting Requests to Remote Proxy
Rather than satisfy all proxy requests itself, a proxy server can be configured to use other proxies with the ProxyRemote directive, making use of already cached information, rather than contacting the destination server directly. ProxyRemote takes two parameters: a URL prefix and a remote proxy to contact when the requested URL matches that prefix. For example:
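A directive of this shape might read as follows; mirror.mainsite.com is a purely hypothetical name for the remote proxy, chosen for illustration, while the URL prefix and port number are those discussed in the text.

```apache
# Hypothetical remote proxy name; prefix and port as described in the text
ProxyRemote http://www.mainsite.com http://mirror.mainsite.com:8080
```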
This causes any request URL that starts with http://www.mainsite.com to be forwarded to a mirror site on port 8080 instead. The URL prefix can be as short as we like, so we can instead proxy all HTTP requests with:
ProxyRemote http http://http.proxy.remote.com
We can also proxy FTP in the same way (assuming the remote proxy server is listening on port 21, the FTP port):
ProxyRemote ftp ftp://ftp.ftpmirror.com
Alternatively, we can encapsulate FTP requests in HTTP messages with:
ProxyRemote ftp http://http.ftpmirror.com
Finally, we can just redirect all requests to a remote proxy with a special wildcard symbol:
ProxyRemote * http://proxy.remote.com
It is possible to specify several ProxyRemote directives, in which case Apache will run through them in turn until it reaches a match. More specific remote proxies must therefore be listed first to avoid being overridden by more general ones:
ProxyRemote http http://http.proxy.remote.com
ProxyRemote * http://other.proxy.remote.com
Note that the only way to override a ProxyRemote once it is set is via the NoProxy directive. This is useful for enabling local clients to access local web sites on proxy servers; the proxy will satisfy the request locally rather than automatically ask the remote proxy. See "Proxies and Intranets" later in the chapter.

Proxy Chains and the Via: header
HTTP/1.1 defines the Via: header, which proxy servers automatically add to returned documents en route from the remote destination to the client that requested them. A document that passes through proxies A, B, and C on its way out thus returns to the client with Via: headers for C, B, and A, in that order.
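Some clients can choke on Via: headers, however, and there are sometimes reasons to disguise the presence of a proxy, security being one of them. For this reason, Apache allows us to control how Via: headers are processed by proxy servers with the ProxyVia directive, which takes one of four parameters:
off: Via: headers already present are passed through unchanged, and none is added
on: a Via: header for the current proxy is added to each request and response
full: as on, but the header also includes the Apache server version as a comment
block: all Via: headers are removed from proxied requests, and none is added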
Note that the default setting of ProxyVia is off, so a proxy will not add a Via: header unless we specifically ask it to.
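As a small illustration, a proxy that should announce itself could be configured as below. The choice of full here is only an example; in the Apache documentation this value adds the server version to the header as a comment, whereas on adds the header without it.

```apache
# Add a Via: header, including the Apache version, to relayed messages
ProxyVia full
```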
ProxyVia is occasionally confused with the ProxyRemote directive. Although its name suggests that ProxyVia has something to do with relaying requests onward, that job is actually performed by ProxyRemote.

Proxies and Intranets
Defining remote proxies is useful for processing external requests, but presents a problem when it comes to serving documents from local servers to local clients. Making the request via an external proxy is at best unnecessary and time consuming, and at worst will cause a request to fail entirely if the proxy server is set up on a firewall that denies the remote proxy access to the internal site.
We can disable proxying for particular hosts or domains with the NoProxy directive, which takes a list of whole or partial domain names and whole or partial IP addresses to be served locally. For example, if we wanted to use our web server as a forward proxy for internal clients but still allow direct access to web servers on the local 204.148 network, we could specify the following directives:
ProxyRemote * http://proxy.remoteserver.com:8080
NoProxy 204.148
ProxyDomain .alpha-complex.com
This causes the server to act as a proxy for requests to all hosts outside the local network and relay all such requests to proxy.remoteserver.com. Local hosts, including virtual hosts on the web server itself, are served directly, without consulting the remote proxies.
NoProxy also accepts whole or partial hostnames and a bitmask for subnets, so the following are all valid:
NoProxy 204.148.0.0/16 internal.alpha-complex.com intranet.net
A related problem comes from the fact that clients on a local network don't need to fully qualify the name of the server they want if it is in the same domain; that is, instead of a URL of http://www.alpha-complex.com, they can put http://www. This can cause problems for proxies, since the shortened name will not match parameters in other proxy directives like ProxyPass or NoProxy. To fix this, the proxy can be told to append a domain name to incomplete host names with ProxyDomain, as shown in the example above. Since the specified domain is literally appended, it is important to include a dot at the start:
ProxyDomain .alpha-complex.com
When a client receives a server-generated document like an error message after making a request through a proxy (or chain of proxies), it is not always clear whether the remote server or a proxy generated the document. To help clarify this, Apache provides the core directive ServerSignature, which is allowed in any scope and generates a footer line with details of the server. This footer is appended to any document generated by the proxy server. The directive takes one of three parameters:
off: no footer is generated (the default)
on: a footer line giving the server version and the name of the serving virtual host is generated
email: as on, but the footer also includes a mailto: link to the address given by ServerAdmin
For example, to generate a full footer line with an administrator's email address, we would put:
ServerSignature email
Now error documents generated by the proxy itself have a line appended identifying the proxy as the source of the error, while documents retrieved from the remote server (be they server generated or otherwise) are passed through as is.
This directive is not technically proxy-related, since it can be used by non-proxy servers, too; however, its primary application is in proxy configurations.

Tunneling Other Protocols
Proxying is mainly directed towards the HTTP and FTP protocols, and either http: or ftp: URLs can be specified for directives that use URLs as arguments. In addition, mod_proxy will also accept HTTP CONNECT requests from clients that wish to connect to a remote server via a protocol other than HTTP or FTP.
When the proxy receives a CONNECT request, it compares the port used to a list of allowed ports. If the port is allowed, the proxy makes a connection to the remote server specified on the same port number and maintains the connection to both remote server and client, relaying data, until one side or the other closes their link.
By default, Apache accepts CONNECT requests on ports 443 (https) and 563 (snews). These ports can be overridden with the AllowCONNECT directive, which takes a list of port numbers as a parameter. For example, Apache can be told to proxy https and telnet connections by specifying port 23, the telnet port, and port 443:
AllowCONNECT 443 23
A CONNECT request from a client that uses a telnet: or https: URL will then be proxied. To test a telnet proxy, we can go to the command line and telnet to the proxy:
telnet proxy.alpha-complex.com 8080
Then enter a CONNECT request for a host:
CONNECT remote.host:23 HTTP/1.0
And press Return twice.
If the proxy allows the request, the remote host will be contacted on port 23 and a telnet session started, producing a login prompt.

Tuning Proxy Operations
The ProxyReceiveBufferSize directive specifies a network buffer size for HTTP and FTP transactions and takes a number of bytes as a parameter. If defined, it has to be greater than 512 bytes; for example:
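A setting of this kind might look like the following; the figure of 4096 bytes is an arbitrary value assumed for illustration, not a recommendation.

```apache
# Illustrative value only: a 4KB receive buffer for proxied connections
ProxyReceiveBufferSize 4096
```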
If a buffer size of zero is specified, Apache uses the default buffer size of the operating system. Adjusting the value of ProxyReceiveBufferSize may improve (or worsen) the performance of the proxy.
mod_proxy also defines a number of directives to control how, where, and for how long documents are cached, and we'll discuss these in the next section.

Squid - A High-Performance Proxy Alternative
Apache's mod_proxy is adequate for small-to-medium web sites, but for more intensive duty, its performance is lacking. An alternative proxy server is Squid, which is specifically designed to handle multiple requests and high loads.
As well as HTTP, it also handles and caches FTP, GOPHER, WAIS and SSL requests, and runs on AIX, Digital UNIX, FreeBSD, HP-UX, Irix, Linux, NetBSD, Nextstep, SCO, and Solaris - but not Windows or Macintosh.
Squid is open source and freely available from http://squid.nlanr.net, which also contains support documentation, a user guide and FAQ, and the Squid mailing list archives.
©1999 Wrox Press Limited, US and UK.