An In-Depth Discussion of VirtualHost Matching

This is a very rough document that was probably out of date the moment it was written. It attempts to explain exactly what the code does when deciding what virtual host to serve a hit from. It's provided on the assumption that something is better than nothing. The server version under discussion is Apache 1.2.

If you just want to "make it work" without understanding how, there's a What Works section at the bottom.

Config File Parsing

There is a main_server which consists of all the definitions appearing outside of VirtualHost sections. There are virtual servers, called vhosts, which are defined by VirtualHost sections.

The directives Port, ServerName, ServerPath, and ServerAlias can appear anywhere within the definition of a server. However, each appearance overrides the previous appearance (within that server).
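
As a small illustration of the override rule, consider the following hypothetical main_server fragment (the hostname is a placeholder, not taken from any real configuration):

```apache
# Hypothetical main_server fragment.
Port 8080
ServerName www.example.com
# This second Port overrides the first one within the same server.
Port 80
# Values in effect for the main_server: Port 80, ServerName www.example.com
```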

The default value of the Port field for main_server is 80. The main_server has no default ServerName, ServerPath, or ServerAlias.

In the absence of any Listen directives, the Port directive in the main_server (the final one, if there are several) indicates which port httpd will listen on.

The Port and ServerName directives for any server, main or virtual, are used when generating URLs, such as during redirects.

Each address appearing in the VirtualHost directive can have an optional port. If the port is unspecified, it defaults to the value of the main_server's most recent Port statement. The special port * indicates a wildcard that matches any port. Collectively, the entire set of addresses (including multiple A record results from DNS lookups) is called the vhost's address set.
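
A minimal sketch of how the three port forms might look in one VirtualHost directive. The IP addresses are placeholders from a documentation range, and the comments only restate the rules above:

```apache
Port 80

# 192.0.2.10 has no port, so it takes the main_server's most recent Port (80 here).
# 192.0.2.11:8080 matches port 8080 only.
# 192.0.2.12:* matches any port.
<VirtualHost 192.0.2.10 192.0.2.11:8080 192.0.2.12:*>
...
</VirtualHost>
```

Together, the three address and port pairs make up this vhost's address set.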

The magic _default_ address has significance during the matching algorithm. It essentially matches any unspecified address.

After parsing the VirtualHost directive, the vhost server is given a default Port equal to the port assigned to the first name in its VirtualHost directive. The complete list of names in the VirtualHost directive is treated just like a ServerAlias (but is not overridden by any ServerAlias statement). Note that subsequent Port statements for this vhost will not affect the ports assigned in the address set.
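
The distinction between a vhost's Port field and the ports in its address set can be sketched as follows. The addresses and port numbers are hypothetical:

```apache
<VirtualHost 192.0.2.10:8080 192.0.2.11:80>
# The default Port for this vhost is 8080, taken from the first name above.
# Both names are treated like ServerAlias entries.
Port 8000
# Port 8000 now replaces the default and is used when generating URLs,
# but the address set still matches on ports 8080 and 80 only.
</VirtualHost>
```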

All vhosts are stored in a list, in the reverse of the order in which they appeared in the config file. For example, if the config file is:

    <VirtualHost A>
    ...
    </VirtualHost>

    <VirtualHost B>
    ...
    </VirtualHost>

    <VirtualHost C>
    ...
    </VirtualHost>

Then the list will be ordered: main_server, C, B, A. Keep this in mind.

After parsing has completed, the list of servers is scanned, and various merges and default values are set. In particular:

  1. If a vhost has no ServerAdmin, ResourceConfig, AccessConfig, Timeout, KeepAliveTimeout, KeepAlive, MaxKeepAliveRequests, or SendBufferSize directive then the respective value is inherited from the main_server. (That is, inherited from whatever the final setting of that value is in the main_server.)
  2. The "lookup defaults" that define the default directory permissions for a vhost are merged with those of the main server. This includes any per-directory configuration information for any module.
  3. The per-server configs for each module from the main_server are merged into the vhost server.

Essentially, the main_server is treated as "defaults" or a "base" on which to build each vhost. But the positioning of these main_server definitions in the config file is largely irrelevant -- the entire config of the main_server has been parsed when this final merging occurs. So even if a main_server definition appears after a vhost definition it might affect the vhost definition.
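
To make the positioning point concrete, here is a hypothetical fragment in which a main_server directive appears after the vhost it ends up affecting (the address, hostname, and mail address are placeholders):

```apache
<VirtualHost 192.0.2.10>
ServerName www.example.com
# No ServerAdmin is given inside this vhost.
</VirtualHost>

# This belongs to the main_server even though it follows the vhost.
# Because merging happens after the whole file is parsed, the vhost
# above inherits this ServerAdmin value.
ServerAdmin webmaster@example.com
```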

If the main_server has no ServerName at this point, then the hostname of the machine that httpd is running on is used instead. We will call the main_server address set those IP addresses returned by a DNS lookup on the ServerName of the main_server.

Now a pass is made through the vhosts to fill in any missing ServerName fields and to classify each vhost as either an IP-based vhost or a name-based vhost. A vhost is considered a name-based vhost if any address in its address set overlaps the main_server address set (the port associated with each address must match the main_server's Port). Otherwise it is considered an IP-based vhost.

For any undefined ServerName fields, a name-based vhost defaults to the address given first in the VirtualHost statement defining the vhost. Any vhost that includes the magic _default_ wildcard is given the same ServerName as the main_server. Otherwise the vhost (which is necessarily an IP-based vhost) is given a ServerName based on the result of a reverse DNS lookup on the first address given in the VirtualHost statement.
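
The classification and ServerName defaults described above might play out as follows. This is a sketch under one stated assumption: that the hypothetical name main.example.com resolves to 192.0.2.1 and nothing else.

```apache
# Assumption: main.example.com resolves to 192.0.2.1 only, so the
# main_server address set is {192.0.2.1} and its Port is 80.
Port 80
ServerName main.example.com

# 192.0.2.1 (port 80 by default) overlaps the main_server address set,
# so this is a name-based vhost. With no ServerName given, it defaults
# to the first address in its VirtualHost line.
<VirtualHost 192.0.2.1>
...
</VirtualHost>

# No overlap with the main_server, so this is an IP-based vhost. With no
# ServerName given, the name comes from a reverse DNS lookup on 192.0.2.2.
<VirtualHost 192.0.2.2>
...
</VirtualHost>
```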

Vhost Matching

Apache 1.3 differs from what is documented here, and documentation still has to be written.

The server determines which vhost to use for a request as follows:

find_virtual_server: When the connection is first made by the client, the local IP address (the IP address to which the client connected) is looked up in the server list. A vhost is matched if it is an IP-based vhost, the IP address matches and the port matches (taking into account wildcards).

If no vhost is matched, then the last occurrence of a _default_ address in the server list, if there is one, is matched. Because of the reverse ordering of the server list described above, this is the first occurrence of _default_ in the config file.

In any event, if nothing above has matched, then the main_server is matched.
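
As a rough illustration of the connection-time search, consider the hypothetical configuration below. It assumes that main.example.com resolves to 192.0.2.1 only and that httpd is reachable on all of the addresses mentioned. The outcomes in the comments simply follow the description above.

```apache
# Assumption: main.example.com resolves to 192.0.2.1 only.
Port 80
ServerName main.example.com

# IP-based vhost: no overlap with the main_server address set.
<VirtualHost 192.0.2.2>
ServerName ip-based.example.com
</VirtualHost>

# Catch-all vhost: _default_ address with a wildcard port.
<VirtualHost _default_:*>
ServerName catchall.example.com
</VirtualHost>

# Connection to 192.0.2.2, port 80: the 192.0.2.2 vhost (IP and port match).
# Connection to 192.0.2.3, port 80: the _default_ vhost (no IP-based vhost matched).
```

Without the _default_ section, the second connection would fall through to the main_server.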

The vhost resulting from the above search is stored with data about the connection. We'll call this the connection vhost. The connection vhost is constant over all requests in a particular TCP/IP session -- that is, over all requests in a KeepAlive/persistent session.

For each request made on the connection the following sequence of events further determines the actual vhost that will be used to serve the request.

check_fulluri: If the Request-URI is an absoluteURI (that is, it includes http://hostname/), then an attempt is made to determine whether the hostname's address (and optional port) matches that of the connection vhost. If it matches, then the hostname portion of the URI is saved as the request_hostname. If it does not match, then the URI remains untouched. Note: to achieve this address comparison, the hostname supplied goes through a DNS lookup unless it matches the ServerName or the local IP address of the client's socket.

parse_uri: If the URI begins with a protocol (e.g., http:, ftp:) then the request is considered a proxy request. Note that even though we may have stripped an http://hostname/ in the previous step, this could still be a proxy request.

read_request: If the request does not have a hostname from the earlier step, then any Host: header sent by the client is used as the request hostname.

check_hostalias: If the request now has a hostname, then an attempt is made to find a vhost matching this hostname. The first step of this match is to compare any port, if one was given in the request, against the Port field of the connection vhost. If there's a mismatch then the vhost used for the request is the connection vhost. (This is a bug, see Observations.)

If the port matches, then httpd scans the list of vhosts starting with the next server after the connection vhost. This scan does not stop at the first match; it goes through all possible vhosts and in the end uses the last match it found. The comparisons performed are as follows:

check_serverpath: If the request has no hostname (back up a few paragraphs) then a scan similar to the one in check_hostalias is performed to match any ServerPath directives given in the vhosts. Note that the last match is used regardless (again consider the ordering of the virtual hosts).
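
The "last match wins" rule combines with the reversed server list in a way that is easy to miss. The sketch below is hypothetical: it assumes that main.example.com resolves to 192.0.2.1, that the hostname comparison in check_hostalias treats an identical ServerName as a match, and that the DocumentRoot paths are placeholders.

```apache
# Assumption: main.example.com resolves to 192.0.2.1, so both vhosts
# below are name-based and the connection vhost is the main_server.
Port 80
ServerName main.example.com

# Listed first in the file, so it is LAST in the server list.
<VirtualHost 192.0.2.1>
ServerName www.example.com
DocumentRoot /www/first
</VirtualHost>

# Listed second in the file, so it is scanned BEFORE the vhost above.
<VirtualHost 192.0.2.1>
ServerName www.example.com
DocumentRoot /www/second
</VirtualHost>

# Server list: main_server, second vhost, first vhost.
# A request with "Host: www.example.com" matches both vhosts. The scan
# keeps the last match, which is the vhost that appears first in this file.
```

Under these assumptions, the request would be served from /www/first, so duplicate names resolve in favour of the earliest definition in the config file.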

Observations

What Works

In addition to the tips on the DNS Issues page, here are some further tips: