HTTP Protocol Compliance

This document describes the mechanism to set a policy for HTTP protocol compliance for a given URL space by the origin servers or applications behind that URL space.

For those who may have received an error message from a rejected policy, and need to know what the policy rejection means and what they might do to fix the error, each policy is described below.

Filters
Enforcing HTTP Protocol Compliance in Apache 2

Related module: mod_policy
Related directives: PolicyConditional, PolicyLength, PolicyKeepalive, PolicyType, PolicyVary, PolicyValidation, PolicyNocache, PolicyMaxage, PolicyVersion

The HTTP protocol follows the robustness principle as described in RFC1122, which states "Be liberal in what you accept, and conservative in what you send". As a result of this principle, HTTP clients will compensate for and recover from incorrect or misconfigured responses, or responses that are uncacheable.

As a website is scaled up to face greater and greater traffic loads, suboptimal or misconfigured applications or server configurations can threaten both the stability and scalability of the website, as well as the hosting costs associated with it. A website can also scale up to face greater configuration complexity, and it can be increasingly difficult to detect and keep track of suboptimally configured URL spaces on a given server.

Eventually a point is reached where the principle "conservative in what you send" needs to be enforced by the server administrator.

The mod_policy module provides a set of filters which can be applied to a server, allowing key features of the HTTP protocol to be explicitly tested, and non-compliant responses to be logged as warnings or rejected outright as errors. Each filter can be applied separately, allowing the administrator to pick and choose which policies should be enforced depending on the circumstances of their environment.

The filters might be placed in testing and staging environments for the benefit of application and website developers, or may be applied to production servers to protect infrastructure from systems outside the administrator's direct control.

Enforcing HTTP protocol compliance for an application server

In the example above, an Apache httpd server has been placed between the application server and the internet at large, and configured to cache responses from the application server. The mod_policy filters have been added to enforce support for cacheable content and conditional requests, ensuring that both mod_cache and public caches on the internet are able to cache content created by the RESTful application server efficiently.

Enforcing HTTP protocol compliance in a static server

In the above simpler example, a static server serving highly cacheable content has a set of policies applied to ensure that the server configuration conforms to a minimum level of compliance.

Conditional Request Policy

Related module: mod_policy
Related directive: PolicyConditional

This policy will be rejected if the server does not correctly respond to a conditional request with the appropriate status code.

Conditional requests form the mechanism by which an HTTP cache makes stale content fresh again, and particularly for content with short freshness lifetimes, lack of support for conditional requests can add avoidable load to the server.

Most specifically, the existence of any of the following headers in the request makes the request conditional:

If-Match
If the provided ETag in the If-Match header does not match the ETag of the response, the server should return 412 Precondition Failed. Full details of how to handle an If-Match header can be found in RFC2616 section 14.24.
If-None-Match
If the provided ETag in the If-None-Match header matches the ETag of the response, the server should return either 304 Not Modified for GET/HEAD requests, or 412 Precondition Failed for other methods. Full details of how to handle an If-None-Match header can be found in RFC2616 section 14.26.
If-Modified-Since
If the Last-Modified header of the response is no later than the date provided in the If-Modified-Since header, the server should return 304 Not Modified. Full details of how to handle an If-Modified-Since header can be found in RFC2616 section 14.25.
If-Unmodified-Since
If the Last-Modified header of the response is later than the date provided in the If-Unmodified-Since header, the server should return 412 Precondition Failed. Full details of how to handle an If-Unmodified-Since header can be found in RFC2616 section 14.28.
If-Range
If the provided ETag or date in the If-Range header matches the ETag or Last-Modified of the response, and a valid Range is present, the server should return 206 Partial Content. Full details of how to handle an If-Range header can be found in RFC2616 section 14.27.

If the response is detected to have been successful (a 2xx response), but the request was conditional and one of the responses above was expected instead, this policy will be rejected. Responses that indicate a redirect or a failure of some kind (3xx, 4xx, 5xx) will be ignored by this policy.
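The header rules above can be sketched as a small Python function that reports which status a conditional request calls for. This is an illustrative sketch only, not the mod_policy implementation: the function name `expected_status` and its parameters are hypothetical, ETags are compared as exact strings, If-Range is left out, and header names are assumed to use their canonical spelling.

```python
from email.utils import parsedate_to_datetime


def expected_status(method, request_headers, etag=None, last_modified=None):
    """Return the status a conditional request calls for, or None for a plain 2xx.

    Simplifications (assumptions of this sketch): ETags are compared as exact
    strings, If-Range is not handled, and header names are looked up with
    their canonical spelling.
    """
    def listed(value):
        # Split a comma-separated list of entity tags.
        return [tag.strip() for tag in value.split(",")]

    modified = parsedate_to_datetime(last_modified) if last_modified else None

    if_match = request_headers.get("If-Match")
    if if_match is not None:
        if etag is None or (if_match.strip() != "*" and etag not in listed(if_match)):
            return 412

    unmodified_since = request_headers.get("If-Unmodified-Since")
    if unmodified_since is not None and modified is not None:
        if modified > parsedate_to_datetime(unmodified_since):
            return 412

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and etag is not None:
        if if_none_match.strip() == "*" or etag in listed(if_none_match):
            return 304 if method in ("GET", "HEAD") else 412

    modified_since = request_headers.get("If-Modified-Since")
    if modified_since is not None and modified is not None:
        if method in ("GET", "HEAD") and modified <= parsedate_to_datetime(modified_since):
            return 304

    return None
```

A response would fail the policy when this function returns a status but the application answered with a 2xx anyway.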

This policy is implemented by the POLICY_CONDITIONAL filter.

Content-Length Policy

Related module: mod_policy
Related directive: PolicyLength

This policy will be rejected if the server response does not contain an explicit Content-Length header.

There are a number of ways of determining the length of a response body, described in full in RFC2616 section 4.4 Message Length.

When the Content-Length header is present, the size of the body is declared at the start of the response. If this information is missing, an HTTP cache might choose to ignore the response, as it does not know in advance whether the response will fit within the cache's defined limits.

HTTP/1.1 defines the Transfer-Encoding header as an alternative to Content-Length, allowing the end of the response to be indicated to the client without the client having to know the length beforehand. However, when HTTP/1.0 requests are processed, and no Content-Length is specified, the only mechanism available to the server to indicate the end of the response is to drop the connection. In an environment containing load balancers, this can cause the keepalive mechanism to be bypassed.

If the response is detected to have been successful (a 2xx response), and has a response body (this excludes 204 No Content), and the Content-Length header is missing, this policy will be rejected. Responses that indicate a redirect or a failure of some kind (3xx, 4xx, 5xx) will be ignored by this policy.
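As a rough illustration of this rule, the check might look like the following Python sketch. The function name and the `has_body` flag are hypothetical and only stand in for whatever the filter inspects internally.

```python
def violates_length_policy(status, headers, has_body=True):
    """Return True when a successful response with a body lacks Content-Length.

    `headers` is any iterable of header names; names are compared
    case-insensitively. Non-2xx responses are ignored, as is 204 No Content.
    """
    if not 200 <= status < 300:
        return False
    if status == 204 or not has_body:
        return False
    return not any(name.lower() == "content-length" for name in headers)
```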

It should be noted that some modules, such as mod_proxy, add their own Content-Length header when a response lacking one is small enough to be read in one go. As a result, small responses may pass this policy while larger responses for the same URL fail.

This policy is implemented by the POLICY_LENGTH filter.

Content-Type Policy

Related module: mod_policy
Related directive: PolicyType

This policy will be rejected if the server response does not contain an explicit and syntactically correct Content-Type header that matches the server defined pattern.

The media type of the body is placed in the Content-Type header, and the format of the header is described in full in RFC2616 section 3.7 Media Types.

A syntactically valid content type might look as follows:

Content-Type: text/html; charset=iso-8859-1

Invalid content types might include:

# invalid
Content-Type: foo
# blank
Content-Type:

The server administrator has the option to restrict the policy to one or more specific types, or could specify a general wildcard type such as */*.
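The two-part check described here (syntax first, then the administrator's pattern) can be sketched in Python as follows. This is a simplified illustration: the `type_allowed` name is hypothetical, the regular expression covers only the type/subtype token grammar, and shell-style wildcard matching via `fnmatchcase` is an assumption standing in for however the module matches its patterns.

```python
import re
from fnmatch import fnmatchcase

# HTTP "token" characters, used for both the type and the subtype.
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
MEDIA_TYPE = re.compile(rf"^({TOKEN})/({TOKEN})\s*(;.*)?$")


def type_allowed(content_type, patterns=("*/*",)):
    """Return True if the header parses as type/subtype and matches a pattern."""
    if not content_type:
        return False  # missing or blank header
    match = MEDIA_TYPE.match(content_type.strip())
    if match is None:
        return False  # e.g. "foo" has no subtype
    media = f"{match.group(1)}/{match.group(2)}".lower()
    return any(fnmatchcase(media, pattern.lower()) for pattern in patterns)
```

For instance, `type_allowed("image/png", ("text/*",))` is rejected because the type is valid but outside the permitted pattern.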

This policy is implemented by the POLICY_TYPE filter.

Keepalive Policy

Related module: mod_policy
Related directive: PolicyKeepalive

This policy will be rejected if the server response does not contain an explicit Content-Length header, or a Transfer-Encoding of chunked.

There are a number of ways of determining the length of a response body, described in full in RFC2616 section 4.4 Message Length.

When the Content-Length header is present, the size of the body is declared at the start of the response. HTTP/1.1 defines the Transfer-Encoding header as an alternative to Content-Length, allowing the end of the response to be indicated to the client without the client having to know the length beforehand. In the absence of these two mechanisms, the only way for a server to indicate the end of the response is to drop the connection. In an environment containing load balancers, this can cause the keepalive mechanism to be bypassed.

Most specifically, we follow these rules:

IF
we have not marked this connection as errored;
and
the client isn't expecting 100-continue
and
the response status does not require a close;
and
the response body has a defined length due to the status code being 304 or 204, the request method being HEAD, already having defined Content-Length or Transfer-Encoding: chunked, or the request version being HTTP/1.1 and thus capable of being set as chunked
THEN
we support keepalive.
The server may choose to turn off keepalive for various reasons, such as an imminent shutdown, or a Connection: close from the client, or an HTTP/1.0 client request with a response with no Content-Length, but for our purposes we only care that keepalive was possible from the application, not that keepalive actually took place.
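The IF/THEN rules above translate fairly directly into a predicate. The Python sketch below is only an illustration of that logic; the function name and every parameter name are hypothetical, and the HTTP version is assumed to be passed as a `(major, minor)` tuple.

```python
def keepalive_possible(*, connection_errored, expects_100_continue,
                       status_requires_close, status, method, headers,
                       request_version):
    """Return True when the response leaves keepalive possible.

    `headers` maps response header names to values; `request_version`
    is a (major, minor) tuple such as (1, 1).
    """
    names = {key.lower(): value for key, value in headers.items()}
    # The body length is "defined" if any one of these holds.
    defined_length = (
        status in (204, 304)
        or method == "HEAD"
        or "content-length" in names
        or "chunked" in names.get("transfer-encoding", "").lower()
        or request_version == (1, 1)
    )
    return (not connection_errored
            and not expects_100_continue
            and not status_requires_close
            and defined_length)
```

Note that the predicate asks only whether keepalive was possible, not whether the connection was actually kept open.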

It should also be noted that the Apache httpd server includes a filter that adds chunked encoding to responses without an explicit content length. This policy catches those cases where this filter is bypassed or not in effect.

This policy is implemented by the POLICY_KEEPALIVE filter.

Freshness Lifetime / Maxage Policy

Related module: mod_policy
Related directive: PolicyMaxage

This policy will be rejected if the server response does not have an explicit freshness lifetime at least as long as the server defined limit, or if the freshness lifetime is calculated based on a heuristic.

Full details of how a freshness lifetime is calculated are described in RFC2616 section 13.2 Expiration Model.

During the freshness lifetime, a cache does not need to contact the origin server at all; it can simply pass the cached content as is back to the client.

When the freshness lifetime is reached, the cache should contact the origin server in an effort to check whether the content is still fresh, and if not, replace the content.

When the freshness lifetime is too short, it can result in excessive load on the server. In addition, should an outage occur that is as long as or longer than the freshness lifetime, all cached content will become stale, which could cause a thundering herd of traffic when the server or network returns.
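To make the idea of an explicit freshness lifetime concrete, the Python sketch below derives one from the response headers in the order given by RFC2616 section 13.2 (s-maxage, then max-age, then Expires minus Date) and compares it with a minimum. It is a simplified illustration, not the filter's code: the function names are hypothetical and header names are assumed to use their canonical spelling.

```python
import re
from email.utils import parsedate_to_datetime


def explicit_freshness(headers):
    """Return the explicit freshness lifetime in seconds, or None.

    None means no explicit lifetime was declared, so a cache would
    fall back to a heuristic.
    """
    cache_control = headers.get("Cache-Control", "")
    for directive in ("s-maxage", "max-age"):
        found = re.search(rf'(?:^|[\s,]){directive}\s*=\s*"?(\d+)',
                          cache_control, re.IGNORECASE)
        if found:
            return int(found.group(1))
    if "Expires" in headers and "Date" in headers:
        try:
            delta = (parsedate_to_datetime(headers["Expires"])
                     - parsedate_to_datetime(headers["Date"]))
        except (TypeError, ValueError):
            return 0  # an unparseable date is treated as already expired
        return max(0, int(delta.total_seconds()))
    return None


def passes_maxage_policy(headers, minimum_age):
    """Return True if an explicit lifetime exists and meets the minimum."""
    lifetime = explicit_freshness(headers)
    return lifetime is not None and lifetime >= minimum_age
```

A response carrying only Last-Modified yields `None` here, which corresponds to the heuristic case the policy rejects.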

This policy is implemented by the POLICY_MAXAGE filter.

No Cache Policy

Related module: mod_policy
Related directive: PolicyNocache

This policy will be rejected if the server response declares itself uncacheable using either the Cache-Control or Pragma headers.

Full details of how content may be declared uncacheable are described in RFC2616 section 14.9.1 What is Cacheable, and within the definition of the Pragma header in RFC2616 section 14.32 Pragma.

Most specifically, should any of the following header combinations exist in the response headers, the response will be rejected:

When unexpected, uncacheable content may produce unacceptable levels of server load, or may incur significant cost. When this policy is enabled, all server defined uncacheable content will be rejected.
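A check of this kind might be sketched in Python as below. The set of Cache-Control directives treated as "uncacheable" (no-cache, no-store, private) is an assumption of this example, chosen from RFC2616 section 14.9.1, and is exposed as a parameter rather than presented as the module's definitive list. The function name is hypothetical.

```python
def declares_uncacheable(headers, directives=("no-cache", "no-store", "private")):
    """Return True if the response marks itself uncacheable.

    `directives` is the assumed list of Cache-Control directives that
    count as uncacheable; "Pragma: no-cache" is always checked.
    """
    names = {key.lower(): value for key, value in headers.items()}
    # Strip any "=value" part so that private="field" still counts.
    cache_control = [part.split("=", 1)[0].strip().lower()
                     for part in names.get("cache-control", "").split(",")]
    if any(directive in cache_control for directive in directives):
        return True
    pragma = [part.strip().lower() for part in names.get("pragma", "").split(",")]
    return "no-cache" in pragma
```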

This policy is implemented by the POLICY_NOCACHE filter.

Validation Policy

Related module: mod_policy
Related directive: PolicyValidation

This policy will be rejected if the server response does not contain either a syntactically correct ETag or Last-Modified header.

The ETag header is described in full in RFC2616 section 14.19 ETag, and the Last-Modified header is described in full in RFC2616 section 14.29 Last-Modified.

In addition to being checked for presence, the headers are checked for syntax.

An ETag must be a quoted string, optionally prefixed with "W/" to declare it "weak"; an ETag in any other form will cause the policy to be rejected. A Last-Modified that cannot be parsed as a valid date will also cause the policy to be rejected.
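The syntax checks described above might look like the following Python sketch. It assumes one particular reading of the policy, namely that at least one validator must be present and that every validator present must be well formed; the function names are hypothetical, and the ETag pattern is a simplification of the RFC grammar.

```python
import re
from email.utils import parsedate_to_datetime

# A quoted string, optionally prefixed with W/ to mark a weak validator.
ETAG = re.compile(r'^(W/)?"[^"]*"$')


def valid_etag(value):
    return bool(ETAG.match(value.strip()))


def valid_last_modified(value):
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return False
    return True


def passes_validation_policy(headers):
    """Assumed reading: one validator is required, and none may be malformed."""
    etag = headers.get("ETag")
    last_modified = headers.get("Last-Modified")
    if etag is None and last_modified is None:
        return False
    if etag is not None and not valid_etag(etag):
        return False
    if last_modified is not None and not valid_last_modified(last_modified):
        return False
    return True
```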

This policy is implemented by the POLICY_VALIDATION filter.

Vary Header Policy

Related module: mod_policy
Related directive: PolicyVary

This policy will be rejected if the server response contains a Vary header, and that header in turn contains a header forbidden by the administrator.

The Vary header is described in full in RFC2616 section 14.44 Vary.

Some client-provided headers, such as User-Agent, can take thousands or millions of distinct values over a period of time. If the response is declared cacheable, a cache might attempt to cache each of these variants separately, filling up the cache and crowding out other entries. In this scenario, if so configured, the policy will reject the response.
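As a small illustration, a check of the Vary header against an administrator's forbidden list could be sketched in Python as follows. The function name is hypothetical, and the comparison is case-insensitive because HTTP header names are.

```python
def forbidden_vary_headers(vary, forbidden):
    """Return the sorted, lowercased header names in Vary that are forbidden."""
    listed = {name.strip().lower() for name in vary.split(",") if name.strip()}
    return sorted(listed & {name.lower() for name in forbidden})
```

A non-empty result, for example from `forbidden_vary_headers("Accept-Encoding, User-Agent", ["User-Agent"])`, would mean the response fails the policy.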

This policy is implemented by the POLICY_VARY filter.

Protocol Version Policy

Related module: mod_policy
Related directive: PolicyVersion

This policy will be rejected if the client request was made with a version number lower than the version of HTTP specified.

This policy is typically used with RESTful applications where control over the type of client is desired. This policy can be used alongside the POLICY_KEEPALIVE filter to ensure that HTTP/1.0 clients don't cause keepalive connections to be dropped.
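The version comparison itself is simple; a minimal Python sketch, with hypothetical function names, might be:

```python
import re


def parse_version(text):
    """Parse a string such as "HTTP/1.1" into a (major, minor) tuple."""
    found = re.fullmatch(r"HTTP/(\d+)\.(\d+)", text.strip())
    if found is None:
        raise ValueError(f"not an HTTP version: {text!r}")
    return int(found.group(1)), int(found.group(2))


def version_allowed(request_version, minimum):
    """Return True if the request version is at least the configured minimum."""
    return parse_version(request_version) >= parse_version(minimum)
```

Comparing tuples rather than strings keeps the ordering numeric, so a minor version of 10 would still sort above 9.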

Possible minimum versions that could be specified are:

This policy is implemented by the POLICY_VERSION filter.