This document describes a mechanism for setting policies that enforce HTTP protocol compliance by the origin servers or applications behind a given URL space.
Each policy is described below for those who have received an error message from a rejected policy and need to know what the rejection means and how they might fix the error.
The HTTP protocol follows the robustness principle as described in RFC1122, which states "Be liberal in what you accept, and conservative in what you send". As a result of this principle, HTTP clients will compensate for and recover from incorrect or misconfigured responses, or responses that are uncacheable.
As a website is scaled up to face greater and greater traffic loads, suboptimal or misconfigured applications or server configurations can threaten both the stability and scalability of the website, as well as the hosting costs associated with it. A website can also scale up to face greater configuration complexity, and it can be increasingly difficult to detect and keep track of suboptimally configured URL spaces on a given server.
Eventually a point is reached where the principle "conservative in what you send" needs to be enforced by the server administrator.
The
The filters might be placed in testing and staging environments for the benefit of application and website developers, or may be applied to production servers to protect infrastructure from systems outside the administrator's direct control.
In the above example, an Apache httpd server has been placed between
the application server and the internet at large, and configured to cache
responses from the application server. The
In the above simpler example, a static server serving highly cacheable content has a set of policies applied to ensure that the server configuration conforms to a minimum level of compliance.
This policy will be rejected if the server does not correctly respond to a conditional request with the appropriate status code.
Conditional requests form the mechanism by which an HTTP cache makes stale content fresh again, and particularly for content with short freshness lifetimes, lack of support for conditional requests can add avoidable load to the server.
Most specifically, the presence of any of the following headers in the request makes the request conditional:
If-Match: if the If-Match header does not match the ETag of the response, the server should return 412 Precondition Failed. Full details of how to handle an If-Match header can be found in RFC2616 section 14.24.
If-None-Match: if the If-None-Match header matches the ETag of the response, the server should return either 304 Not Modified for GET/HEAD requests, or 412 Precondition Failed for other methods. Full details of how to handle an If-None-Match header can be found in RFC2616 section 14.26.
If-Modified-Since: if the Last-Modified date of the response is not later than the date in the If-Modified-Since header (that is, the content has not been modified since that date), the server should return 304 Not Modified. Full details of how to handle an If-Modified-Since header can be found in RFC2616 section 14.25.
If-Unmodified-Since: if the Last-Modified date of the response is later than the date in the If-Unmodified-Since header (that is, the content has been modified since that date), the server should return 412 Precondition Failed. Full details of how to handle an If-Unmodified-Since header can be found in RFC2616 section 14.28.
If-Range: if the If-Range header matches the ETag or Last-Modified of the response, and a valid Range header is present, the server should return 206 Partial Content. Full details of how to handle an If-Range header can be found in RFC2616 section 14.27.
If the response is detected to have been successful (a 2xx response), but was conditional and one of the responses above was expected instead, this policy will be rejected. Responses that indicate a redirect or a failure of some kind (3xx, 4xx, 5xx) will be ignored by this policy.
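The rules above can be expressed as a small decision function. The following Python is a minimal sketch, not the POLICY_CONDITIONAL filter's actual code: the names expected_status and conditional_policy_ok are hypothetical, the evaluation order follows the precedence later written down in RFC 7232 section 6, and ETag comparison is simplified to a weak comparison throughout.

```python
from email.utils import parsedate_to_datetime


def _opaque(tag):
    # Weak comparison: ignore a leading W/ (a simplification; RFC2616 asks
    # for strong comparison with If-Match and If-Range)
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def _etag_matches(header_value, etag):
    if header_value.strip() == "*":
        return True  # "*" matches any current representation
    if etag is None:
        return False
    # Splitting on commas assumes no entity tag contains a comma
    return _opaque(etag) in {_opaque(t) for t in header_value.split(",")}


def expected_status(method, request_headers, etag=None, last_modified=None):
    """Return the status a compliant server should send (304, 412 or 206),
    or None when the conditional headers call for a normal response."""
    h = {k.lower(): v for k, v in request_headers.items()}
    lm = parsedate_to_datetime(last_modified) if last_modified else None

    if "if-match" in h:
        if not _etag_matches(h["if-match"], etag):
            return 412
    elif "if-unmodified-since" in h and lm is not None:
        # Modified after the given date: the precondition fails
        if lm > parsedate_to_datetime(h["if-unmodified-since"]):
            return 412

    if "if-none-match" in h:
        if _etag_matches(h["if-none-match"], etag):
            return 304 if method in ("GET", "HEAD") else 412
    elif "if-modified-since" in h and lm is not None and method in ("GET", "HEAD"):
        # Not modified since the given date: a 304 is expected
        if lm <= parsedate_to_datetime(h["if-modified-since"]):
            return 304

    if "if-range" in h and "range" in h and method == "GET":
        if h["if-range"].strip() in (etag, last_modified):
            return 206
    return None


def conditional_policy_ok(method, request_headers, status, etag=None, last_modified=None):
    # Redirects and failures (3xx, 4xx, 5xx) are ignored by the policy
    if not 200 <= status < 300:
        return True
    expected = expected_status(method, request_headers, etag, last_modified)
    return expected is None or expected == status
```

For example, a GET carrying an If-None-Match that matches the current ETag but answered with 200 OK would fail this check, while the same request answered with 304 Not Modified passes because 3xx responses are not judged.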
This policy is implemented by the POLICY_CONDITIONAL filter.
This policy will be rejected if the server response does not contain an explicit Content-Length header.
There are a number of ways of determining the length of a response body, described in full in RFC2616 section 4.4 Message Length.
When the Content-Length header is present, the size of the body is declared at the start of the response. If this information is missing, an HTTP cache might choose to ignore the response, as it does not know in advance whether the response will fit within the cache's defined limits.
HTTP/1.1 defines the Transfer-Encoding header as an alternative to Content-Length, allowing the end of the response to be indicated to the client without the client having to know the length beforehand. However, when HTTP/1.0 requests are processed and no Content-Length is specified, the only mechanism available to the server to indicate the end of the response is to drop the connection. In an environment containing load balancers, this can cause the keepalive mechanism to be bypassed.
If the response is detected to have been successful (a 2xx response), and has a response body (this excludes 204 No Content), and the Content-Length header is missing, this policy will be rejected. Responses that indicate a redirect or a failure of some kind (3xx, 4xx, 5xx) will be ignored by this policy.
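The rule above reduces to a few comparisons. This Python sketch is illustrative only; the function name length_policy_ok is hypothetical, and the headers are assumed to arrive as a plain dictionary.

```python
def length_policy_ok(status, headers):
    """Sketch of the Content-Length rule: a successful response with a body
    must declare its length up front."""
    names = {k.lower() for k in headers}
    if not 200 <= status < 300:
        return True  # redirects and errors are ignored by this policy
    if status == 204:
        return True  # 204 No Content carries no body
    # Transfer-Encoding: chunked does not satisfy this particular policy
    return "content-length" in names
```

Note that a chunked response fails this check even though it permits keepalive; the separate keepalive policy accepts either mechanism.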
Note that the server may add a Content-Length header itself should the response be small enough for it to have been possible to read the response lacking such a header in one go. This may cause small responses to pass this policy, while larger responses may fail for the same URL.
This policy is implemented by the POLICY_LENGTH filter.
This policy will be rejected if the server response does not contain an explicit and syntactically correct Content-Type header that matches the server defined pattern.
The media type of the body is placed in the Content-Type header, and the format of the header is described in full in RFC2616 section 3.7 Media Types.
A syntactically valid content type might look as follows:
Invalid content types might include:
The server administrator can restrict the policy to one or more specific types, or specify a general wildcard type such as */*.
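Both checks described above, syntax and pattern matching, can be sketched in Python. This is a hypothetical illustration, not the POLICY_TYPE implementation: the grammar follows the media-type rule of RFC2616 section 3.7 strictly, the token character set is the one RFC2616 leaves after excluding separators, and wildcard patterns are matched with fnmatch as a stand-in for whatever matching the server actually uses. Under this strict reading, text/html; charset=UTF-8 is valid, while a bare html or a dangling text/html; is not.

```python
import fnmatch
import re

# Characters allowed in an RFC2616 token (any CHAR except CTLs or separators)
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
MEDIA_TYPE = re.compile(
    rf"^(?P<type>{TOKEN}/{TOKEN})(?:\s*;\s*{TOKEN}=(?:{TOKEN}|{QUOTED}))*\s*$"
)


def type_policy_ok(content_type, allowed=("*/*",)):
    """Reject a missing or malformed Content-Type, or one outside the
    administrator's (hypothetical) list of allowed patterns."""
    if content_type is None:
        return False
    m = MEDIA_TYPE.match(content_type.strip())
    if m is None:
        return False
    # Media types are case-insensitive, so compare in lower case
    media_type = m.group("type").lower()
    return any(fnmatch.fnmatchcase(media_type, p.lower()) for p in allowed)
```

Usage: type_policy_ok('text/plain; charset="utf-8"', ["text/*"]) accepts the quoted parameter value, while type_policy_ok("image/png", ["text/*"]) is rejected by the pattern rather than the syntax check.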
This policy is implemented by the POLICY_TYPE filter.
This policy will be rejected if the server response contains neither an explicit Content-Length header nor a Transfer-Encoding of chunked.
There are a number of ways of determining the length of a response body, described in full in RFC2616 section 4.4 Message Length.
When the Content-Length header is present, the size of the body is declared at the start of the response. HTTP/1.1 defines the Transfer-Encoding header as an alternative to Content-Length, allowing the end of the response to be indicated to the client without the client having to know the length beforehand. In the absence of these two mechanisms, the only way for a server to indicate the end of the response is to drop the connection. In an environment containing load balancers, this can cause the keepalive mechanism to be bypassed.
Most specifically, we follow these rules:
Content-Length, but for our purposes we only care that keepalive was possible from the application, not that keepalive actually took place.
It should also be noted that the Apache httpd server includes a filter that adds chunked encoding to responses without an explicit content length. This policy catches those cases where this filter is bypassed or not in effect.
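The core test, an explicit length or chunked encoding, can be sketched in Python. This is a hypothetical illustration (keepalive_policy_ok is not a real API); it assumes, as with the other policies in this document, that only successful responses with a body are judged, and it requires chunked to be the final transfer coding as HTTP/1.1 demands.

```python
def keepalive_policy_ok(status, headers):
    """Sketch: the response must let the client find the end of the body
    without the server dropping the connection."""
    h = {k.lower(): v for k, v in headers.items()}
    # Assumption: redirects, errors and 204 No Content are not judged
    if not 200 <= status < 300 or status == 204:
        return True
    if "content-length" in h:
        return True
    codings = [c.strip().lower() for c in h.get("transfer-encoding", "").split(",")]
    # chunked must be the last coding applied for the length to be framed
    return codings[-1] == "chunked"
```

Unlike the Content-Length policy, a response framed only by Transfer-Encoding: chunked passes here.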
This policy is implemented by the POLICY_KEEPALIVE filter.
This policy will be rejected if the server response does not have an explicit freshness lifetime at least as long as the server defined limit, or if the freshness lifetime is calculated based on a heuristic.
The calculation of a freshness lifetime is described in full in RFC2616 section 13.2 Expiration Model.
During the freshness lifetime, a cache does not need to contact the origin server at all; it can simply pass the cached content back to the client as is.
When the freshness lifetime is reached, the cache should contact the origin server in an effort to check whether the content is still fresh, and if not, replace the content.
When the freshness lifetime is too short, it can result in excessive load on the server. In addition, should an outage occur that is as long or longer than the freshness lifetime, all cached content will become stale, which could cause a thundering herd of traffic when the server or network returns.
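The distinction between an explicit and a heuristic freshness lifetime can be made concrete. The following Python sketch follows the order in RFC2616 section 13.2.4 (s-maxage, then max-age, then Expires minus Date); preferring s-maxage assumes the cache in front of the server is a shared cache, and the function names and the limit parameter are hypothetical rather than the POLICY_MAXAGE configuration.

```python
from email.utils import parsedate_to_datetime


def explicit_freshness(headers):
    """Return the explicit freshness lifetime in seconds, or None when a cache
    would have to fall back to a heuristic."""
    h = {k.lower(): v for k, v in headers.items()}
    directives = {}
    for part in h.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # s-maxage overrides max-age for shared caches; both override Expires
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    # Simplification: Expires is only used together with a Date header
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0  # an invalid Expires date means "already expired"
        return max(0, int(delta.total_seconds()))
    return None


def maxage_policy_ok(headers, limit):
    # Reject heuristic lifetimes (None) and explicit lifetimes below the limit
    lifetime = explicit_freshness(headers)
    return lifetime is not None and lifetime >= limit
```

A response carrying only Last-Modified has no explicit lifetime, so a cache would have to guess one heuristically, and the policy rejects it regardless of the limit.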
This policy is implemented by the POLICY_MAXAGE filter.
This policy will be rejected if the server response declares itself uncacheable using either the Cache-Control or Pragma headers.
Full details of how content may be declared uncacheable are given in RFC2616 section 14.9.1 What is Cacheable, and within the definition of the Pragma header in RFC2616 section 14.32 Pragma.
Most specifically, should any of the following header combinations exist in the response headers, the response will be rejected:
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: private
Pragma: no-cache
When unexpected, uncacheable content may produce unacceptable levels of server load, or may incur significant cost. When this policy is enabled, all server defined uncacheable content will be rejected.
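The header combinations listed above translate directly into set membership tests. This Python sketch is a hypothetical illustration, not the POLICY_NOCACHE code; it compares directive names case-insensitively and, as an assumption of the sketch, treats field-qualified forms such as no-cache="Set-Cookie" the same as their bare form.

```python
UNCACHEABLE_CC = {"no-cache", "no-store", "private"}


def nocache_policy_ok(headers):
    """Reject responses that declare themselves uncacheable."""
    h = {k.lower(): v for k, v in headers.items()}
    # Keep directive names only, dropping any =value part
    cc = {d.split("=", 1)[0].strip().lower() for d in h.get("cache-control", "").split(",")}
    pragma = {d.strip().lower() for d in h.get("pragma", "").split(",")}
    return not (cc & UNCACHEABLE_CC) and "no-cache" not in pragma
```

Combining directives does not help: Cache-Control: max-age=0, no-store is still rejected because of no-store.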
This policy is implemented by the POLICY_NOCACHE filter.
This policy will be rejected if the server response does not contain either a syntactically correct ETag or Last-Modified header.
The ETag header is described in full in RFC2616 section 14.19 Etag, and the Last-Modified header is described in full in RFC2616 section 14.29 Last-Modified.
In addition to being checked for presence, the headers are checked for syntax. An ETag that is not a quoted string, optionally prefixed with "W/" to declare it weak, will cause the policy to be rejected. A Last-Modified header that cannot be parsed as a valid date will cause the policy to be rejected.
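Both syntax checks can be sketched in Python. This is a hypothetical illustration (validation_policy_ok is not a real API): the ETag pattern is a simple quoted string with an optional W/ prefix that does not allow escaped quotes inside the tag, and the date check leans on email.utils.parsedate_to_datetime, which targets the RFC 1123 style date; the older RFC 850 and asctime forms that RFC2616 also permits may not be accepted by this sketch.

```python
import re
from email.utils import parsedate_to_datetime

# Optional weak prefix, then a quoted opaque tag without embedded quotes
ETAG = re.compile(r'^(W/)?"[^"]*"$')


def validation_policy_ok(headers):
    """Require at least one validator, and require every present validator
    to be syntactically correct."""
    h = {k.lower(): v for k, v in headers.items()}
    etag, last_modified = h.get("etag"), h.get("last-modified")
    if etag is None and last_modified is None:
        return False
    if etag is not None and not ETAG.match(etag.strip()):
        return False
    if last_modified is not None:
        try:
            parsedate_to_datetime(last_modified)
        except (TypeError, ValueError):  # older Pythons raise TypeError
            return False
    return True
```

An unquoted tag such as 686897696a7c876b7e is rejected, while the same value in quotes, with or without the W/ prefix, passes.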
This policy is implemented by the POLICY_VALIDATION filter.
This policy will be rejected if the server response contains a Vary header, and that header in turn contains a header forbidden by the administrator.
The Vary header is described in full in RFC2616 section 14.44 Vary.
Some client-provided headers, such as User-Agent, can contain thousands or millions of combinations of values over a period of time, and if the response is declared cacheable, a cache might attempt to cache each of these responses separately, filling up the cache and crowding out other entries in the cache. In this scenario, if so configured, the policy will reject the response.
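The check amounts to intersecting two sets of header names, compared case-insensitively because HTTP header names are case-insensitive. The following Python is a hypothetical sketch; the forbidden list stands in for whatever the administrator configures.

```python
def vary_policy_ok(headers, forbidden):
    """Reject a response whose Vary header names a forbidden request header."""
    h = {k.lower(): v for k, v in headers.items()}
    varied = {f.strip().lower() for f in h.get("vary", "").split(",") if f.strip()}
    banned = {f.lower() for f in forbidden}
    return not (varied & banned)
```

With User-Agent forbidden, Vary: Accept-Encoding passes, while Vary: Accept-Encoding, user-agent is rejected.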
This policy is implemented by the POLICY_VARY filter.
This policy will be rejected if the client request was made with an HTTP version lower than the minimum version specified.
This policy is typically used with RESTful applications where control over the type of client is desired. This policy can be used alongside the POLICY_KEEPALIVE filter to ensure that HTTP/1.0 clients don't cause keepalive connections to be dropped.
Possible minimum versions that could be specified are:
HTTP/1.1
HTTP/1.0
HTTP/0.9
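Version strings should be compared numerically, not as text: RFC2616 section 3.1 treats the major and minor numbers as separate integers, so HTTP/2.13 is higher than HTTP/2.4. The following Python is a hypothetical sketch of that comparison, not the filter's implementation; it assumes an HTTP/0.9 request, which carries no version string, has already been mapped to "HTTP/0.9".

```python
import re

_VERSION = re.compile(r"^HTTP/(\d+)\.(\d+)$")


def parse_version(text):
    m = _VERSION.match(text.strip())
    if m is None:
        raise ValueError(f"not an HTTP version: {text!r}")
    # Compare major and minor as integers, so 1.10 > 1.9
    return int(m.group(1)), int(m.group(2))


def version_policy_ok(request_version, minimum):
    return parse_version(request_version) >= parse_version(minimum)
```

Tuple comparison handles both numbers in order: the major version decides first, and the minor version only breaks ties.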
This policy is implemented by the POLICY_VERSION filter.