author | pcs <pcs@unknown> | 1996-12-20 17:13:14 +0100 |
---|---|---|
committer | pcs <pcs@unknown> | 1996-12-20 17:13:14 +0100 |
commit | 5b450d9ac1df19721987d4e08123120b42b52831 (patch) | |
tree | 901b79d0a6226ec33c070939a847aa07cdbe40dc /docs/manual/content-negotiation.html | |
parent | Document the change in SCRIPT_NAME/PATH_INFO behavior, and the addition of (diff) | |
Expand documentation of content negotiation for Apache 1.2 including
HTTP/1.1 stuff. Document the algorithm apache uses to choose a variant.
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@77295 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'docs/manual/content-negotiation.html')
-rw-r--r-- | docs/manual/content-negotiation.html | 403 |
1 files changed, 305 insertions, 98 deletions
diff --git a/docs/manual/content-negotiation.html b/docs/manual/content-negotiation.html index 4947e3a6a6..dd3962b797 100644 --- a/docs/manual/content-negotiation.html +++ b/docs/manual/content-negotiation.html @@ -1,57 +1,96 @@ -<html> -<head> -<title>Apache server Content arbitration: MultiViews and *.var files</title> -</head> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> +<HTML> +<HEAD> +<TITLE>Apache Content Negotiation</TITLE> +</HEAD> -<body> +<BODY> <!--#include virtual="header.html" --> -<h1>Content Arbitration: MultiViews and *.var files</h1> - -The HTTP standard allows clients (i.e., browsers like Mosaic or -Netscape) to specify what data formats they are prepared to accept. -The intention is that when information is available in multiple -variants (e.g., in different data formats), servers can use this -information to decide which variant to send. This feature has been -supported in the CERN server for a while, and while it is not yet -supported in the NCSA server, it is likely to assume a new importance -in light of the emergence of HTML3 capable browsers. <p> - -The Apache module <A HREF="mod/mod_negotiation.html">mod_negotiation</A> handles -content negotiation in two different ways; special treatment for the -pseudo-mime-type <code>application/x-type-map</code>, and the -MultiViews per-directory Option (which can be set in srm.conf, or in -.htaccess files, as usual). These features are alternate user -interfaces to what amounts to the same piece of code (in the new file -<code>http_mime_db.c</code>) which implements the content negotiation -portion of the HTTP protocol. <p> - -Each of these features allows one of several files to satisfy a -request, based on what the client says it's willing to accept; the -differences are in the way the files are identified: +<h1>Content Negotiation</h1> + +Apache's support for content negotiation has been updated to meet the +HTTP/1.1 specification. 
It can choose the best representation of a +resource based on the browser-supplied preferences for media type, +languages, character set and encoding. It also implements a +couple of features to give more intelligent handling of requests from +browsers which send incomplete negotiation information. <p> + +Content negotiation is provided by the +<a href="mod/mod_negotiation.html">mod_negotiation</a> module, +which is compiled in by default. + +<hr> + +<h2>About Content Negotiation</h2> + +A resource may be available in several different representations. For +example, it might be available in different languages or different +media types, or a combination. One way of selecting the most +appropriate choice is to give the user an index page, and let them +select. However it is often possible for the server to choose +automatically. This works because browsers can send as part of each +request information about what representations they prefer. For +example, a browser could indicate that it would like to see +information in French, if possible, else English will do. Browsers +indicate their preferences by headers in the request. To request only +French representations, the browser would send + +<pre> + Accept-Language: fr +</pre> + +Note that this preference will only be applied when there is a choice +of representations and they vary by language. +<p> + +As an example of a more complex request, this browser has been +configured to accept French and English, but prefer French, and to +accept various media types, preferring HTML over plain text or other +text types, and preferring GIF or jpeg over other media types, but also +allowing any other media type as a last resort: + +<pre> + Accept-Language: fr; q=1.0, en; q=0.5 + Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, + image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1 +</pre> + +Apache 1.2 supports 'server driven' content negotiation, as defined in +the HTTP/1.1 specification. 
It fully supports the Accept, +Accept-Language, Accept-Charset and Accept-Encoding request headers. +<p> + +The terms used in content negotiation are: a <b>resource</b> is an +item which can be requested of a server, which might be selected as +the result of a content negotiation algorithm. If a resource is +available in several formats, these are called <b>representations</b> +or <b>variants</b>. The ways in which the variants for a particular +resource vary are called the <b>dimensions</b> of negotiation. + +<h2>Negotiation in Apache</h2> + +In order to negotiate a resource, the server needs to be given +information about each of the variants. This is done in one of two +ways: <ul> - <li> A type map (i.e., a <code>*.var</code> file) names the files - containing the variants explicitly - <li> In a MultiViews search, the server does an implicit filename - pattern match, and chooses from among the results. + <li> Using a type map (i.e., a <code>*.var</code> file) which + names the files containing the variants explicitly + <li> Or using a 'MultiViews' search, where the server does an implicit + filename pattern match, and chooses from among the results. </ul> -Apache also supports a new pseudo-MIME type, -text/x-server-parsed-html3, which is treated as text/html;level=3 -for purposes of content negotiation, and as server-side-included HTML -elsewhere. +<h3>Using a type-map file</h3> -<h3>Type maps (*.var files)</h3> - -A type map is a document which is typed by the server (using its -normal suffix-based mechanisms) as -<code>application/x-type-map</code>. Note that to use this feature, -you've got to have an <code>AddType</code> some place which defines a -file suffix as <code>application/x-type-map</code>; the easiest thing -may be to stick a +A type map is a document which is associated with the handler +named <code>type-map</code> (or, for backwards-compatibility with +older Apache configurations, the mime type +<code>application/x-type-map</code>). 
Note that to use this feature, +you've got to have an <code>AddHandler</code> some place which defines a +file suffix as <code>type-map</code>; this is best done with a <pre> - AddType application/x-type-map var + AddHandler type-map var </pre> in <code>srm.conf</code>. See comments in the sample config files for @@ -61,25 +100,27 @@ Type map files have an entry for each available variant; these entries consist of contiguous RFC822-format header lines. Entries for different variants are separated by blank lines. Blank lines are illegal within an entry. It is conventional to begin a map file with -an entry for the combined entity as a whole, e.g., +an entry for the combined entity as a whole (although this +is not required, and if present will be ignored). An example +map file is: <pre> - URI: foo; vary="type,language" + URI: foo URI: foo.en.html - Content-type: text/html; level=2 + Content-type: text/html Content-language: en - URI: foo.fr.html - Content-type: text/html; level=2 - Content-language: fr - + URI: foo.fr.de.html + Content-type: text/html; charset=iso-8859-2 + Content-language: fr, de </pre> -If the variants have different qualities, that may be indicated by the -"qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art): -<pre> - URI: foo; vary="type,language" +If the variants have different source qualities, that may be indicated +by the "qs" parameter to the media type, as in this picture (available +as jpeg, gif, or ASCII-art): +<pre> + URI: foo URI: foo.jpeg Content-type: image/jpeg; qs=0.8 @@ -90,7 +131,12 @@ If the variants have different qualities, that may be indicated by the URI: foo.txt Content-type: text/plain; qs=0.01 -</pre><p> +</pre> +<p> + +qs values can vary between 0.000 and 1.000. Note that any variant with +a qs value of 0.000 will never be chosen. Variants with no 'qs' +parameter value are given a qs factor of 1.0. 
<p> The full list of headers recognized is: @@ -103,12 +149,12 @@ The full list of headers recognized is: client would be granted access if they were to be requested directly. <dt> <code>Content-type:</code> - <dd> media type --- level may be specified, along with "qs". These + <dd> media type --- charset, level and "qs" parameters may be given. These are often referred to as MIME types; typical media types are <code>image/gif</code>, <code>text/plain</code>, or <code>text/html; level=3</code>. <dt> <code>Content-language:</code> - <dd> The language of the variant, specified as an Internet standard + <dd> The languages of the variant, specified as an internet standard language code (e.g., <code>en</code> for English, <code>kr</code> for Korean, etc.). <dt> <code>Content-encoding:</code> @@ -139,7 +185,7 @@ have to ask for it by name. (Fixing this is a one-line change to The effect of <code>MultiViews</code> is as follows: if the server receives a request for <code>/some/dir/foo</code>, if <code>/some/dir</code> has <code>MultiViews</code> enabled, and -<code>/some/dir/foo</code> does *not* exist, then the server reads the +<code>/some/dir/foo</code> does <em>not</em> exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files, assigning them the same media types and content-encodings it would have if the client had asked for @@ -161,53 +207,214 @@ present, and <code>index.cgi</code> is there, the server will run it. <p> -If one of the files found by the globbing is a CGI script, it's not -obvious what should happen. My code gives that case gets special -treatment --- if the request was a POST, or a GET with QUERY_ARGS or -PATH_INFO, the script is given an extremely high quality rating, and -generally invoked; otherwise it is given an extremely low quality -rating, which generally causes one of the other views (if any) to be -retrieved. 
This is the only jiggering of quality ratings done by the -MultiViews code; aside from that, all Qualities in the synthesized -type maps are 1.0. +If one of the files found when reading the directory is a CGI script, +it's not obvious what should happen. The code gives that case +special treatment --- if the request was a POST, or a GET with +QUERY_ARGS or PATH_INFO, the script is given an extremely high quality +rating, and generally invoked; otherwise it is given an extremely low +quality rating, which generally causes one of the other views (if any) +to be retrieved. + +<h2>The Negotiation Algorithm</h2> + +After Apache has obtained a list of the variants for a given resource, +either from a type-map file or from the filenames in the directory, it +applies an algorithm to decide on the 'best' variant to return, if +any. To do this it calculates a quality value for each variant in each +of the dimensions of variance. It is not necessary to know any of the +details of how negotiation actually takes place in order to use Apache's +content negotiation features. However the rest of this document +explains in detail the algorithm used for those interested. <p> + +In some circumstances, Apache can 'fiddle' the quality factor of a +particular dimension to achieve a better result. The ways Apache can +fiddle quality factors are explained in more detail below. + +<h3>Dimensions of Negotiation</h3> + +<table> +<tr><th>Dimension +<th>Notes +<tr><td>Media Type +<td>Browser indicates preferences on Accept: header. Each item +can have an associated quality factor. Variant description can also +have a quality factor. +<tr><td>Language +<td>Browser indicates preferences on Accept-Language: header. Each +item +can have a quality factor. Variants can be associated with none, one +or more languages. +<tr><td>Encoding +<td>Browser indicates preference with Accept-Encoding: header. +<tr><td>Charset +<td>Browser indicates preference with Accept-Charset: header. 
Variant +can indicate a charset as a parameter of the media type. +</table> + +<h3>Apache Negotiation Algorithm</h3> + +Apache uses an algorithm to select the 'best' variant (if any) to +return to the browser. This algorithm is not configurable. It operates +like this: +<p> +<ol> +<li> +Firstly, for each dimension of the negotiation, the appropriate +Accept header is checked and a quality is assigned to each +variant. If the Accept header for any dimension means that this +variant is not acceptable, eliminate it. If no variants remain, go +to step 4. + +<li>Select the 'best' variant by a process of elimination. Each of +the following tests is applied in order. Any variants not selected at +each stage are eliminated. After each test, if only one variant +remains, it is selected as the best match. If more than one variant +remains, move on to the next test. + +<ol> +<li>Multiply the quality factor from the Accept header with the + quality-of-source factor for this variant's media type, and select + the variants with the highest value + +<li>Select the variants with the highest language quality factor + +<li>Select the variants with the best language match, using either the + order of languages on the LanguagePriority directive (if present), + else the order of languages on the Accept-Language header. + +<li>Select the variants with the highest 'level' media parameter + (used to give the version of text/html media types). + +<li>Select only unencoded variants, if there is a mix of encoded + and non-encoded variants. If either all variants are encoded + or all variants are not encoded, select all. + +<li>Select only variants with acceptable charset media parameters, + as given on the Accept-Charset header line. Charset ISO-8859-1 + is always acceptable. Variants not associated with a particular + charset are assumed to be in ISO-8859-1. 
+ +<li>Select the variants with the smallest content length + +<li>Select the first variant of those remaining (this will be either the +first listed in the type-map file, or the first read from the directory) +and go to step 3. + +</ol> + +<li>The algorithm has now selected one 'best' variant, so return + it as the response. The HTTP header Vary is set to indicate the + dimensions of negotiation (browsers and caches can use this + information when caching the resource). End. + +<li>To get here means no variant was selected (because none are acceptable + to the browser). Return a 406 status (meaning "No acceptable representation") + with a response body consisting of an HTML document listing the + available variants. Also set the HTTP Vary header to indicate the + dimensions of variance. + +</ol> +<h2><a name="better">Fiddling with Quality Values</a></h2> + +Apache sometimes changes the quality values from what would be +expected by a strict interpretation of the algorithm above. This is to +get a better result from the algorithm for browsers which do not send +full or accurate information. Some of the most popular browsers send +Accept header information which would otherwise result in the +selection of the wrong variant in many cases. If a browser +sends full and correct information these fiddles will not +be applied. <p> -<B>New as of 0.8:</B> Documents in multiple languages can also be resolved through the use -of the <code>AddLanguage</code> and <code>LanguagePriority</code> -directives: +<h3>Media Types and Wildcards</h3> +The Accept: request header indicates preferences for media types. It +can also include 'wildcard' media types, such as "image/*" or "*/*" +where the * matches any string. So a request including: <pre> -AddLanguage en .en -AddLanguage fr .fr -AddLanguage de .de -AddLanguage da .da -AddLanguage el .el -AddLanguage it .it - -# LanguagePriority allows you to give precedence to some languages -# in case of a tie during content negotiation. 
-# Just list the languages in decreasing order of preference. - -LanguagePriority en fr de + Accept: image/*, */* </pre> -Here, a request for "foo.html" matched against "foo.html.en" and -"foo.html.fr" would return an French document to a browser that -indicated a preference for French, or an English document otherwise. -In fact, a request for "foo" matched against "foo.html.en", -"foo.html.fr", "foo.ps.en", "foo.pdf.de", and "foo.txt.it" would do -just what you expect - treat those suffices as a database and compare -the request to it, returning the best match. The languages and data -types share the same suffix name space. +would indicate that any type starting "image/" would be acceptable, +as would any other type (so the first "image/*" is redundant). Some +browsers routinely send wildcards in addition to explicit types they +can handle. For example: +<pre> + Accept: text/html, text/plain, image/gif, image/jpeg, */* +</pre> + +The intention of this request is to indicate that the explicitly +listed types are preferred, but if a different representation is +available, that is ok too. However under the basic algorithm, as given +above, the */* wildcard has exactly equal preference to all the other +types, so they are not being preferred. The browser should really have +sent a request with a lower quality (preference) value for */*, such +as: +<pre> + Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01 +</pre> +The explicit types have no quality factor, so they default to a +preference of 1.0 (the highest). The wildcard */* is given +a low preference of 0.01, so other types will only be returned if +no variant matches an explicitly listed type. <p> -Note that this machinery only comes into play if the file which the -user attempted to retrieve does <em>not</em> exist by that name; if it -does, it is simply retrieved as usual. (So, someone who actually asks -for <code>foo.jpeg</code>, as opposed to <code>foo</code>, never gets -<code>foo.gif</code>). 
+If the Accept: header contains <i>no</i> q factors at all, Apache sets +the q value of "*/*", if present, to 0.01 to emulate the desired +behaviour. It also sets the q value of wildcards of the format +"type/*" to 0.02 (so these are preferred over matches against +"*/*"). If any media type on the Accept: header contains a q factor, +these special values are <i>not</i> applied, so requests from browsers +which send the correct information to start with work as expected. + +<h3>Variants with no Language</h3> + +If some of the variants for a particular resource have a language +attribute, and some do not, those variants with no language +are given a very low language quality factor of 0.001.<p> + +The reason for setting this language quality factor for +variants with no language to a very low value is to allow +for a default variant which can be supplied if none of the +other variants match the browser's language preferences. + +For example, consider the situation with three variants: + +<ul> +<li>foo.en.html, language en +<li>foo.fr.html, language fr +<li>foo.html, no language +</ul> + +The meaning of a variant with no language is that it is +always acceptable to the browser. If the request Accept-Language +header includes either en or fr (or both) one of foo.en.html +or foo.fr.html will be returned. If the browser does not list +either en or fr as acceptable, foo.html will be returned instead. + +<h2>Note on Caching</h2> + +When a cache stores a document, it associates it with the request URL. +The next time that URL is requested, the cache can use the stored +document, provided it is still within date. But if the resource is +subject to content negotiation at the server, this would result in +only the first requested variant being cached, and subsequent cache +hits could return the wrong response. To prevent this, by default +Apache marks all responses that are returned after content negotiation +as non-cacheable. 
Unfortunately, this can increase network traffic by +requiring the resource to be obtained from the original server every +time. The HTTP/1.1 protocol includes features to make this much more +efficient, by allowing caching. <p> + +For requests which come from a HTTP/1.0 compliant client (either a +browser or a cache), the directive <tt>CacheNegotiatedDocs</tt> can be +used to allow caching of responses which were subject to negotiation. +This directive can be given in the server config or virtual host, and +takes no arguments. It has no effect on requests from HTTP/1.1 +clients. <!--#include virtual="footer.html" --> -</body> </html> +</BODY> +</HTML> |
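The media-type test and the wildcard adjustment described in the patched document can be sketched in Python. This is an illustration only, not Apache's code: the function names (`parse_accept`, `accept_quality`, `best_by_media_type`) are hypothetical, and the rule that the most specific matching range supplies the q value is an assumption that the text does not spell out. Only the first elimination test (Accept q multiplied by the variant's qs) is modelled; language, level, encoding, charset and length are left out.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs.

    Applies the adjustment described in the text: if no item carries an
    explicit q, "*/*" is given 0.01 and "type/*" is given 0.02.
    """
    items = []
    any_explicit_q = False
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # items without a q parameter default to 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
                any_explicit_q = True
        items.append((media_range, q))
    if not any_explicit_q:
        adjusted = []
        for media_range, q in items:
            if media_range == "*/*":
                q = 0.01
            elif media_range.endswith("/*"):
                q = 0.02
            adjusted.append((media_range, q))
        items = adjusted
    return items


def accept_quality(media_type, accept_items):
    """Return the q of the most specific matching range, or 0.0 if none.

    Assumption: an exact type beats "type/*", which beats "*/*".
    """
    major = media_type.split("/")[0]
    best_q, best_specificity = 0.0, -1
    for media_range, q in accept_items:
        if media_range == media_type:
            specificity = 2
        elif media_range == major + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_q, best_specificity = q, specificity
    return best_q


def best_by_media_type(variants, accept_header):
    """variants: list of (uri, media_type, qs) tuples.

    Returns the URIs with the highest q * qs; an empty list means no
    variant is acceptable (the 406 case in the text).
    """
    accept_items = parse_accept(accept_header)
    scored = [(accept_quality(mt, accept_items) * qs, uri)
              for uri, mt, qs in variants]
    scored = [(score, uri) for score, uri in scored if score > 0]
    if not scored:
        return []
    top = max(score for score, _ in scored)
    return [uri for score, uri in scored if score == top]
```

With variants `foo.ps` (application/postscript, qs 1.0) and `foo.html` (text/html, qs 0.5), the header `Accept: text/html, */*` selects `foo.html`, because the bare wildcard is lowered to 0.01. Sending `*/*; q=1.0` instead disables the adjustment and `foo.ps` wins.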