From 5b450d9ac1df19721987d4e08123120b42b52831 Mon Sep 17 00:00:00 2001 From: pcs Date: Fri, 20 Dec 1996 16:13:14 +0000 Subject: Expand documentation of content negotiation for Apache 1.2 including HTTP/1.1 stuff. Document the algorithm apache uses to choose a variant. git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@77295 13f79535-47bb-0310-9956-ffa450edef68 --- docs/manual/content-negotiation.html | 403 ++++++++++++++++++++++++++--------- 1 file changed, 305 insertions(+), 98 deletions(-) (limited to 'docs/manual/content-negotiation.html') diff --git a/docs/manual/content-negotiation.html b/docs/manual/content-negotiation.html index 4947e3a6a6..dd3962b797 100644 --- a/docs/manual/content-negotiation.html +++ b/docs/manual/content-negotiation.html @@ -1,57 +1,96 @@ - - -Apache server Content arbitration: MultiViews and *.var files - + + + +Apache Content Negotiation + - + -

Content Arbitration: MultiViews and *.var files

- -The HTTP standard allows clients (i.e., browsers like Mosaic or -Netscape) to specify what data formats they are prepared to accept. -The intention is that when information is available in multiple -variants (e.g., in different data formats), servers can use this -information to decide which variant to send. This feature has been -supported in the CERN server for a while, and while it is not yet -supported in the NCSA server, it is likely to assume a new importance -in light of the emergence of HTML3 capable browsers.

- -The Apache module mod_negotiation handles -content negotiation in two different ways; special treatment for the -pseudo-mime-type application/x-type-map, and the -MultiViews per-directory Option (which can be set in srm.conf, or in -.htaccess files, as usual). These features are alternate user -interfaces to what amounts to the same piece of code (in the new file -http_mime_db.c) which implements the content negotiation -portion of the HTTP protocol.

- -Each of these features allows one of several files to satisfy a -request, based on what the client says it's willing to accept; the -differences are in the way the files are identified: +

Content Negotiation

+ +Apache's support for content negotiation has been updated to meet the +HTTP/1.1 specification. It can choose the best representation of a +resource based on the browser-supplied preferences for media type, +languages, character set and encoding. It also implements a +couple of features to give more intelligent handling of requests from +browsers which send incomplete negotiation information.

+ +Content negotiation is provided by the +mod_negotiation module, +which is compiled in by default. + +


+ +

About Content Negotiation

+ +A resource may be available in several different representations. For +example, it might be available in different languages or different +media types, or a combination. One way of selecting the most +appropriate choice is to give the user an index page, and let them +select. However it is often possible for the server to choose +automatically. This works because browsers can send as part of each +request information about what representations they prefer. For +example, a browser could indicate that it would like to see +information in French, if possible, else English will do. Browsers +indicate their preferences by headers in the request. To request only +French representations, the browser would send + +
+  Accept-Language: fr
+
+ +Note that this preference will only be applied when there is a choice +of representations and they vary by language. +

+ +As an example of a more complex request, this browser has been +configured to accept French and English, but prefer French, and to +accept various media types, preferring HTML over plain text or other +text types, and preferring GIF or JPEG over other media types, but also +allowing any other media type as a last resort: + +

+  Accept-Language: fr; q=1.0, en; q=0.5
+  Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
+        image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
+
+ +Apache 1.2 supports 'server driven' content negotiation, as defined in +the HTTP/1.1 specification. It fully supports the Accept, +Accept-Language, Accept-Charset and Accept-Encoding request headers. +
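The q values in the request headers above can be read mechanically. The following is a minimal sketch in Python, not Apache's own parser; the function name `parse_accept` is hypothetical, and it assumes a well-formed header in which a missing q parameter means a preference of 1.0.

```python
def parse_accept(header):
    """Parse an Accept-style header into (value, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        value = fields[0]
        q = 1.0  # assumption: a missing q parameter means full preference
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw.strip())
        items.append((value, q))
    # sorted() is stable, so items with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


langs = parse_accept("fr; q=1.0, en; q=0.5")
types = parse_accept("text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, "
                     "image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1")
```

Here `langs` lists French ahead of English, and `types` starts with `text/html` and ends with the `*/*` wildcard, matching the preferences described in the text.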

+ +The terms used in content negotiation are: a resource is an +item which can be requested of a server, which might be selected as +the result of a content negotiation algorithm. If a resource is +available in several formats, these are called representations +or variants. The ways in which the variants for a particular +resource vary are called the dimensions of negotiation. + +

Negotiation in Apache

+ +In order to negotiate a resource, the server needs to be given +information about each of the variants. This is done in one of two +ways: -Apache also supports a new pseudo-MIME type, -text/x-server-parsed-html3, which is treated as text/html;level=3 -for purposes of content negotiation, and as server-side-included HTML -elsewhere. +

Using a type-map file

-

Type maps (*.var files)

- -A type map is a document which is typed by the server (using its -normal suffix-based mechanisms) as -application/x-type-map. Note that to use this feature, -you've got to have an AddType some place which defines a -file suffix as application/x-type-map; the easiest thing -may be to stick a +A type map is a document which is associated with the handler +named type-map (or, for backwards-compatibility with +older Apache configurations, the MIME type +application/x-type-map). Note that to use this feature, +you've got to have a SetHandler some place which defines a +file suffix as type-map; this is best done with a
 
-  AddType application/x-type-map var
+  AddHandler type-map var
 
 
in srm.conf. See comments in the sample config files for @@ -61,25 +100,27 @@ Type map files have an entry for each available variant; these entries consist of contiguous RFC822-format header lines. Entries for different variants are separated by blank lines. Blank lines are illegal within an entry. It is conventional to begin a map file with -an entry for the combined entity as a whole, e.g., +an entry for the combined entity as a whole (although this +is not required, and if present will be ignored). An example +map file is:
 
-  URI: foo; vary="type,language"
+  URI: foo
 
   URI: foo.en.html
-  Content-type: text/html; level=2
+  Content-type: text/html
   Content-language: en
 
-  URI: foo.fr.html
-  Content-type: text/html; level=2
-  Content-language: fr
-
+  URI: foo.fr.de.html
+  Content-type: text/html; charset=iso-8859-2
+  Content-language: fr, de
 
-If the variants have different qualities, that may be indicated by the -"qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art): -
 
-  URI: foo; vary="type,language"
+If the variants have different source qualities, that may be indicated
+by the "qs" parameter to the media type, as in this picture (available
+as jpeg, gif, or ASCII-art):
+
+  URI: foo
 
   URI: foo.jpeg
   Content-type: image/jpeg; qs=0.8
@@ -90,7 +131,12 @@ If the variants have different qualities, that may be indicated by the
   URI: foo.txt
   Content-type: text/plain; qs=0.01
 
-

+

+

+ +qs values can vary between 0.000 and 1.000. Note that any variant with +a qs value of 0.000 will never be chosen. Variants with no 'qs' +parameter value are given a qs factor of 1.0.
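To make the type-map format and the qs default concrete, here is a small Python sketch that reads a map in the layout described above. It is an illustration only, not the parser in mod_negotiation: the names `parse_type_map` and `source_quality` are hypothetical, the map is an abbreviated version of the picture example, and skipping entries without a Content-type header is an assumption standing in for "the leading entry is ignored".

```python
def parse_type_map(text):
    """Split a type map into entries, one dict of lower-cased headers each."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


def source_quality(entry):
    """Return the qs parameter of Content-type, defaulting to 1.0."""
    for param in entry.get("content-type", "").split(";")[1:]:
        key, _, raw = param.partition("=")
        if key.strip().lower() == "qs":
            return float(raw)
    return 1.0


TYPE_MAP = """\
URI: foo

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.txt
Content-type: text/plain; qs=0.01
"""

# assumption: entries without a Content-type describe the resource as a whole
variants = [e for e in parse_type_map(TYPE_MAP) if "content-type" in e]
```

With this input, `variants` holds the JPEG and plain-text entries, and `source_quality` returns 0.8 and 0.01 for them; an entry with no qs parameter would get 1.0.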

The full list of headers recognized is: @@ -103,12 +149,12 @@ The full list of headers recognized is: client would be granted access if they were to be requested directly.

Content-type: -
media type --- level may be specified, along with "qs". These +
media type --- charset, level and "qs" parameters may be given. These are often referred to as MIME types; typical media types are image/gif, text/plain, or text/html; level=3.
Content-language: -
The language of the variant, specified as an Internet standard +
+ The languages of the variant, specified as an internet standard language code (e.g., en for English, ko for Korean, etc.).
Content-encoding: @@ -139,7 +185,7 @@ have to ask for it by name. (Fixing this is a one-line change to The effect of MultiViews is as follows: if the server receives a request for /some/dir/foo, if /some/dir has MultiViews enabled, and -/some/dir/foo does *not* exist, then the server reads the +/some/dir/foo does not exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files, assigning them the same media types and content-encodings it would have if the client had asked for @@ -161,53 +207,214 @@ present, and index.cgi is there, the server will run it.

-If one of the files found by the globbing is a CGI script, it's not -obvious what should happen. My code gives that case gets special -treatment --- if the request was a POST, or a GET with QUERY_ARGS or -PATH_INFO, the script is given an extremely high quality rating, and -generally invoked; otherwise it is given an extremely low quality -rating, which generally causes one of the other views (if any) to be -retrieved. This is the only jiggering of quality ratings done by the -MultiViews code; aside from that, all Qualities in the synthesized -type maps are 1.0. +If one of the files found when reading the directory is a CGI script, +it's not obvious what should happen. The code gives that case +special treatment --- if the request was a POST, or a GET with +QUERY_ARGS or PATH_INFO, the script is given an extremely high quality +rating, and generally invoked; otherwise it is given an extremely low +quality rating, which generally causes one of the other views (if any) +to be retrieved. + +

The Negotiation Algorithm

+ +After Apache has obtained a list of the variants for a given resource, +either from a type-map file or from the filenames in the directory, it +applies an algorithm to decide on the 'best' variant to return, if +any. To do this it calculates a quality value for each variant in each +of the dimensions of variance. It is not necessary to know any of the +details of how negotiation actually takes place in order to use Apache's +content negotiation features. However the rest of this document +explains in detail the algorithm used for those interested.

+ +In some circumstances, Apache can 'fiddle' the quality factor of a +particular dimension to achieve a better result. The ways Apache can +fiddle quality factors are explained in more detail below. + +

Dimensions of Negotiation

+ + +
Dimension +Notes +
Media Type +Browser indicates preferences on Accept: header. Each item +can have an associated quality factor. Variant description can also +have a quality factor. +
Language +Browser indicates preferences on Accept-Language: header. Each +item +can have a quality factor. Variants can be associated with none, one +or more languages. +
Encoding +Browser indicates preference with Accept-Encoding: header. +
Charset +Browser indicates preference with Accept-Charset: header. Variant +can indicate a charset as a parameter of the media type. +
+ +

Apache Negotiation Algorithm

+ +Apache uses an algorithm to select the 'best' variant (if any) to +return to the browser. This algorithm is not configurable. It operates +like this: +

+

    +
  1. +Firstly, for each dimension of the negotiation, the appropriate +Accept header is checked and a quality assigned to each +variant. If the Accept header for any dimension means that this +variant is not acceptable, eliminate it. If no variants remain, go +to step 4. + +
  2. Select the 'best' variant by a process of elimination. Each of +the following tests is applied in order. Any variants not selected at +each stage are eliminated. After each test, if only one variant +remains, it is selected as the best match. If more than one variant +remains, move onto the next test. + +
      +
    1. Multiply the quality factor from the Accept header with the + quality-of-source factor for this variant's media type, and select + the variants with the highest value + +
    2. Select the variants with the highest language quality factor + +
    3. Select the variants with the best language match, using either the + order of languages on the LanguagePriority directive (if present), + else the order of languages on the Accept-Language header. + +
    4. Select the variants with the highest 'level' media parameter + (used to give the version of text/html media types). + +
    5. Select only unencoded variants, if there is a mix of encoded + and non-encoded variants. If either all variants are encoded + or all variants are not encoded, select all. + +
    6. Select only variants with acceptable charset media parameters, + as given on the Accept-Charset header line. Charset ISO-8859-1 + is always acceptable. Variants not associated with a particular + charset are assumed to be in ISO-8859-1. + +
    7. Select the variants with the smallest content length + +
    8. Select the first variant of those remaining (this will be either the +first listed in the type-map file, or the first read from the directory) +and go to step 3. + +
    + +
  3. The algorithm has now selected one 'best' variant, so return + it as the response. The HTTP header Vary is set to indicate the + dimensions of negotiation (browsers and caches can use this + information when caching the resource). End. + +
  4. To get here means no variant was selected (because none are acceptable + to the browser). Return a 406 status (meaning "No acceptable representation") + with a response body consisting of an HTML document listing the + available variants. Also set the HTTP Vary header to indicate the + dimensions of variance. + +
+
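The process of elimination in the steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the code in mod_negotiation: the per-variant values are taken as already computed, all field names (`accept_q`, `qs`, `lang_q`, `lang_rank`, `level`, `encoded`, `charset_ok`, `length`) and the helper `variant` are hypothetical, and keeping every variant when none has an acceptable charset is an assumption the text does not settle.

```python
def keep_best(variants, key):
    """Keep only the variants whose key(v) equals the maximum."""
    best = max(key(v) for v in variants)
    return [v for v in variants if key(v) == best]


def choose_variant(variants):
    """Return the chosen variant, or None for the 406 case (step 4)."""
    # Step 1: drop variants that some dimension rules out entirely.
    remaining = [v for v in variants
                 if v["accept_q"] > 0 and v["qs"] > 0 and v["lang_q"] > 0]
    if not remaining:
        return None

    # Step 2: tests 1 to 7, applied in order until one variant is left.
    tests = [
        lambda vs: keep_best(vs, lambda v: v["accept_q"] * v["qs"]),
        lambda vs: keep_best(vs, lambda v: v["lang_q"]),
        lambda vs: keep_best(vs, lambda v: -v["lang_rank"]),  # lower rank wins
        lambda vs: keep_best(vs, lambda v: v["level"]),
        lambda vs: [v for v in vs if not v["encoded"]] or vs,
        lambda vs: [v for v in vs if v["charset_ok"]] or vs,  # assumption
        lambda vs: keep_best(vs, lambda v: -v["length"]),     # smallest wins
    ]
    for test in tests:
        remaining = test(remaining)
        if len(remaining) == 1:
            break
    # Test 8: take the first of whatever is left (step 3 returns it).
    return remaining[0]


def variant(uri, **overrides):
    """Build a variant record with neutral defaults (hypothetical helper)."""
    record = {"uri": uri, "accept_q": 1.0, "qs": 1.0, "lang_q": 1.0,
              "lang_rank": 0, "level": 0, "encoded": False,
              "charset_ok": True, "length": 1000}
    record.update(overrides)
    return record


candidates = [
    variant("foo.en.html", lang_q=0.5, lang_rank=1),
    variant("foo.fr.html", lang_q=1.0, lang_rank=0),
    variant("foo.txt", accept_q=0.8),
]
chosen = choose_variant(candidates)
```

In this example the media type test removes `foo.txt`, and the language quality test then leaves `foo.fr.html` as the only candidate, so the later tests are never reached.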

Fiddling with Quality Values

+ +Apache sometimes changes the quality values from what would be +expected by a strict interpretation of the algorithm above. This is to +get a better result from the algorithm for browsers which do not send +full or accurate information. Some of the most popular browsers send +Accept header information which would otherwise result in the +selection of the wrong variant in many cases. If a browser +sends full and correct information these fiddles will not +be applied.

-New as of 0.8: Documents in multiple languages can also be resolved through the use -of the AddLanguage and LanguagePriority -directives: +

Media Types and Wildcards

+The Accept: request header indicates preferences for media types. It +can also include 'wildcard' media types, such as "image/*" or "*/*" +where the * matches any string. So a request including:
-AddLanguage en .en
-AddLanguage fr .fr
-AddLanguage de .de
-AddLanguage da .da
-AddLanguage el .el
-AddLanguage it .it
-
-# LanguagePriority allows you to give precedence to some languages
-# in case of a tie during content negotiation.
-# Just list the languages in decreasing order of preference.
-
-LanguagePriority en fr de
+  Accept: image/*, */*
 
-Here, a request for "foo.html" matched against "foo.html.en" and -"foo.html.fr" would return an French document to a browser that -indicated a preference for French, or an English document otherwise. -In fact, a request for "foo" matched against "foo.html.en", -"foo.html.fr", "foo.ps.en", "foo.pdf.de", and "foo.txt.it" would do -just what you expect - treat those suffices as a database and compare -the request to it, returning the best match. The languages and data -types share the same suffix name space. +would indicate that any type starting "image/" would be acceptable, +as would any other type (so the first "image/*" is redundant). Some +browsers routinely send wildcards in addition to explicit types they +can handle. For example:
+  Accept: text/html, text/plain, image/gif, image/jpeg, */*
+
+ +The intention of this request is to indicate that the explicitly +listed types are preferred, but if a different representation is +available, that is OK too. However under the basic algorithm, as given +above, the */* wildcard has exactly equal preference to all the other +types, so they are not being preferred. The browser should really have +sent a request with a lower quality (preference) value for */*, such +as:
+  Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
+
+The explicit types have no quality factor, so they default to a +preference of 1.0 (the highest). The wildcard */* is given +a low preference of 0.01, so other types will only be returned if +no variant matches an explicitly listed type.

-Note that this machinery only comes into play if the file which the -user attempted to retrieve does not exist by that name; if it -does, it is simply retrieved as usual. (So, someone who actually asks -for foo.jpeg, as opposed to foo, never gets -foo.gif). +If the Accept: header contains no q factors at all, Apache sets +the q value of "*/*", if present, to 0.01 to emulate the desired +behaviour. It also sets the q value of wildcards of the format +"type/*" to 0.02 (so these are preferred over matches against +"*/*"). If any media type on the Accept: header contains a q factor, +these special values are not applied, so requests from browsers +which send the correct information to start with work as expected. + +
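The wildcard adjustment described above can be expressed as a short Python sketch. It follows the rule as stated in the text (0.01 for "*/*" and 0.02 for "type/*" only when the header carries no q factors at all); the function name `adjust_wildcards` is hypothetical and the parsing is deliberately minimal, so it should be read as an illustration and not as Apache's code.

```python
def adjust_wildcards(accept_header):
    """Return (media_range, q) pairs with the wildcard adjustment applied."""
    ranges = []
    any_q = False
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = None  # None means the browser gave no q factor for this item
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
                any_q = True
        ranges.append((fields[0], q))

    adjusted = []
    for media_range, q in ranges:
        if any_q:
            # the browser sent at least one q factor: leave its values alone
            q = 1.0 if q is None else q
        elif media_range == "*/*":
            q = 0.01
        elif media_range.endswith("/*"):
            q = 0.02
        else:
            q = 1.0
        adjusted.append((media_range, q))
    return adjusted


fiddled = adjust_wildcards(
    "text/html, text/plain, image/gif, image/jpeg, */*")
```

For the header shown, the four explicit types keep a q of 1.0 and `*/*` drops to 0.01; a header such as `text/html, */*; q=0.5` is returned unchanged apart from the default of 1.0 for `text/html`.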

Variants with no Language

+ +If some of the variants for a particular resource have a language +attribute, and some do not, those variants with no language +are given a very low language quality factor of 0.001.

+ +The reason for setting this language quality factor for +variants with no language to a very low value is to allow +for a default variant which can be supplied if none of the +other variants match the browser's language preferences. + +For example, consider the situation with three variants: +foo.en.html (language en), foo.fr.html (language fr) and +foo.html (no language). + +

+ +The meaning of a variant with no language is that it is +always acceptable to the browser. If the request Accept-Language +header includes either en or fr (or both) one of foo.en.html +or foo.fr.html will be returned. If the browser does not list +either en or fr as acceptable, foo.html will be returned instead. + +
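A small Python sketch of this fallback, using the three variants named above, is given here. The names `language_quality` and `pick` are hypothetical, language matching is by exact code only, and giving a quality of 1.0 to language-less variants when no variant has a language is an assumption made for the sketch.

```python
NO_LANGUAGE_Q = 0.001  # quality for a variant with no language, per the text


def language_quality(variant_langs, accepted, some_variant_has_language):
    """Language quality of one variant given {language: q} from the browser."""
    if not variant_langs:
        # assumption: if no variant has a language, nothing is penalised
        return NO_LANGUAGE_Q if some_variant_has_language else 1.0
    return max(accepted.get(lang, 0.0) for lang in variant_langs)


def pick(variants, accepted):
    """Return the URI with the highest language quality, or None."""
    has_lang = any(langs for langs in variants.values())
    scored = {uri: language_quality(langs, accepted, has_lang)
              for uri, langs in variants.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None


VARIANTS = {
    "foo.en.html": ["en"],
    "foo.fr.html": ["fr"],
    "foo.html": [],
}

french_reader = pick(VARIANTS, {"fr": 1.0, "en": 0.5})
german_reader = pick(VARIANTS, {"de": 1.0})
```

A browser preferring French gets `foo.fr.html`, while a browser that lists only German falls through to `foo.html`, because 0.001 still beats the 0.0 assigned to the English and French variants.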

Note on Caching

+ +When a cache stores a document, it associates it with the request URL. +The next time that URL is requested, the cache can use the stored +document, provided it is still within date. But if the resource is +subject to content negotiation at the server, this would result in +only the first requested variant being cached, and subsequent cache +hits could return the wrong response. To prevent this, by default +Apache marks all responses that are returned after content negotiation +as non-cacheable. Unfortunately, this can increase network traffic by +requiring the resource to be obtained from the original server every +time. The HTTP/1.1 protocol includes features to make this much more +efficient, by allowing caching.

+ +For requests which come from an HTTP/1.0 compliant client (either a +browser or a cache), the directive CacheNegotiatedDocs can be +used to allow caching of responses which were subject to negotiation. +This directive can be given in the server config or virtual host, and +takes no arguments. It has no effect on requests from HTTP/1.1 +clients. - + + -- cgit v1.2.3