summaryrefslogtreecommitdiffstats
path: root/docs/manual/rewrite/rewrite_tech.xml
blob: 902f2afe159a1320c6844f08a7a8c61e299bed8a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->

<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<manualpage metafile="rewrite_tech.xml.meta">
<parentdocument href="./index.html"/>

  <title>Apache mod_rewrite Technical Details</title>

<summary>
<p>This document discusses some of the technical details of mod_rewrite
and URL matching.</p>
</summary>
<seealso><a href="../mod/mod_rewrite.html">Module
documentation</a></seealso>
<seealso><a href="rewrite_intro.html">mod_rewrite
introduction</a></seealso>
<seealso><a href="rewrite_guide.html">Rewrite Guide - useful 
examples</a></seealso>
<seealso><a href="rewrite_guide_advanced.html">Advanced Rewrite Guide -
advanced useful examples</a></seealso>

<section id="Internal"><title>Internal Processing</title>

      <p>The internal processing of this module is very complex but
      needs to be explained once even to the average user to avoid
      common mistakes and to let you exploit its full
      functionality.</p>
</section>

<section id="InternalAPI"><title>API Phases</title>

      <p>First you have to understand that when Apache processes a
      HTTP request it does this in phases. A hook for each of these
      phases is provided by the Apache API. Mod_rewrite uses two of
      these hooks: the URL-to-filename translation hook which is
      used after the HTTP request has been read but before any
      authorization starts and the Fixup hook which is triggered
      after the authorization phases and after the per-directory
      config files (<code>.htaccess</code>) have been read, but
      before the content handler is activated.</p>

      <p>So, after a request comes in and Apache has determined the
      corresponding server (or virtual server) the rewriting engine
      starts processing of all mod_rewrite directives from the
      per-server configuration in the URL-to-filename phase. A few
      steps later when the final data directories are found, the
      per-directory configuration directives of mod_rewrite are
      triggered in the Fixup phase. In both situations mod_rewrite
      rewrites URLs either to new URLs or to filenames, although
      there is no obvious distinction between them. This is a usage
      of the API which was not intended to be this way when the API
      was designed, but as of Apache 1.x this is the only way
      mod_rewrite can operate. To make this point more clear
      remember the following two points:</p>

      <ol>
        <li>Although mod_rewrite rewrites URLs to URLs, URLs to
        filenames and even filenames to filenames, the API
        currently provides only a URL-to-filename hook. In Apache
        2.0 the two missing hooks will be added to make the
        processing more clear. But this point has no drawbacks for
        the user, it is just a fact which should be remembered:
        Apache does more in the URL-to-filename hook than the API
        intends for it.</li>

        <li>
          Unbelievably mod_rewrite provides URL manipulations in
          per-directory context, <em>i.e.</em>, within
          <code>.htaccess</code> files, although these are reached
          a very long time after the URLs have been translated to
          filenames. It has to be this way because
          <code>.htaccess</code> files live in the filesystem, so
          processing has already reached this stage. In other
          words: According to the API phases at this time it is too
          late for any URL manipulations. To overcome this chicken
          and egg problem mod_rewrite uses a trick: When you
          manipulate a URL/filename in per-directory context
          mod_rewrite first rewrites the filename back to its
          corresponding URL (which is usually impossible, but see
          the <code>RewriteBase</code> directive below for the
          trick to achieve this) and then initiates a new internal
          sub-request with the new URL. This restarts processing of
          the API phases. 

          <p>Again mod_rewrite tries hard to make this complicated
          step totally transparent to the user, but you should
          remember here: While URL manipulations in per-server
          context are really fast and efficient, per-directory
          rewrites are slow and inefficient due to this chicken and
          egg problem. But on the other hand this is the only way
          mod_rewrite can provide (locally restricted) URL
          manipulations to the average user.</p>
        </li>
      </ol>

      <p>Don't forget these two points!</p>
</section>

<section id="InternalRuleset"><title>Ruleset Processing</title>
 
      <p>Now when mod_rewrite is triggered in these two API phases, it
      reads the configured rulesets from its configuration
      structure (which itself was either created on startup for
      per-server context or during the directory walk of the Apache
      kernel for per-directory context). Then the URL rewriting
      engine is started with the contained ruleset (one or more
      rules together with their conditions). The operation of the
      URL rewriting engine itself is exactly the same for both
      configuration contexts. Only the final result processing is
      different. </p>

      <p>The order of rules in the ruleset is important because the
      rewriting engine processes them in a special (and not very
      obvious) order. The rule is this: The rewriting engine loops
      through the ruleset rule by rule (<directive
      module="mod_rewrite">RewriteRule</directive> directives) and
      when a particular rule matches it optionally loops through
      existing corresponding conditions (<code>RewriteCond</code>
      directives). For historical reasons the conditions are given
      first, and so the control flow is a little bit long-winded. See
      Figure 1 for more details.</p>
<p class="figure">
      <img src="../images/mod_rewrite_fig1.gif" width="428"
           height="385" alt="[Needs graphics capability to display]" /><br />
      <dfn>Figure 1:</dfn>The control flow through the rewriting ruleset
</p>
      <p>As you can see, first the URL is matched against the
      <em>Pattern</em> of each rule. When it fails mod_rewrite
      immediately stops processing this rule and continues with the
      next rule. If the <em>Pattern</em> matches, mod_rewrite looks
      for corresponding rule conditions. If none are present, it
      just substitutes the URL with a new value which is
      constructed from the string <em>Substitution</em> and goes on
      with its rule-looping. But if conditions exist, it starts an
      inner loop for processing them in the order that they are
      listed. For conditions the logic is different: we don't match
      a pattern against the current URL. Instead we first create a
      string <em>TestString</em> by expanding variables,
      back-references, map lookups, <em>etc.</em> and then we try
      to match <em>CondPattern</em> against it. If the pattern
      doesn't match, the complete set of conditions and the
      corresponding rule fails. If the pattern matches, then the
      next condition is processed until no more conditions are
      available. If all conditions match, processing is continued
      with the substitution of the URL with
      <em>Substitution</em>.</p>

</section>


</manualpage>