NAME
     pcreposix - POSIX API for  Perl-compatible  regular  expres-
     sions.



SYNOPSIS
     #include <pcreposix.h>

     int regcomp(regex_t *preg, const char *pattern,
          int cflags);

     int regexec(regex_t *preg, const char *string,
          size_t nmatch, regmatch_t pmatch[], int eflags);

     size_t regerror(int errcode, const regex_t *preg,
          char *errbuf, size_t errbuf_size);

     void regfree(regex_t *preg);



DESCRIPTION
     This set of functions provides a POSIX-style API to the PCRE
     regular expression package. See the pcre documentation for a
     description of the native  API,  which  contains  additional
     functionality.

     The functions described here are wrapper functions that
     ultimately call the native API. Their prototypes are defined
     in the pcreposix.h header file. On Unix systems the library
     itself is called pcreposix.a, so it can be accessed by adding
     -lpcreposix to the command for linking an application that
     uses them. Because the POSIX functions call the native ones,
     it is also necessary to add -lpcre.

     I have implemented only those option bits that can  be  rea-
     sonably  mapped  to  PCRE  native  options. In addition, the
     options REG_EXTENDED and  REG_NOSUB  are  defined  with  the
     value zero. They have no effect, but since programs that are
     written to the POSIX interface often use them, this makes it
     easier to slot in PCRE as a replacement library. Other POSIX
     options are not even defined.

     When PCRE is called via these functions, it is only the  API
     that is POSIX-like in style. The syntax and semantics of the
     regular expressions themselves are still those of Perl, sub-
     ject  to  the  setting of various PCRE options, as described
     below.

     The header for these functions is supplied as pcreposix.h to
     avoid  any  potential  clash  with other POSIX libraries. It
     can, of course, be renamed or aliased as regex.h,  which  is
     the "correct" name. It provides two structure types, regex_t
     for compiled internal forms, and  regmatch_t  for  returning
     captured  substrings.  It  also defines some constants whose
     names start with "REG_"; these are used for setting  options
     and identifying error codes.



COMPILING A PATTERN
     The function regcomp() is called to compile a  pattern  into
     an  internal form. The pattern is a C string terminated by a
     binary zero, and is passed in the argument pattern. The preg
     argument  is  a pointer to a regex_t structure which is used
     as a base for storing information about the compiled expres-
     sion.

     The argument cflags is either zero, or contains one or  more
     of the bits defined by the following macros:

       REG_ICASE

     The PCRE_CASELESS option  is  set  when  the  expression  is
     passed for compilation to the native function.

       REG_NEWLINE

     The PCRE_MULTILINE option is  set  when  the  expression  is
     passed for compilation to the native function.

     In the absence of these flags, no options are passed to  the
     native function. This means that the regex is compiled with
     PCRE default semantics. In particular, the  way  it  handles
     newline  characters  in  the subject string is the Perl way,
     not the POSIX way. Note that setting PCRE_MULTILINE has only
     some  of  the effects specified for REG_NEWLINE. It does not
     affect the way newlines are matched by . (they aren't) or  a
     negative class such as [^a] (they are).
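
     The effect of REG_NEWLINE on the ^ anchor can be sketched as
     follows. This is an illustration only: line_start_offset() is
     a hypothetical helper, and the include assumes that
     pcreposix.h has been installed or aliased as regex.h (include
     pcreposix.h directly otherwise). REG_EXTENDED is passed only
     so that the same source also builds against a system regex.h.

```c
#include <regex.h>   /* assumption: pcreposix.h aliased as regex.h */

/* Hypothetical helper: return the offset of the first match of
   `pattern` in `subject`, or -1 if the pattern does not compile
   or does not match. `cflags` is 0 or REG_NEWLINE. */
int line_start_offset(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    regmatch_t m[1];
    int offset = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | cflags) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0)
        offset = (int)m[0].rm_so;
    regfree(&re);
    return offset;
}
```

     With the subject "a\nb" and the pattern "^b", the helper is
     expected to find no match when cflags is zero, and a match at
     offset 2 when REG_NEWLINE is set, because ^ may then match
     immediately after a newline.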

     The yield of regcomp() is zero on success, and non-zero oth-
     erwise.  The preg structure is filled in on success, and one
     member of the structure is publicized: re_nsub contains  the
     number  of  capturing subpatterns in the regular expression.
     Various error codes are defined in the header file.
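
     As a minimal sketch of the compile step, the following
     hypothetical helper, count_groups(), compiles a pattern and
     reads re_nsub. The include assumes that pcreposix.h has been
     installed or aliased as regex.h; the function name and its -1
     return convention are inventions for this example.

```c
#include <regex.h>   /* assumption: pcreposix.h aliased as regex.h */

/* Hypothetical helper: compile `pattern`, caselessly if `caseless`
   is non-zero, and return the number of capturing subpatterns,
   or -1 if compilation fails. */
int count_groups(const char *pattern, int caseless)
{
    regex_t re;
    int cflags = REG_EXTENDED;   /* zero under pcreposix.h */
    int n;

    if (caseless)
        cflags |= REG_ICASE;     /* maps to PCRE_CASELESS */
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;               /* preg is only filled in on success */
    n = (int)re.re_nsub;
    regfree(&re);
    return n;
}
```

     For the pattern "(a)(b(c))" the helper should report three
     capturing subpatterns.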



MATCHING A PATTERN
     The function regexec() is called  to  match  a  pre-compiled
     pattern  preg against a given string, which is terminated by
     a zero byte, subject to the options in eflags. These can be:

       REG_NOTBOL

     The PCRE_NOTBOL option is set when  calling  the  underlying
     PCRE matching function.

       REG_NOTEOL

     The PCRE_NOTEOL option is set when  calling  the  underlying
     PCRE matching function.

     The portion of the string that was  matched,  and  also  any
     captured  substrings,  are returned via the pmatch argument,
     which points to  an  array  of  nmatch  structures  of  type
     regmatch_t,  containing  the  members rm_so and rm_eo. These
     contain the offset to the first character of each  substring
     and  the offset to the first character after the end of each
     substring, respectively.  The  0th  element  of  the  vector
     relates  to  the  entire portion of string that was matched;
     subsequent elements relate to the capturing  subpatterns  of
     the  regular  expression.  Unused  entries in the array have
     both structure members set to -1.
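
     The way offsets are returned in pmatch can be sketched as
     follows. match_offsets() is a hypothetical helper written for
     this example, and the include assumes that pcreposix.h has
     been installed or aliased as regex.h.

```c
#include <stddef.h>
#include <regex.h>   /* assumption: pcreposix.h aliased as regex.h */

/* Hypothetical helper: compile `pattern`, match it once against
   `subject`, and fill in `nmatch` entries of `pmatch`. Returns zero
   on a match, or the non-zero code from regcomp() or regexec(). */
int match_offsets(const char *pattern, const char *subject,
                  regmatch_t pmatch[], size_t nmatch)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;
    rc = regexec(&re, subject, nmatch, pmatch, 0);
    regfree(&re);
    return rc;
}
```

     Matching "([0-9]+)-([0-9]+)" against "tel 555-1234" with a
     four-element array should give offsets 4 and 12 in element 0,
     4 and 7 in element 1, 8 and 12 in element 2, and -1 in both
     members of the unused element 3. The length of a substring is
     rm_eo - rm_so, so it can be printed with the "%.*s" format of
     printf().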

     A successful match yields a zero return; various error codes
     are  defined in the header file, of which REG_NOMATCH is the
     "expected" failure code.



ERROR MESSAGES
     The regerror() function maps a non-zero error code from either
     regcomp() or regexec() to a printable message. If preg is
     not NULL, the error should have arisen from the use of  that
     structure.  A  message terminated by a binary zero is placed
     in errbuf. The length of the message, including the zero, is
     limited  to  errbuf_size.  The  yield of the function is the
     size of buffer needed to hold the whole message.
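
     Because the yield is the size needed for the whole message,
     one possible approach is to call regerror() twice: once with
     a tiny buffer to learn the size, and once with a buffer of
     that size. The sketch below assumes this usage;
     error_message() is a hypothetical helper, and the include
     assumes that pcreposix.h has been installed or aliased as
     regex.h.

```c
#include <stdlib.h>
#include <regex.h>   /* assumption: pcreposix.h aliased as regex.h */

/* Hypothetical helper: return a malloc'd copy of the message for
   `errcode`, or NULL if allocation fails. The caller frees it. */
char *error_message(int errcode, const regex_t *preg)
{
    char probe[1];
    /* First call: the message is truncated, but the yield is the
       size needed for the whole message, including the zero. */
    size_t needed = regerror(errcode, preg, probe, sizeof probe);
    char *msg = malloc(needed);

    if (msg != NULL)
        regerror(errcode, preg, msg, needed);   /* second call: full text */
    return msg;
}
```

     A fixed-size buffer on the stack is simpler when a truncated
     message is acceptable.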



STORAGE
     Compiling a regular expression causes memory to be allocated
     and associated with the preg structure. The function
     regfree() frees all such memory, after which preg may no
     longer be used as a compiled expression.
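
     A typical lifetime is therefore: compile once, match as many
     times as needed, then free once. The sketch below illustrates
     this; count_matching() is a hypothetical helper, and the
     include assumes that pcreposix.h has been installed or aliased
     as regex.h.

```c
#include <stddef.h>
#include <regex.h>   /* assumption: pcreposix.h aliased as regex.h */

/* Hypothetical helper: count how many of the `n` strings in
   `subjects` match `pattern`. Returns -1 if the pattern does not
   compile. */
int count_matching(const char *pattern, const char *const subjects[],
                   size_t n)
{
    regex_t re;
    regmatch_t whole[1];
    int hits = 0;
    size_t i;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                 /* compile failed: preg was not filled in */
    for (i = 0; i < n; i++) {
        /* The same compiled expression is reused for every subject. */
        if (regexec(&re, subjects[i], 1, whole, 0) == 0)
            hits++;
    }
    regfree(&re);                  /* re must not be used after this point */
    return hits;
}
```

     Calling regfree() exactly once per successful regcomp() keeps
     the allocation and release of the compiled form paired.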



AUTHOR
     Philip Hazel <ph10@cam.ac.uk>
     University Computing Service,
     New Museums Site,
     Cambridge CB2 3QG, England.
     Phone: +44 1223 334714

     Copyright (c) 1997-2000 University of Cambridge.