=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;
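
A minimal sketch of C<=~>, C<!~> and the implicit $_ target. The variable names and sample strings are illustrative only.

```perl
use strict;
use warnings;

my $var = "seafood";

# =~ binds the match to $var; "foo" occurs in "seafood", so this is true.
my $has_foo = ($var =~ /foo/) ? 1 : 0;

# !~ performs the same match but negates the result.
my $lacks_foo = ($var !~ /foo/) ? 1 : 0;

# With no binding operator the match runs against $_,
# which grep sets to each element in turn.
my @with_foo = grep { /foo/ } ("food", "bar", "foobar");

print "has_foo=$has_foo lacks_foo=$lacks_foo with_foo=@with_foo\n";
```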

C<m/pattern/igmsoxc> searches a string for a pattern match,
applying the given options.

    i  case-Insensitive
    g  Global - all occurrences
    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    o  compile pattern Once
    x  eXtended legibility - free whitespace and comments
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones.
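
The modifiers can be compared side by side in the following sketch; the sample text and variable names are made up for illustration, and each comment notes what the match is expected to return.

```perl
use strict;
use warnings;

my $text = "Apple banana\napple cherry";

# /i: case-insensitive; /g in list context returns every match.
my @apples = $text =~ /apple/gi;             # ("Apple", "apple")

# /m: ^ also matches just after each embedded newline.
my @line_starts = $text =~ /^(\w+)/mg;       # ("Apple", "apple")

# /s: . also matches "\n", so this can span the line break.
my ($across) = $text =~ /banana(.)apple/s;   # ("\n")

# /x: whitespace and #-comments inside the pattern are ignored.
my ($word) = $text =~ / ( b \w+ )   # a word starting with b
                      /x;            # ("banana")

# /c: a failed //g match keeps pos() instead of resetting it.
my $s = "aaab";
$s =~ /a/g;                  # scalar-context //g: pos($s) is now 1
$s =~ /z/gc;                 # fails, but pos($s) stays at 1
my $pos_after = pos($s);
```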

C<qr/pattern/imsox> lets you store a regex in a variable,
or pass one around. Modifiers are as for m//, and are stored
within the regex.
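
A short sketch of storing and reusing compiled patterns. The helper C<count_matches> is hypothetical, written only for this example.

```perl
use strict;
use warnings;

# Compile once; the /i modifier is stored inside $word_re itself.
my $year_re = qr/(\d{4})-(\d\d)/;
my $word_re = qr/perl/i;

# A qr// object can be matched against directly...
my ($y, $m) = "released 2002-07" =~ $year_re;          # ("2002", "07")
my $found   = "Learning PERL" =~ $word_re ? 1 : 0;     # 1, thanks to the stored /i

# ...or interpolated into a larger pattern, keeping its own modifiers.
my $versioned = qr/^$word_re\s+v(\d+)/;
my ($ver) = "Perl v5" =~ $versioned;                   # ("5")

# Hypothetical helper: regexes can be passed around like any other scalar.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $n = count_matches($word_re, "perl", "Perl", "python");   # 2
```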

C<s/pattern/replacement/igmsoxe> substitutes matches of
'pattern' with 'replacement'. Modifiers are as for m//,
with one addition:

    e  Evaluate replacement as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double-quoted string unless a single quote (') is the delimiter.
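
A sketch of substitution with interpolated, evaluated (C</e>, C</ee>) and single-quote-delimited replacements; all strings are illustrative. Because C</ee> runs its result as Perl code, it should only ever be fed trusted input.

```perl
use strict;
use warnings;

# Default: the replacement is a double-quoted string, so $1 and $2 interpolate.
(my $swapped = "john smith") =~ s/(\w+) (\w+)/$2, $1/;      # "smith, john"

# /e: the replacement is evaluated as Perl code; /g repeats it for every match.
(my $doubled = "3 apples, 10 pears") =~ s/(\d+)/$1 * 2/ge;  # "6 apples, 20 pears"

# /ee: evaluate once to get the string '1 + 2', then evaluate that string.
my $expr = '1 + 2';
(my $evaluated = "sum: EXPR") =~ s/EXPR/$expr/ee;           # "sum: 3"

# Single-quote delimiters: the replacement is not interpolated at all.
(my $literal = "price") =~ s'price'$5';                     # '$5'
```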

C<?pattern?> is like m/pattern/ but matches only once. No alternate
delimiters can be used. Must be reset with L<reset|perlfunc/reset>.
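
A sketch of a one-shot match being re-armed with C<reset>. It uses the C<m?pattern?> spelling, which recent perls require in place of the bare C<?pattern?> form; the helper C<first_header> is hypothetical.

```perl
use strict;
use warnings;

my @lines = ("header", "data 1", "header", "data 2");

# Hypothetical helper: counts lines accepted by a one-shot m?...? match.
# The same match op is reused on every call, so its "already matched" state persists.
sub first_header {
    my $count = 0;
    for (@_) {
        $count++ if m?header?;   # succeeds at most once until reset
    }
    return $count;
}

my $before = first_header(@lines);   # 1: only the first "header" is seen
my $again  = first_header(@lines);   # 0: still spent from the previous call
reset;                               # re-arm ?...? matches in package main
my $after  = first_header(@lines);   # 1 again
```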

=head2 SYNTAX

    \          Escapes the character immediately following it
    .          Matches any single character except a newline (unless /s is used)
    ^          Matches at the beginning of the string (or line, if /m is used)
    $          Matches at the end of the string (or line, if /m is used)
    *          Matches the preceding element 0 or more times
    +          Matches the preceding element 1 or more times
    ?          Matches the preceding element 0 or 1 times
    {...}      Specifies a range of occurrences for the element preceding it
    [...]      Matches any one of the characters contained within the brackets
    (...)      Groups subexpressions for capturing to $1, $2...
    (?:...)    Groups subexpressions without capturing (cluster)
    |          Matches either the subexpression preceding or following it
    \1, \2 ... The text from the Nth group
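
A sketch combining capturing groups, a backreference, a non-capturing cluster with alternation, and a bounded repetition; the inputs are illustrative.

```perl
use strict;
use warnings;

my $sentence = "this is is a test test";

# (\w+) captures a word and \1 must match that same text again,
# so this finds doubled words.
my @doubles = $sentence =~ /\b(\w+)\s+\1\b/g;             # ("is", "test")

# (?:...) groups the alternation without capturing,
# so $1 is the number rather than "cat" or "dog".
my ($count) = "3 dogs" =~ /^(\d+) (?:cat|dog)s?$/;        # ("3")

# {...} bounds the repetition: two to four hex digits, anchored at both ends.
my $hex_ok = "beef" =~ /^[0-9a-f]{2,4}$/ ? 1 : 0;          # 1
```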

=head2 ESCAPE SEQUENCES

These work as in normal strings.

    \a       Alarm (beep)
    \e       Escape
    \f       Formfeed
    \n       Newline
    \r       Carriage return
    \t       Tab
    \037     Any octal ASCII value
    \x7f     Any hexadecimal ASCII value
    \x{263a} A wide hexadecimal value
    \cx      Control-x
    \N{name} A named character

    \l  Lowercase next character
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End case modification or \Q quoting

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

    \b  An assertion, not backspace, except in a character class
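
The escapes can be checked with a few one-line matches, sketched below with made-up inputs. Note how C<\Q> changes the meaning of an interpolated C<.>.

```perl
use strict;
use warnings;

my $literal = "1.5";

# Without \Q the interpolated "." is a metacharacter matching any character.
my $loose  = "105" =~ /^$literal$/     ? 1 : 0;   # 1
# \Q...\E quotes metacharacters in the interpolated text.
my $strict = "105" =~ /^\Q$literal\E$/ ? 1 : 0;   # 0

# Case-modification escapes work in patterns as they do in strings.
my $name  = "perl";
my $shout = "PERL rocks" =~ /^\U$name\E rocks$/ ? 1 : 0;   # 1

# \t is a tab; \x{263a} is a wide character (WHITE SMILING FACE).
my $smiley = "hi\t\x{263a}" =~ /\t\x{263a}$/ ? 1 : 0;      # 1

# Inside a character class \b is backspace; outside it is a word boundary.
my $backspace = "a\x08b" =~ /[\b]/ ? 1 : 0;                # 1
my $boundary  = "a b"    =~ /a\b/  ? 1 : 0;                # 1
```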

=head2 CHARACTER CLASSES

    [amy]    Match 'a', 'm' or 'y'
    [f-j]    Dash specifies "range"
    [f-j-]   Dash escaped or at start or end means 'dash'
    [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work both inside and outside a character class.
The first six are locale aware, all are Unicode aware. The default
character class equivalents are given. See L<perllocale> and
L<perlunicode> for details.

    \d      A digit                     [0-9]
    \D      A nondigit                  [^0-9]
    \w      A word character            [a-zA-Z0-9_]
    \W      A non-word character        [^a-zA-Z0-9_]
    \s      A whitespace character      [ \t\n\r\f]
    \S      A non-whitespace character  [^ \t\n\r\f]

    \C      Match a byte (with Unicode, '.' matches a character)
    \pP     Match P-named (Unicode) property
    \p{...} Match Unicode property with long name
    \PP     Match non-P
    \P{...} Match lack of Unicode property with long name
    \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

    alnum   IsAlnum              Alphanumeric
    alpha   IsAlpha              Alphabetic
    ascii   IsASCII              Any ASCII char
    blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
    cntrl   IsCntrl              Control characters
    digit   IsDigit  \d          Digits
    graph   IsGraph              Alphanumeric and punctuation
    lower   IsLower              Lowercase chars (locale and Unicode aware)
    print   IsPrint              Alphanumeric, punct, and space
    punct   IsPunct              Punctuation
    space   IsSpace  [\s\cK]     Whitespace
            IsSpacePerl   \s     Perl's whitespace definition
    upper   IsUpper              Uppercase chars (locale and Unicode aware)
    word    IsWord   \w          Alphanumeric plus _ (Perl extension)
    xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]   \d            \p{IsDigit}
    [:^digit:]  \D            \P{IsDigit}
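
A sketch of bracketed classes, the backslash shorthands, a POSIX class and a Unicode property, using made-up strings. POSIX classes such as C<[:alpha:]> are only valid inside an outer pair of brackets.

```perl
use strict;
use warnings;

my $s = "id-42: x_9";

my @digits   = $s =~ /(\d+)/g;          # ("42", "9")
my @words    = $s =~ /(\w+)/g;          # ("id", "42", "x_9")
my $non_word = () = $s =~ /\W/g;        # counts "-", ":" and " ": 3

# A dash at the end of a class is literal; a leading caret negates the class.
my @dash_or_letter = $s =~ /([a-z-]+)/g;                    # ("id-", "x")
my ($no_vowels)    = "rhythm and blues" =~ /^([^aeiou]+)/;  # ("rhythm ")

# POSIX class, written inside a bracketed class.
my @alpha_runs = "ab12cd" =~ /([[:alpha:]]+)/g;             # ("ab", "cd")

# Unicode property: \p{Lu} is an uppercase letter.
my @upper = "Hello World" =~ /(\p{Lu})/g;                   # ("H", "W")
```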

=head2 ANCHORS

All are zero-width assertions.

    ^  Match string start (or line, if /m is used)
    $  Match string end (or line, if /m is used) or before a final newline
    \b Match word boundary (between \w and \W)
    \B Match except at word boundary (between \w and \w or \W and \W)
    \A Match string start (regardless of /m)
    \Z Match string end (before optional newline)
    \z Match absolute string end
    \G Match where previous m//g left off
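
A sketch contrasting the line and string anchors, followed by a small C<\G>/C</gc> tokenizer. The token labels (C<ID>, C<NUM>, C<OP>) are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# With /m, ^ matches after each embedded newline; \A still only matches at the start.
my @caret = $text =~ /^(\w+)/mg;       # ("one", "two")
my @big_a = $text =~ /\A(\w+)/mg;      # ("one")

# $ and \Z allow a trailing newline; \z does not.
my $dollar  = "two\n" =~ /two$/  ? 1 : 0;   # 1
my $big_z   = "two\n" =~ /two\Z/ ? 1 : 0;   # 1
my $small_z = "two\n" =~ /two\z/ ? 1 : 0;   # 0

# \G with /gc: a tiny tokenizer where each match must start where the last ended.
my $src = "x = 42;";
my @tokens;
while (1) {
    if    ($src =~ /\G\s+/gc)            { next }                 # skip spaces
    elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "ID:$1" }
    elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([=;])/gc)         { push @tokens, "OP:$1" }
    else                                 { last }                 # nothing matched
}
my $consumed_all = (defined pos($src) && pos($src) == length $src) ? 1 : 0;
```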

=head2 QUANTIFIERS

Quantifiers are greedy by default: they match the B<longest> leftmost string.

    Maximal Minimal Allowed range
    ------- ------- -------------
    {n,m}   {n,m}?  Must occur at least n times but no more than m times
    {n,}    {n,}?   Must occur at least n times
    {n}     {n}?    Must occur exactly n times
    *       *?      0 or more times (same as {0,})
    +       +?      1 or more times (same as {1,})
    ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n}; it is understood as a literal string.
(Perl 5.34 and later do accept {,n}, meaning {0,n}.)
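
A sketch of greedy versus minimal matching on an illustrative HTML-like string and a run of letters.

```perl
use strict;
use warnings;

my $markup = "<b>bold</b> and <i>italic</i>";

# Greedy .+ runs to the end of the string, then backs off to the last ">".
my ($greedy)  = $markup =~ /<(.+)>/;     # "b>bold</b> and <i>italic</i"

# Minimal .+? takes as little as possible, stopping at the first ">".
my ($minimal) = $markup =~ /<(.+?)>/;    # "b"

# The same applies to counted ranges.
my ($max_range) = "aaaaaaa" =~ /(a{2,5})/;    # "aaaaa"
my ($min_range) = "aaaaaaa" =~ /(a{2,5}?)/;   # "aa"
my @exact       = "aaaaaaa" =~ /(a{3})/g;     # ("aaa", "aaa")
```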

=head2 EXTENDED CONSTRUCTS

    (?#text)          A comment
    (?imxs-imsx:...)  Enable/disable option (as per m// modifiers)
    (?=...)           Zero-width positive lookahead assertion
    (?!...)           Zero-width negative lookahead assertion
    (?<=...)          Zero-width positive lookbehind assertion
    (?<!...)          Zero-width negative lookbehind assertion
    (?>...)           Grab what we can, prohibit backtracking
    (?{ code })       Embedded code, return value becomes $^R
    (??{ code })      Dynamic regex, return value used as regex
    (?(cond)yes|no)   cond being integer corresponding to capturing parens
    (?(cond)yes)         or a lookaround/eval zero-width assertion
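
A sketch of several extended constructs on made-up inputs. The digit-grouping substitution is a common idiom shown for illustration, not something this reference prescribes.

```perl
use strict;
use warnings;

# Lookaround is zero-width: it tests the text without consuming it.
# Insert "," where a digit is behind and a whole number of 3-digit groups is ahead.
(my $grouped = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;    # "1,234,567"

# Negative lookahead: "foo" not followed by "bar".
my @foo_hits = map { $_ =~ /foo(?!bar)/ ? 1 : 0 } ("foobaz", "foobar");   # (1, 0)

# (?i:...) enables case-insensitivity for this group only.
my $mixed = "PERL rules" =~ /(?i:perl) rules/ ? 1 : 0;              # 1

# (?>...) never gives characters back: the atomic a+ keeps every "a",
# leaving none for the final "a", so the whole match fails.
my $atomic = "aaa" =~ /(?>a+)a/ ? 1 : 0;                            # 0
my $plain  = "aaa" =~ /a+a/     ? 1 : 0;                            # 1

# (?(1)...) conditional: demand ")" only if group 1 captured "(".
my @paren_ok = map { $_ =~ /^(\()?\d+(?(1)\))$/ ? 1 : 0 } ("(42)", "42", "(42");   # (1, 1, 0)
```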

=head2 VARIABLES

    $_    Default variable for operators to use
    $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

    $&    Entire matched string
    $`    Everything prior to matched string
    $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that don't cause this slowdown.
See also L<Devel::SawAmpersand>.

    $1, $2 ...  Hold the Nth captured expression
    $+    Last parenthesized pattern match
    $^N   Holds the most recently closed capture
    $^R   Holds the result of the last (?{...}) expr
    @-    Offsets of starts of groups. $-[0] holds start of whole match
    @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.
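
A sketch of the match variables after one successful match, including C<substr>-based equivalents of $`, $& and $' computed from C<@-> and C<@+>. The sample date string is illustrative.

```perl
use strict;
use warnings;

my $str = "date: 2024-05-17!";

$str =~ /((\d+)-(\d+))-(\d+)/ or die "no match";

# Groups are numbered by their opening paren, so the outer group is $1.
my @groups = ($1, $2, $3, $4);          # ("2024-05", "2024", "05", "17")
my $last   = $+;                        # "17": highest-numbered group that matched

# @- and @+ hold offsets; element 0 covers the whole match.
my ($start, $end) = ($-[0], $+[0]);     # (6, 16)

# Equivalents of $`, $& and $' built from the offsets, without using those variables.
my $prematch  = substr($str, 0, $-[0]);               # "date: "
my $match     = substr($str, $-[0], $+[0] - $-[0]);   # "2024-05-17"
my $postmatch = substr($str, $+[0]);                  # "!"

# $+ versus $^N: in ((a)b) group 2 is highest-numbered, group 1 closes last.
"ab" =~ /((a)b)/ or die "no match";
my ($plus, $caret_n) = ($+, $^N);       # ("a", "ab")
```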

=head2 FUNCTIONS

    lc          Lowercase a string
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
    ucfirst     Titlecase first char of a string

    pos         Return or set current match position
    quotemeta   Quote metacharacters
    reset       Reset ?pattern? status
    study       Analyze string for optimizing matching

    split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
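
A sketch exercising several of the functions listed above on made-up data.

```perl
use strict;
use warnings;

# Case functions and their string-escape counterparts.
my $title  = ucfirst lc "hELLO";      # "Hello"
my $escape = "\u\LhELLO\E";           # "Hello"

# quotemeta backslashes every non-word character.
my $quoted = quotemeta "a.b*c";       # 'a\.b\*c'

# pos: after each successful //g match, pos() is the offset just past it.
my $dna = "GATTACA";
my @a_at;
while ($dna =~ /A/g) {
    push @a_at, pos($dna) - 1;        # (1, 4, 6)
}
pos($dna) = 5;                        # pos can also be assigned
my ($at5) = $dna =~ /\G(\w)/;         # "C"

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c ,d";   # ("a", "b", "c", "d")
```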

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept which most often is equal to uppercase, but for
certain characters there is a difference. For example, the titlecase
form of the German "sharp s" (E<szlig>) is "Ss", while its uppercase
form is "SS".
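
A sketch of titlecase versus uppercase, assuming perl 5.12 or later for the C<unicode_strings> feature (which makes C<\x{DF}> follow Unicode rules even though it is below 256).

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules for code points below 256 too

# U+01C6 LATIN SMALL LETTER DZ WITH CARON has distinct title and upper forms.
my $dz       = "\x{1C6}";
my $title_dz = ucfirst $dz;      # "\x{1C5}": titlecase form
my $upper_dz = uc $dz;           # "\x{1C4}": uppercase form

# German sharp s: titlecase is "Ss", uppercase is "SS".
my $title_ss = ucfirst "\x{DF}";   # "Ss"
my $upper_ss = uc "\x{DF}";        # "SS"
```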

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut