StringRegExp Help

AutoIt 3 RegExpression Remarks

Regular expression notation is a compact way of specifying a pattern to search for in strings. Regular expressions are character strings in which plain text characters indicate what text should exist in the target string, and some characters are given special meanings to indicate what variability is allowed in the target string. AutoIt regular expressions are normally case-sensitive.

Regular expressions are constructed of one or more of the following simple regular expression specifiers. If the character is not in the following table, then it will match only itself.

Repeating characters (*, +, ?, {...}) are greedy: they match as many characters as possible while still allowing the rest of the pattern to match. If a repeating character is followed immediately by a question mark, it is lazy instead: it matches as few characters as possible while still allowing the rest of the pattern to match.
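
The difference between the largest and the smallest match can be sketched as follows. The sample string and variable names are illustrative assumptions; flag 1 asks StringRegExp to return the captured group of the first match as an array.

```autoit
; Greedy vs. lazy repetition on an illustrative sample string.
Local $sText = "<b>one</b><b>two</b>"

; Flag 1 returns an array holding the captured group(s) of the first match.
Local $aGreedy = StringRegExp($sText, "<b>(.*)</b>", 1)
Local $aLazy = StringRegExp($sText, "<b>(.*?)</b>", 1)

ConsoleWrite("Greedy: " & $aGreedy[0] & @CRLF) ; one</b><b>two
ConsoleWrite("Lazy:   " & $aLazy[0] & @CRLF)   ; one
```

Both patterns match; the greedy form runs to the last closing tag, while the lazy form stops at the first one.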

Nested groups are allowed, but keep in mind that every group except non-capturing groups is assigned to the returned array. Groups are numbered by the position of their opening parenthesis, so an outer group is assigned before the groups nested inside it.

A complete description can be found here: http://www.autoitscript.com/autoit3/pcrepattern.html

Caution: a badly written regular expression can produce a quasi-infinite loop that hogs the CPU, or even a crash.

Matching Characters

[ ... ]

Match any character in the set. e.g. [aeiou] matches any lower-case vowel. A contiguous set can be defined using a dash between the starting and ending characters. e.g. [a-z] matches any lower-case letter. To include a dash (-) in a set, use it as the first or last character of the set. To include a closing bracket in a set, use it as the first character of the set. e.g. [][] will match either [ or ]. Note that special characters lose their special meanings inside a set, except that \\, \^, \-, \[ and \] match the escaped character.
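
A short sketch of the set rules above, using illustrative sample strings. Flag 3 returns every match in the string as an array, and the default flag returns 1 (match) or 0 (no match).

```autoit
; Character sets on illustrative sample strings.
; Flag 3 returns every match in the string as an array.
Local $aVowels = StringRegExp("Regular Expression", "[aeiou]", 3)
ConsoleWrite(UBound($aVowels) & @CRLF) ; 6 (the upper-case "E" is not in the set)

; A closing bracket placed first in the set is literal, so [][] matches "[" or "]".
Local $aBrackets = StringRegExp("a[1]", "[][]", 3)
ConsoleWrite(UBound($aBrackets) & @CRLF) ; 2

; A dash placed first in the set is a literal dash, not a range.
ConsoleWrite(StringRegExp("3-2", "[-+]") & @CRLF) ; 1
```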

[^ ... ]

Match any character not in the set. e.g. [^0-9] matches any non-digit. To include a literal caret (^) in a set, put it anywhere except the first position, or escape it (\^).

[:class:]

Match a character in the given class of characters. A class is written inside a set, so [0-9] is equivalent to [[:digit:]]. Valid classes are: alpha (any alphabetic character), alnum (any alphanumeric character), lower (any lower-case letter), upper (any upper-case letter), digit (any decimal digit 0-9), xdigit (any hexadecimal digit, 0-9, A-F, a-f), space (any whitespace character), blank (only a space or tab), print (any printable character), graph (any printable character except spaces), cntrl (any control character [ASCII 127 or <32]) or punct (any punctuation character).

[^:class:]

Match any character not in the class. The caret negates the set only when it is the first character after the opening bracket, e.g. [^[:digit:]] matches any non-digit.
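
As a hedged illustration of classes and their negation, the sketch below writes the class inside a set, in the [[:digit:]] form. The sample string and variable names are assumptions made for the example.

```autoit
; POSIX classes are written inside a set (sample string is illustrative).
Local $aDigits = StringRegExp("Room 101", "[[:digit:]]+", 1)
ConsoleWrite($aDigits[0] & @CRLF) ; 101

; A caret as the first character of the set negates it.
Local $aOther = StringRegExp("Room 101", "[^[:digit:]]+", 1)
ConsoleWrite("[" & $aOther[0] & "]" & @CRLF) ; [Room ]
```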

( ... )

Group. The elements in the group are treated in order and can be repeated together. e.g. (ab)+ will match "ab" or "abab", but not "aba". A group will also store the text matched for use in back-references and in the array returned by the function, depending on flag value.
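
A minimal sketch of a repeated group, assuming an illustrative sample string. Flag 2 returns the full match in element 0, followed by the captured groups.

```autoit
; Repeating a group as a unit (sample string is illustrative).
; Flag 2 returns the full match in element 0, then the captured groups.
Local $aMatch = StringRegExp("xababy", "(ab)+", 2)

ConsoleWrite($aMatch[0] & @CRLF) ; abab (the group was repeated twice)
ConsoleWrite($aMatch[1] & @CRLF) ; ab   (the group keeps its last repetition)
```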

(?i)

Case-insensitivity flag. This does not operate as a group. It tells the regular expression engine to do case-insensitive matching from that point on.

(?-i)

(default) Case-sensitivity flag. This does not operate as a group. It tells the regular expression engine to do case-sensitive matching from that point on.
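
The two flags can be tried out with a sketch like this one. The sample strings are illustrative; the default flag of StringRegExp returns 1 for a match and 0 otherwise.

```autoit
; Matching is case-sensitive unless (?i) is in effect.
ConsoleWrite(StringRegExp("AutoIt", "autoit") & @CRLF)     ; 0
ConsoleWrite(StringRegExp("AutoIt", "(?i)autoit") & @CRLF) ; 1

; (?-i) switches back to case-sensitive matching from that point on.
ConsoleWrite(StringRegExp("AUTOit", "(?i)auto(?-i)it") & @CRLF) ; 1
ConsoleWrite(StringRegExp("AUTOIT", "(?i)auto(?-i)it") & @CRLF) ; 0
```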

(?i ... )

Case-insensitive group. Behaves just like a normal group, but performs case-insensitive matches within the group.

(?-i ... )

Case-sensitive group. Behaves just like a normal group, but performs case-sensitive matches within the group. Primarily for use after the (?i) flag or inside a case-insensitive group.

(?: ... )

Non-capturing group. Behaves just like a normal group, but does not record the matching characters in the array nor can the matched text be used for back-referencing.
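
The effect on the returned array can be seen in a small sketch. The date string and variable names are illustrative assumptions; flag 1 returns the captured groups of the first match.

```autoit
; Capturing vs. non-capturing groups (sample date is illustrative).
Local $sDate = "2024-05-17"

Local $aAll = StringRegExp($sDate, "(\d{4})-(\d{2})-(\d{2})", 1)
Local $aSome = StringRegExp($sDate, "(?:\d{4})-(\d{2})-(\d{2})", 1)

ConsoleWrite(UBound($aAll) & @CRLF)  ; 3 (year, month, day)
ConsoleWrite(UBound($aSome) & @CRLF) ; 2 (the year group is not recorded)
ConsoleWrite($aSome[0] & @CRLF)      ; 05
```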

(?i: ... )

Case-insensitive non-capturing group. Behaves just like a non-capturing group, but performs case-insensitive matches within the group.

(?-i: ... )

Case-sensitive non-capturing group. Behaves just like a non-capturing group, but performs case-sensitive matches within the group.

(?m)

Multiline mode: ^ and $ also match at newlines within the data, not only at the start and end of the string.

(?s)

. matches any character, including newline. (By default, "." does not match a newline.)
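
A hedged sketch of the (?m) and (?s) options on a two-line string. The sample text is an assumption; the lines are separated with @CRLF.

```autoit
; Two lines separated by a carriage return and linefeed (illustrative text).
Local $sText = "first" & @CRLF & "second"

; Without (?m), ^ matches only at the very start of the string.
ConsoleWrite(StringRegExp($sText, "^second") & @CRLF)     ; 0
ConsoleWrite(StringRegExp($sText, "(?m)^second") & @CRLF) ; 1

; Without (?s), the dot cannot cross the line break.
ConsoleWrite(StringRegExp($sText, "first.+second") & @CRLF)     ; 0
ConsoleWrite(StringRegExp($sText, "(?s)first.+second") & @CRLF) ; 1
```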

(?x)

Ignore whitespace and # comments.
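
With (?x), a pattern can be spaced out and annotated. The following is a minimal sketch with an assumed date pattern; the spaces and the trailing # comment are ignored by the engine.

```autoit
; Spaces and the trailing # comment are ignored because of (?x).
Local $sPattern = "(?x) (\d{4}) - (\d{2}) - (\d{2})  # year, month, day"
Local $aParts = StringRegExp("2024-05-17", $sPattern, 1)

ConsoleWrite($aParts[0] & "/" & $aParts[1] & "/" & $aParts[2] & @CRLF) ; 2024/05/17
```

To match a real space while (?x) is active, escape it or use \s.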

(?U)

Invert greediness of quantifiers.

.

Match any single character (except newline).

|

Or. The expression on one side or the other can be matched.

\

Escape a special character (have it match the actual character) or introduce a special character type (see below).

\\

Match an actual backslash (\).

\a

Alarm, that is, the BEL character (chr(7)).

\A

Match only at beginning of string.

\b

Matches at a word boundary.

\B

Matches when not at a word boundary.
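
Word boundaries are easiest to see in a replacement. This sketch uses StringRegExpReplace with illustrative strings: \b restricts the match to the standalone word, and \B restricts it to the occurrence inside a longer word.

```autoit
; \b matches only the standalone word.
ConsoleWrite(StringRegExpReplace("concatenate cat", "\bcat\b", "dog") & @CRLF) ; concatenate dog

; \B matches only the occurrence embedded in a longer word.
ConsoleWrite(StringRegExpReplace("concatenate cat", "\Bcat\B", "DOG") & @CRLF) ; conDOGenate cat
```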

\c

Match a control character, based on the next character. For example, \cM matches ctrl-M.

\d

Match any digit (0-9).

\D

Match any non-digit.

\e

Match an escape character (chr(27)).

\E

End case modification (\L or \U) or quoting (\Q).

\f

Match a formfeed character (chr(12)).

\h

Match any horizontal whitespace character.

\H

Match any character that is not a horizontal whitespace character.

\l

Match lowercase next char.

\L

Match lowercase till \E.

\n

Match a linefeed (@LF, chr(10)).

\Q

Quote (disable) pattern metacharacters until \E.
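
Quoting is useful when the text to find contains metacharacters. A small sketch with illustrative strings: inside \Q...\E the dot and the parentheses are taken literally.

```autoit
; Inside \Q...\E the dot and parentheses are literal characters.
ConsoleWrite(StringRegExp("price: 1.50 (USD)", "\Q1.50 (USD)\E") & @CRLF) ; 1
ConsoleWrite(StringRegExp("price: 1x50 USD", "\Q1.50 (USD)\E") & @CRLF)   ; 0

; Unquoted, "." matches any character and "(USD)" is a group.
ConsoleWrite(StringRegExp("price: 1x50 USD", "1.50 (USD)") & @CRLF)   ; 1
ConsoleWrite(StringRegExp("price: 1.50 (USD)", "1.50 (USD)") & @CRLF) ; 0
```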

\r

Match a carriage return (@CR, chr(13)).

\s

Match any whitespace character: Chr(9) through Chr(13) which are Horizontal Tab, Line Feed, Vertical Tab, Form Feed, and Carriage Return, and the standard space ( Chr(32) ).

\S

Match any non-whitespace character.

\t

Match a tab character (chr(9)).

\u

Match uppercase next char.

\U

Match uppercase till \E.

\v

Match any vertical whitespace character.

\V

Match any character that is not a vertical whitespace character.

\w

Match any "word" character: a-z, A-Z, 0-9 or underscore (_).

\W

Match any non-word character.

\###

Match the ASCII character whose code is given in octal (up to 3 octal digits), or match a back-reference.
If a group with the given number exists, the sequence is a back-reference: it matches exactly the text matched by that prior group. For example, ([[:alpha:]])\1 would match a double letter.
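
A minimal sketch of a back-reference that finds doubled letters. The sample word is illustrative, and the class is written inside a set as [[:alpha:]]. Flag 3 returns the captured group of every match.

```autoit
; Find doubled letters: group 1 captures a letter, \1 requires the same letter again.
Local $aDoubles = StringRegExp("bookkeeper", "([[:alpha:]])\1", 3)

ConsoleWrite(UBound($aDoubles) & @CRLF) ; 3
ConsoleWrite($aDoubles[0] & $aDoubles[1] & $aDoubles[2] & @CRLF) ; oke
```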

\x##

Match the ASCII character whose code is given in hexadecimal. Can be up to 2 digits.

\z

Match only at end of string.

\Z

Match only at end of string, or before newline at the end.


Repeating Characters

{x}

Repeat the previous character, set or group exactly x times.

{x,}

Repeat the previous character, set or group at least x times.

{0,x}

Repeat the previous character, set or group at most x times.

{x,y}

Repeat the previous character, set or group between x and y times, inclusive.
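
Counted repetition can be sketched as follows, with illustrative digit strings. The bounds are written without spaces inside the braces, and \A and \z anchor the pattern to the whole string.

```autoit
; Between 2 and 4 digits, anchored to the whole string.
ConsoleWrite(StringRegExp("1234", "\A\d{2,4}\z") & @CRLF)  ; 1
ConsoleWrite(StringRegExp("12345", "\A\d{2,4}\z") & @CRLF) ; 0

; At least 2 digits.
ConsoleWrite(StringRegExp("7", "\A\d{2,}\z") & @CRLF) ; 0

; Unanchored, the repetition takes as many digits as the upper bound allows.
Local $aRun = StringRegExp("12345", "\d{2,4}", 1)
ConsoleWrite($aRun[0] & @CRLF) ; 1234
```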

*

Repeat the previous character, set or group 0 or more times. Equivalent to {0,}

+

Repeat the previous character, set or group 1 or more times. Equivalent to {1,}

?

The previous character, set or group may or may not appear. Equivalent to {0,1}

? (after a repeating character)

Find the smallest match instead of the largest.


Character Classes

[:alnum:]

letters and digits

[:alpha:]

letters

[:ascii:]

character codes 0 - 127

[:blank:]

space or tab only

[:cntrl:]

control characters

[:digit:]

decimal digits (same as \d)

[:graph:]

printing characters, excluding space

[:lower:]

lower case letters

[:print:]

printing characters, including space

[:punct:]

printing characters, excluding letters and digits

[:space:]

white space (not quite the same as \s: it includes VT, chr(11))

[:upper:]

upper case letters

[:word:]

"word" characters (same as \w)

[:xdigit:]

hexadecimal digits