Predefined character classes
Some character classes occur often enough in regexes to warrant shortcuts.
Pattern provides these shortcuts as predefined character classes, which Table 1 lists. Use predefined character classes to simplify your regexes and minimize regex syntax errors.
Table 1. Predefined character classes
| Predefined character class | Description |
|---|---|
| \d | A digit. Equivalent to [0-9]. |
| \D | A nondigit. Equivalent to [^0-9]. |
| \s | A whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
| \S | A nonwhitespace character. Equivalent to [^\s]. |
| \w | A word character. Equivalent to [a-zA-Z_0-9]. |
| \W | A nonword character. Equivalent to [^\w]. |
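You can check the equivalences in Table 1 yourself. The following minimal sketch (the class name PredefinedClassCheck and its helper sameOnAscii are hypothetical, written only for illustration) compares each predefined class with its bracket-expression form on every ASCII character from 0 through 127. It assumes Pattern's default flags and limits the check to ASCII for brevity.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: checks Table 1's equivalences on ASCII characters.
public class PredefinedClassCheck {
    // Pairs of {predefined class, bracket-expression equivalent} from Table 1.
    static final String[][] PAIRS = {
        {"\\d", "[0-9]"},
        {"\\D", "[^0-9]"},
        {"\\s", "[ \\t\\n\\x0B\\f\\r]"},
        {"\\S", "[^\\s]"},
        {"\\w", "[a-zA-Z_0-9]"},
        {"\\W", "[^\\w]"}
    };

    // Returns true if both regexes accept exactly the same single characters in 0..127.
    static boolean sameOnAscii(String a, String b) {
        Pattern pa = Pattern.compile(a);
        Pattern pb = Pattern.compile(b);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (pa.matcher(s).matches() != pb.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            System.out.println(pair[0] + " vs " + pair[1] + ": "
                + (sameOnAscii(pair[0], pair[1]) ? "equivalent" : "different"));
        }
    }
}
```

Each line of output should report "equivalent". Comparing single-character strings with matches() keeps the check focused on which characters each class accepts.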
The following command-line example uses the \w predefined character class to identify all word characters in its text command-line argument:
java RegexDemo \w "aZ.8 _"
The command line above produces the following output, which shows that the period and space characters are not considered word characters:
Regex = \w
Text = aZ.8 _
Found a
starting at index 0 and ending at index 1
Found Z
starting at index 1 and ending at index 2
Found 8
starting at index 3 and ending at index 4
Found _
starting at index 5 and ending at index 6
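RegexDemo's source does not appear in this section. As a hedged sketch, assuming RegexDemo simply compiles its first argument and reports every find() match in its second, a program like the following produces output in the format shown above. It is a hypothetical reconstruction (including the report helper), not the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical reconstruction of RegexDemo; the original listing may differ.
public class RegexDemo {
    // Builds the report lines for every match of regex in text.
    static List<String> report(String regex, String text) {
        List<String> lines = new ArrayList<>();
        lines.add("Regex = " + regex);
        lines.add("Text = " + text);
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            lines.add("Found " + m.group());
            // end() is exclusive: it is the index just past the last matched character
            lines.add("starting at index " + m.start() + " and ending at index " + m.end());
        }
        return lines;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java RegexDemo regex text");
            return;
        }
        try {
            for (String line : report(args[0], args[1])) {
                System.out.println(line);
            }
        } catch (PatternSyntaxException e) {
            // Report malformed regexes instead of letting the exception escape
            System.err.println("Bad regex: " + e.getDescription());
        }
    }
}
```

Because end() is exclusive, each single-character match ends one index past where it starts, which is why "Found a" starts at index 0 and ends at index 1.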
Note: Pattern's SDK documentation refers to the period metacharacter as a predefined character class that matches any character except a line terminator (a one- or two-character sequence identifying the end of a text line), unless dotall mode (discussed later) is in effect. Pattern recognizes the following line terminators:
- The carriage-return character (\r)
- The new-line (line feed) character (\n)
- The carriage-return character immediately followed by the new-line character (\r\n)
- The next-line character (\u0085)
- The line-separator character (\u2028)
- The paragraph-separator character (\u2029)
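To see how the period treats these line terminators, the following minimal sketch (the class DotDemo and its helpers are hypothetical names chosen for illustration) tests a lone "." against each single-character terminator, with and without the Pattern.DOTALL flag, and counts the "." matches in a string containing the two-character \r\n terminator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: how "." treats line terminators.
public class DotDemo {
    // Single-character line terminators from the list of terminators Pattern recognizes.
    static final char[] TERMINATORS = {'\r', '\n', '\u0085', '\u2028', '\u2029'};

    // Returns true if a lone "." matches the single character c under the given flags.
    static boolean dotMatches(char c, int flags) {
        return Pattern.compile(".", flags).matcher(String.valueOf(c)).matches();
    }

    // Counts how many successive "." matches find() reports in text.
    static int countDots(String text, int flags) {
        Matcher m = Pattern.compile(".", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (char c : TERMINATORS) {
            System.out.printf("U+%04X default=%b dotall=%b%n",
                (int) c, dotMatches(c, 0), dotMatches(c, Pattern.DOTALL));
        }
        // Without DOTALL, only x and y match; with DOTALL, \r and \n match too.
        String text = "x\r\ny";
        System.out.println("default: " + countDots(text, 0));
        System.out.println("dotall:  " + countDots(text, Pattern.DOTALL));
    }
}
```

Each terminator should report default=false and dotall=true, and the counts for "x\r\ny" should be 2 without DOTALL and 4 with it, because "." consumes one character at a time.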