Predefined character classes
Some character classes occur often enough in regexes to warrant shortcuts. Pattern provides such shortcuts with predefined character classes, which Table 1 presents. Use predefined character classes to simplify your regexes and minimize regex syntax errors.
Table 1. Predefined character classes
Predefined character class | Description |
\d | A digit. Equivalent to [0-9]. |
\D | A nondigit. Equivalent to [^0-9]. |
\s | A whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | A nonwhitespace character. Equivalent to [^\s]. |
\w | A word character. Equivalent to [a-zA-Z_0-9]. |
\W | A nonword character. Equivalent to [^\w]. |
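The equivalences in Table 1 can be checked directly. The following is a minimal sketch, not part of the original article: the class name PredefinedClassCheck, the helper findAll(), and the sample text are all hypothetical, and the check assumes Pattern's default flags (no Unicode character-class mode). It compares the matches produced by each shortcut with those produced by its bracketed equivalent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
public class PredefinedClassCheck {
    // Collects every match of regex in text, in order of appearance.
    public static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample text chosen for illustration: letters, digits,
        // an underscore, a space, a tab, and a period.
        String text = "a1 _\tZ9.";

        // Each pair holds a shortcut and its equivalent from Table 1.
        String[][] pairs = {
            {"\\d", "[0-9]"},
            {"\\D", "[^0-9]"},
            {"\\s", "[ \\t\\n\\x0B\\f\\r]"},
            {"\\S", "[^\\s]"},
            {"\\w", "[a-zA-Z_0-9]"},
            {"\\W", "[^\\w]"},
        };
        for (String[] pair : pairs) {
            boolean same = findAll(pair[0], text).equals(findAll(pair[1], text));
            System.out.println(pair[0] + " vs " + pair[1] + ": " + same);
        }
    }
}
```

Note that each backslash is doubled inside the Java string literals, because the Java compiler consumes one level of escaping before Pattern sees the regex.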
The following command-line example uses the \w predefined character class to identify all word characters in its text command-line argument:
java RegexDemo \w "aZ.8 _"
The command line above produces the following output, which shows that the period and space characters are not considered word characters:
Regex = \w
Text = aZ.8 _
Found a
starting at index 0 and ending at index 1
Found Z
starting at index 1 and ending at index 2
Found 8
starting at index 3 and ending at index 4
Found _
starting at index 5 and ending at index 6
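The same search can be run from code. RegexDemo's source is not shown in this section, so the following is only a sketch of a comparable find() loop: the class name WordCharScan, the method report(), and its output format are assumptions made for illustration and do not reproduce RegexDemo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a match-reporting loop; not the article's RegexDemo.
public class WordCharScan {
    // Builds one line per match, giving the matched text and its
    // start (inclusive) and end (exclusive) indexes.
    public static String report(String regex, String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            sb.append("Found ").append(m.group())
              .append(" at [").append(m.start())
              .append(", ").append(m.end()).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same regex and text as the command line above.
        System.out.print(report("\\w", "aZ.8 _"));
    }
}
```

The period at index 2 and the space at index 4 produce no lines in the report, which agrees with the indexes in the output above.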
Note
Pattern's SDK documentation refers to the period metacharacter as a predefined character class. The period matches any character except for a line terminator (a one- or two-character sequence identifying the end of a text line), unless dotall mode (discussed later) is in effect. Pattern recognizes the following line terminators:
- The carriage-return character (\r)
- The new-line (line feed) character (\n)
- The carriage-return character immediately followed by the new-line character (\r\n)
- The next-line character (\u0085)
- The line-separator character (\u2028)
- The paragraph-separator character (\u2029)
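The period's treatment of line terminators can be observed with a short test. This is a minimal sketch, assuming only the standard Pattern.DOTALL flag; the class name DotLineTerminators and the helper dotMatches() are hypothetical. It tries the regex . against each single-character line terminator, first with default flags and then in dotall mode.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
public class DotLineTerminators {
    // Returns true if a lone period matches the entire input
    // when compiled with the given flags.
    public static boolean dotMatches(String input, int flags) {
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The single-character line terminators listed in the note.
        String[] terminators = {"\r", "\n", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            String codePoint = String.format("U+%04X", (int) t.charAt(0));
            System.out.println(codePoint
                + " default: " + dotMatches(t, 0)
                + ", dotall: " + dotMatches(t, Pattern.DOTALL));
        }
        // An ordinary character matches in either mode.
        System.out.println("a default: " + dotMatches("a", 0));
    }
}
```

With default flags the period should reject every terminator in the list, and with Pattern.DOTALL it should accept each of them.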