Tuesday, November 8, 2011

Regular expressions simplify pattern-matching code - 4


Metacharacters

Although literal string regex constructs are useful, more powerful regex constructs combine literal characters with metacharacters. For example, in a.b, the period metacharacter (.) matches any single character that appears between a and b. To see the period metacharacter in action, execute the following command line:
java RegexDemo .ox "The quick brown fox jumps over the lazy ox."

The command line above specifies .ox as the regex and The quick brown fox jumps over the lazy ox. as the text command-line argument. RegexDemo searches the text for three-character matches that consist of any single character followed by ox, and produces the following output:
Regex = .ox
Text = The quick brown fox jumps over the lazy ox.
Found fox
  starting at index 16 and ending at index 19
Found  ox
  starting at index 39 and ending at index 42

The output reveals two matches: fox and ox (with a leading space character). The . metacharacter matches the f in the first match and the space character in the second match. In each case, the reported ending index is one past the last matched character.
What happens if we replace .ox with the period metacharacter alone? That is, what is output when we specify java RegexDemo . "The quick brown fox jumps over the lazy ox."? Because the period metacharacter matches any character, RegexDemo outputs a match for each character in its text command-line argument, including the terminating period character.
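The two runs described above can be approximated with a few lines of java.util.regex code. The following is a minimal sketch, not RegexDemo's actual source (which does not appear in this part of the article): the class name DotMatchSketch, the findMatches() helper, and the bracketed output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the article's RegexDemo program.
class DotMatchSketch {
    // Returns one entry per match: the matched text in brackets, followed by
    // its start index and its end index (one past the last matched character).
    static List<String> findMatches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add("[" + m.group() + "] " + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy ox.";

        // .ox: any single character followed by the literal characters ox.
        for (String line : findMatches(".ox", text)) {
            System.out.println(line);
        }

        // . alone: one match per character, because the text contains
        // no line terminators (which the period skips by default).
        System.out.println(findMatches(".", text).size());
    }
}
```

Running the sketch prints [fox] 16-19 and [ ox] 39-42, followed by 43, which is the number of characters in the text. The brackets make the leading space in the second match visible.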
Tip
To specify . or any other metacharacter as a literal character in a regex construct, quote the metacharacter (that is, convert it from meta status to literal status) in one of two ways:
  • Precede the metacharacter with a backslash character.
  • Place the metacharacter between \Q and \E (e.g., \Q.\E).
In either scenario, don't forget to double each backslash character (as in \\. or \\Q.\\E) that appears in a string literal (e.g., String regex = "\\.";). Do not double the backslash character when it appears as part of a command-line argument.
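To illustrate the tip, the following sketch counts matches for an unquoted period and for both quoted forms, written as Java string literals with doubled backslashes. The class name QuoteDotSketch and the countMatches() helper are hypothetical. The call to Pattern.quote() is not mentioned in the tip; it is a standard java.util.regex.Pattern method that wraps a string in \Q and \E for us.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class used only to illustrate quoting.
class QuoteDotSketch {
    // Counts how many times regex matches within text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy ox.";

        // Unquoted period: a metacharacter that matches every character.
        System.out.println(countMatches(".", text));

        // Backslash-quoted period: the string literal "\\." is the regex \.
        System.out.println(countMatches("\\.", text));

        // \Q...\E quoting: the string literal "\\Q.\\E" is the regex \Q.\E
        System.out.println(countMatches("\\Q.\\E", text));

        // Pattern.quote(".") builds the \Q.\E form without manual escaping.
        System.out.println(countMatches(Pattern.quote("."), text));
    }
}
```

The first count is 43 (every character in the text), while each of the three quoted forms yields 1, because only the terminating period is matched literally.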
