Sunday, November 6, 2011

Regular expressions simplify pattern-matching code - 2


Java's java.util.regex package supports pattern matching via its Pattern, Matcher, and PatternSyntaxException classes:
  • Pattern objects, also known as patterns, are compiled regexes
  • Matcher objects, or matchers, are engines that interpret patterns to locate matches in character sequences (objects whose classes implement the java.lang.CharSequence interface and serve as text sources)
  • PatternSyntaxException objects describe illegal regex patterns
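
As a minimal sketch of the CharSequence point above, the following code runs the same pattern against a String and a StringBuilder, both of which implement java.lang.CharSequence. The class name CharSequenceDemo and the helper countMatches are hypothetical names chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative assumptions.
class CharSequenceDemo
{
    // Count the non-overlapping matches of regex in any CharSequence.
    static int countMatches(String regex, CharSequence text)
    {
        Pattern p = Pattern.compile(regex);   // compiled regex (a pattern)
        Matcher m = p.matcher(text);          // matcher bound to the text source
        int count = 0;
        while (m.find())
            count++;
        return count;
    }

    public static void main(String[] args)
    {
        CharSequence fromString = "cat catalog";
        CharSequence fromBuilder = new StringBuilder("concatenate");
        System.out.println(countMatches("cat", fromString));   // prints 2
        System.out.println(countMatches("cat", fromBuilder));  // prints 1
    }
}
```

Because matcher() accepts a CharSequence, the text source need not be converted to a String first.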

Listing 1 introduces those classes:
Listing 1. RegexDemo.java
// RegexDemo.java
import java.util.regex.*;
class RegexDemo
{
   public static void main (String [] args)
   {
      if (args.length != 2)
      {
          System.err.println ("usage: java RegexDemo regex text");
          return;
      }
      Pattern p;
      try
      {
         p = Pattern.compile (args [0]);
      }
      catch (PatternSyntaxException e)
      {
         System.err.println ("Regex syntax error: " + e.getMessage ());
         System.err.println ("Error description: " + e.getDescription ());
         System.err.println ("Error index: " + e.getIndex ());
         System.err.println ("Erroneous pattern: " + e.getPattern ());
         return;
      }
      String s = cvtLineTerminators (args [1]);
      Matcher m = p.matcher (s);
      System.out.println ("Regex = " + args [0]);
      System.out.println ("Text = " + s);
      System.out.println ();
      while (m.find ())
      {
         System.out.println ("Found " + m.group ());
         System.out.println ("  starting at index " + m.start () +
                             " and ending at index " + m.end ());
         System.out.println ();
      }
   }
   // Convert \n and \r character sequences to their single character
   // equivalents
   static String cvtLineTerminators (String s)
   {
      StringBuilder sb = new StringBuilder (80);
      int oldindex = 0, newindex;
      while ((newindex = s.indexOf ("\\n", oldindex)) != -1)
      {
         sb.append (s.substring (oldindex, newindex));
         oldindex = newindex + 2;
         sb.append ('\n');
      }
      sb.append (s.substring (oldindex));
      s = sb.toString ();
      sb = new StringBuilder (80);
      oldindex = 0;
      while ((newindex = s.indexOf ("\\r", oldindex)) != -1)
      {
         sb.append (s.substring (oldindex, newindex));
         oldindex = newindex + 2;
         sb.append ('\r');
      }
      sb.append (s.substring (oldindex));
      return sb.toString ();
   }
}

RegexDemo's public static void main(String [] args) method first checks that exactly two command-line arguments were supplied: one that identifies a regex and another that identifies text. It then compiles the regex into a pattern; if the regex is illegal, the method reports the PatternSyntaxException's message, description, error index, and erroneous pattern, and then exits. Next, the method converts each new-line and carriage-return character sequence in the text argument to the actual character that it denotes. For example, a new-line character sequence (a backslash (\) followed by n) converts to a single new-line character (character code 10). The method then creates a matcher from the pattern, outputs the regex and the converted text, and repeatedly calls the matcher's find() method to locate all matches. For each match, the matched characters and the indexes at which the match starts and ends are written to the standard output device. The ending index is the index of the character immediately following the match.
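
The steps described above can be sketched in a more compact form. This is an illustration under stated assumptions, not a replacement for Listing 1: the class FindLoopSketch and its methods convert, describeMatches, and isLegal are hypothetical names, and convert swaps in String.replace() for the listing's hand-written loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch class; not part of the original listing.
class FindLoopSketch
{
    // Turn the two-character sequences \n and \r into the single
    // characters they denote (String.replace() used as a shortcut).
    static String convert(String s)
    {
        return s.replace("\\n", "\n").replace("\\r", "\r");
    }

    // Record each match as "text@start-end", mirroring Listing 1's loop.
    static List<String> describeMatches(String regex, CharSequence text)
    {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find())
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        return found;
    }

    // Report whether the regex compiles without a syntax error.
    static boolean isLegal(String regex)
    {
        try
        {
            Pattern.compile(regex);
            return true;
        }
        catch (PatternSyntaxException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        // After conversion the text is: o, x, new-line, b, o, x
        String text = convert("ox\\nbox");
        System.out.println(describeMatches("ox", text)); // [ox@0-2, ox@4-6]
        System.out.println(isLegal("("));                // false (unclosed group)
    }
}
```

In the first match, start() returns 0 and end() returns 2 even though the match occupies indexes 0 and 1, because end() reports the offset just past the last matched character.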
