A practical application of regexes
Regexes let you create powerful text-processing applications. One application you might find helpful extracts comments from a Java, C, or C++ source file, and records those comments in another file. Listing 2 presents that application's source code:
Listing 2. ExtCmnt.java
// ExtCmnt.java
import java.io.*;
import java.util.regex.*;
class ExtCmnt
{
   public static void main (String [] args)
   {
      if (args.length != 2)
      {
         System.err.println ("usage: java ExtCmnt infile outfile");
         return;
      }
      Pattern p;
      try
      {
         // The following pattern lets this extract multiline comments that
         // appear on a single line (e.g., /* same line */) and single-line
         // comments (e.g., // some line). Furthermore, the comment may
         // appear anywhere on the line.
         p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
      }
      catch (PatternSyntaxException e)
      {
         System.err.println ("Regex syntax error: " + e.getMessage ());
         System.err.println ("Error description: " + e.getDescription ());
         System.err.println ("Error index: " + e.getIndex ());
         System.err.println ("Erroneous pattern: " + e.getPattern ());
         return;
      }
      BufferedReader br = null;
      BufferedWriter bw = null;
      try
      {
         FileReader fr = new FileReader (args [0]);
         br = new BufferedReader (fr);
         FileWriter fw = new FileWriter (args [1]);
         bw = new BufferedWriter (fw);
         Matcher m = p.matcher ("");
         String line;
         while ((line = br.readLine ()) != null)
         {
            m.reset (line);
            if (m.matches ()) /* entire line must match */
            {
               bw.write (line);
               bw.newLine ();
            }
         }
      }
      catch (IOException e)
      {
         System.err.println (e.getMessage ());
         return;
      }
      finally // Close file.
      {
         try
         {
            if (br != null)
               br.close ();
            if (bw != null)
               bw.close ();
         }
         catch (IOException e)
         {
         }
      }
   }
}
After creating Pattern and Matcher objects, ExtCmnt reads a text file's contents line by line. For each line it reads, the matcher attempts to match that line against a pattern that identifies either a single-line comment or a multiline comment that appears on a single line. If the line matches the pattern, ExtCmnt writes that line to another text file. For example, java ExtCmnt ExtCmnt.java out reads each line of ExtCmnt.java, attempts to match that line against the pattern, and outputs the matched lines to a file named out. (Don't worry about understanding the file reading and writing logic. I will explore that logic in a future article.) After ExtCmnt completes, out contains the following lines:
// ExtCmnt.java
// The following pattern lets this extract multiline comments that
// appear on a single line (e.g., /* same line */) and single-line
// comments (e.g., // some line). Furthermore, the comment may
// appear anywhere on the line.
p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
if (m.matches ()) /* entire line must match */
finally // Close file.
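To see the per-line matching logic without involving files, here is a minimal sketch that applies ExtCmnt's pattern to a few hard-coded lines. The class name CommentLineDemo and the sample lines are hypothetical, chosen only for illustration. Like ExtCmnt, the sketch creates one Matcher, reuses it with reset(), and calls matches(), so a line qualifies only when the entire line matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies ExtCmnt's pattern to in-memory lines.
class CommentLineDemo
{
   public static void main (String [] args)
   {
      // Same pattern as ExtCmnt: a /* ... */ comment on one line, or a // comment.
      Pattern p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
      // Create the matcher once and reuse it via reset(), as ExtCmnt does.
      Matcher m = p.matcher ("");
      String [] lines =
      {
         "// whole-line comment",
         "int x = 0; // trailing comment",
         "int y = 1; /* same line */",
         "int z = 2;",
         "/* comment that continues",
         "String url = \"http://example.com\";"
      };
      for (String line: lines)
      {
         m.reset (line);
         // matches() succeeds only if the entire line matches the pattern.
         System.out.println (m.matches () + ": " + line);
      }
   }
}
```

The sketch prints true for the two // lines, the /* same line */ line, and the line whose string literal contains http://, and false for int z = 2; and for the unterminated /* comment. The last sample shows that any line containing // qualifies, even when those characters sit inside a string literal.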
ExtCmnt is not perfect: the line p = Pattern.compile (".*/\\*.*\\*/|.*//.*$"); is not a comment. It appears in out because the // characters inside that string literal satisfy the pattern's second alternative, so ExtCmnt's matcher treats the line as a single-line comment.
Notice the vertical bar metacharacter (|) in the pattern ".*/\\*.*\\*/|.*//.*$". According to the SDK documentation, the vertical bar metacharacter and the parentheses metacharacters that form a capturing group are logical operators. The vertical bar tells a matcher to use the operator's left regex construct operand to locate a match in the matcher's text. If no match exists, the matcher uses the operator's right regex construct operand in another match attempt. Because | binds more loosely than the constructs around it, ExtCmnt's pattern consists of two complete alternatives: .*/\\*.*\\*/ (a /* ... */ comment on one line) and .*//.*$ (a // comment).
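The left-operand-first behavior is easiest to see when one alternative is a prefix of the other. The following minimal sketch uses a hypothetical class name, AlternationDemo, and a hypothetical pattern, cat|category, neither of which is part of ExtCmnt. It compares find(), which accepts the first alternative that succeeds, with matches(), which must consume the entire input and therefore falls back to the right operand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating how X|Y tries X before Y.
class AlternationDemo
{
   public static void main (String [] args)
   {
      Pattern p = Pattern.compile ("cat|category");
      Matcher m = p.matcher ("category");

      // find() stops at the first successful match: the left operand "cat"
      // succeeds at index 0, so the right operand is never tried.
      if (m.find ())
         System.out.println ("find(): " + m.group ());

      // matches() requires the whole input to match. "cat" leaves "egory"
      // unmatched, so the matcher backtracks and tries "category" instead.
      m.reset ();
      System.out.println ("matches(): " + m.matches ());
   }
}
```

The sketch prints find(): cat followed by matches(): true. Because the matcher prefers the left operand rather than the longest alternative, the order of alternatives matters whenever one alternative can match a prefix of another.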