Sunday, December 25, 2011

Java's character and assorted string classes support text-processing - 25


Among Editor's various commands, add appends a line of text to the strings array of StringBuffer objects, dump writes all lines to the standard output device, and delfch removes the current line's first character. On its own, delfch is not very useful: a better program would accept an index after the command name and delete the character at that index. However, before you can accomplish that task, you must learn about the StringTokenizer class.
The StringTokenizer class
What do the Java compiler, a text-based adventure game, and a Linux shell program have in common? Each program contains code that extracts, from user-specified text, the fundamental character sequences, or tokens, such as identifiers and punctuation (compiler), game-play instructions (adventure game), or command name and arguments (Linux shell). Java supports this token extraction process (known as string tokenizing, because user-specified text exists as one or more character strings) via the StringTokenizer class.
Unlike the frequently used Character, String, and StringBuffer classes, which reside in package java.lang and are imported automatically, the less frequently used StringTokenizer utility class resides in package java.util. A program must therefore import that class explicitly or refer to it by its fully qualified name, java.util.StringTokenizer.
StringTokenizer objects
Before a program can extract tokens from a string, the program must create a StringTokenizer object by calling one of the following constructors:
  • public StringTokenizer(String s), which creates a StringTokenizer that extracts tokens from the s-referenced String. The constructor uses the space character (' '), tab character ('\t'), new-line character ('\n'), carriage-return character ('\r'), and form-feed character ('\f') as delimiters, the characters that separate tokens from each other. Delimiters are not returned as tokens.
  • public StringTokenizer(String s, String delim), which is identical to the previous constructor except that the characters in the delim-referenced String replace the default delimiter set. During string tokenizing, StringTokenizer skips all delimiter characters as it searches for the next token's beginning. Delimiters are not returned as tokens.
  • public StringTokenizer(String s, String delim, boolean returnDelim), which resembles the previous constructor except that you also specify whether delimiter characters are returned as tokens. When you pass true to returnDelim, each delimiter character is returned as a token of length one; when you pass false, delimiters are skipped.
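
To make the delimiter rules concrete, here is a minimal sketch that builds a tokenizer with each constructor and asks how many tokens it would produce. The class name ConstructorDemo, the helper method names, and the sample text are illustrative assumptions, not part of the Editor program. The sketch relies on StringTokenizer's countTokens() method, which reports how many tokens remain.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; names and sample text are for illustration only.
public class ConstructorDemo {
    // Default delimiters: space, tab, new-line, carriage return, form feed.
    static int countDefault(String s) {
        return new StringTokenizer(s).countTokens();
    }

    // Only the characters in delim act as delimiters; they are not returned.
    static int countWith(String s, String delim) {
        return new StringTokenizer(s, delim).countTokens();
    }

    // Same delimiters, but each delimiter character also counts as a token.
    static int countWithDelims(String s, String delim) {
        return new StringTokenizer(s, delim, true).countTokens();
    }

    public static void main(String[] args) {
        String text = "alpha\tbeta\ngamma delta";
        System.out.println(countDefault(text));         // 4
        System.out.println(countWith(text, " "));       // 2
        System.out.println(countWithDelims(text, " ")); // 3
    }
}
```

With only the space character as a delimiter, the tab and new-line stay inside the first token, so the count drops from four to two. Returning delimiters adds the single space as a third token.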

Examine the following fragment to learn how these constructors create StringTokenizer objects:
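import java.util.StringTokenizer; // place at the top of the source file
String s = "A sentence to tokenize.|A second sentence.";
// Default delimiters: space, tab, new-line, carriage return, form feed
StringTokenizer stok1 = new StringTokenizer(s);
// Only the vertical bar is a delimiter
StringTokenizer stok2 = new StringTokenizer(s, "|");
// Space and vertical bar are delimiters and are also returned as tokens
StringTokenizer stok3 = new StringTokenizer(s, " |", true);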

stok1 references a StringTokenizer that extracts tokens from the s-referenced String and recognizes space, tab, new-line, carriage-return, and form-feed characters as delimiters. stok2 references a StringTokenizer that also extracts tokens from s. This time, however, only a vertical bar character (|) classifies as a delimiter. Finally, in the stok3-referenced StringTokenizer, the space character and the vertical bar classify as delimiters and are also returned as tokens. Now that these StringTokenizers exist, how do you extract tokens from their s-referenced Strings? Let's find out.
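
As a brief preview, the following sketch lists the tokens each of the three tokenizers would yield for the string above. It assumes StringTokenizer's hasMoreTokens() and nextToken() methods; the class name TokenPreview and the helper method drain are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical preview class; not part of the Editor program.
public class TokenPreview {
    // Collects every remaining token from the tokenizer into a list.
    static List<String> drain(StringTokenizer stok) {
        List<String> tokens = new ArrayList<>();
        while (stok.hasMoreTokens()) {
            tokens.add(stok.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "A sentence to tokenize.|A second sentence.";

        // Default delimiters: the vertical bar stays inside a token.
        // [A, sentence, to, tokenize.|A, second, sentence.]
        System.out.println(drain(new StringTokenizer(s)));

        // Vertical bar only: spaces stay inside the two tokens.
        // [A sentence to tokenize., A second sentence.]
        System.out.println(drain(new StringTokenizer(s, "|")));

        // Space and vertical bar, with delimiters returned: 13 tokens.
        System.out.println(drain(new StringTokenizer(s, " |", true)).size());
    }
}
```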
