Tuesday, December 27, 2011

Java's character and assorted string classes support text-processing - 27


Earlier, I cautioned you against relying on countTokens() for determining the number of tokens to extract. countTokens()'s return value is often meaningless when a program dynamically changes a StringTokenizer's delimiters with a nextToken(String delim) method call, as the following fragment demonstrates:
String record = "Ricard Santos,Box 99,'Sacramento,CA'";
StringTokenizer st = new StringTokenizer (record, ",");
int ntok = st.countTokens ();
System.out.println ("Number of tokens = " + ntok);
for (int i = 0; i < ntok; i++)
{
     String token = st.nextToken ();
     System.out.println (token);
     if (token.startsWith ("Box"))
         st.nextToken ("'"); // Throw away comma between Box 99 and
                             // 'Sacramento,CA'
}

The code creates a String that simulates a database record. Within that record, commas delimit fields (record portions). Although the record contains three commas, which would normally separate four fields, only three fields exist: a name, a box number, and a city-state. A pair of single quotes surrounds the city-state field to indicate that the comma between Sacramento and CA is part of the field.
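To see why the single quotes alone do not protect the embedded comma, consider tokenizing the record on commas only. The following sketch collects every token with a hasMoreTokens()/nextToken() loop. The class name NaiveSplitDemo and its commaTokens() helper are hypothetical, chosen only for illustration. StringTokenizer attaches no special meaning to quote characters, so the city-state field should split in two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class NaiveSplitDemo {
    // Hypothetical helper: tokenizes on commas only, treating quotes as ordinary characters
    static List<String> commaTokens(String record) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(record, ",");
        while (st.hasMoreTokens())
            tokens.add(st.nextToken());
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : commaTokens("Ricard Santos,Box 99,'Sacramento,CA'"))
            System.out.println(t);
    }
}
```

Running it prints four lines: Ricard Santos, Box 99, 'Sacramento and CA'. The quotes stay attached to the fragments because, under a comma-only delimiter set, they are ordinary characters.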
After creating a StringTokenizer that recognizes only comma characters as delimiters, the current thread counts the tokens and prints that count. The thread then uses the count to control how many times the loop that extracts and prints tokens executes. When the Box 99 token is returned, the thread executes st.nextToken ("'"); to change the delimiter from a comma to a single quote and to discard the comma token between Box 99 and 'Sacramento,CA'. A comma token is returned because st.nextToken ("'"); replaces the comma delimiter with a single quote before it extracts the next token, so the comma following Box 99 is no longer a delimiter and becomes a one-character token. The single quote remains the delimiter for all subsequent nextToken() calls. The code produces this output:
Number of tokens = 4
Ricard Santos
Box 99
Sacramento,CA
Exception in thread "main" java.util.NoSuchElementException
        at java.util.StringTokenizer.nextToken(StringTokenizer.java:232)
        at STDemo.main(STDemo.java:18)

The output reports four tokens because, with a comma as the only delimiter, three commas imply four tokens. But after three tokens are displayed, the loop's fourth iteration causes st.nextToken (); to throw a NoSuchElementException object. The exception occurs because the program assumes that countTokens()'s return value indicates the exact number of tokens to extract. However, countTokens() can only base its count on the current set of delimiters. Because the fragment changes those delimiters during the loop, via st.nextToken ("'");, countTokens()'s earlier return value is no longer valid: once the comma following Box 99 has been consumed as a token, only one token (Sacramento,CA) remains under the single-quote delimiter, yet the loop still expects two more.
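A quick way to confirm this is to call countTokens() both before and after the delimiter switch. The following minimal sketch replays the fragment's first three nextToken() calls outside the loop; the class name CountTokensDemo and the method countsAroundSwitch() are hypothetical. Because countTokens() counts from the tokenizer's current position using the delimiter set in effect at that moment, and does not itself advance the position, the second call reflects the single-quote delimiter.

```java
import java.util.StringTokenizer;

public class CountTokensDemo {
    // Hypothetical helper: returns {count before the switch, count after the switch}
    static int[] countsAroundSwitch(String record) {
        StringTokenizer st = new StringTokenizer(record, ",");
        int before = st.countTokens();   // counted with "," as the delimiter
        st.nextToken();                  // "Ricard Santos"
        st.nextToken();                  // "Box 99"
        st.nextToken("'");               // delimiter becomes "'"; returns ","
        int after = st.countTokens();    // counted with "'" from the current position
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] c = countsAroundSwitch("Ricard Santos,Box 99,'Sacramento,CA'");
        System.out.println("Before switch: " + c[0]); // prints "Before switch: 4"
        System.out.println("After switch: " + c[1]);  // prints "After switch: 1"
    }
}
```

With one token left but ntok still equal to 4, the original loop has two iterations remaining, and the second of them has nothing to extract.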
Caution
Do not use countTokens()'s return value to control a string tokenization loop's duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Failure to heed that advice often leads to one of the nextToken() methods throwing a NoSuchElementException object and the program terminating prematurely.
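One way to follow this advice, sketched here rather than taken from the original program, is to drive the loop with hasMoreTokens(), which, like countTokens(), consults the delimiter set in effect at the moment it is called, but is re-evaluated on every pass. The class name SafeTokenLoop and the method fields() are hypothetical. The sketch also assumes the Box field is never the record's last field; otherwise st.nextToken ("'") would itself throw NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SafeTokenLoop {
    // Hypothetical helper: extracts fields, switching to a single-quote delimiter
    // after the Box field. Assumes the record layout used in this post.
    static List<String> fields(String record) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(record, ",");
        // hasMoreTokens() is checked on each pass against the current delimiters
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            result.add(token);
            if (token.startsWith("Box"))
                st.nextToken("'"); // discard the comma; "'" stays the delimiter from now on
        }
        return result;
    }

    public static void main(String[] args) {
        for (String field : fields("Ricard Santos,Box 99,'Sacramento,CA'"))
            System.out.println(field);
    }
}
```

This version prints Ricard Santos, Box 99 and Sacramento,CA, then exits normally, because the loop stops as soon as no token remains under whichever delimiters are current.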
