STfindToken


Routine

char *STfindToken (const char String[], const char Delims[], const char Quotes[], char Token[], int WSFlag, int Maxchar);

Purpose

Find the first token string in a string

Description

This routine finds the first token string in a given string. A token string is delimited by a character from a set of delimiter characters. If no delimiter is found, the token string is the entire input string. A flag determines how white-space characters are treated. Optionally, quote characters can be specified to allow separators to be treated as ordinary characters between paired quote characters.

The processing of white-space (as defined by isspace) is controlled by the WSFlag flag.

WSFlag = 0:
This mode treats white-space characters as ordinary characters.
WSFlag = 1:
This mode causes leading and trailing white-space in the token string to be stripped off before the token string is returned.
WSFlag = 2:
This mode causes white-space to serve as an additional delimiter. There are two subcases here. If the Delims string is empty, then only white-space serves as a delimiter. However leading and trailing white-space in the input string does not serve as a delimiter, i.e. only the characters after the initial white-space are returned as token characters. The second subcase occurs if the Delims string is not empty. Now the delimiter can be either white-space alone, or a character from the Delims string with optional white-space surrounding the delimiter character. The white-space serving as a delimiter or surrounding the delimiter character is stripped off before returning the token string.

Example:
  String = "XXX : yyy";
  Delims = ":";
  WSFlag = 1;
  STfindToken (String, Delims, Token, WSFlag, 100);
On return, Token contains the string "XXX".

Quote characters are specified in pairs, a left quote character and a right quote character for each pair. When multiple pairs of quote characters are specified, the appearance of the first left quote character disables the quoting interpretation of characters from other pairs of quote characters. While within the scope of this quote character, only the left quote and the right quote characters from the active pair will be interpreted as special characters.

Quoting strings:
The left and right quote characters are the same. After an initial left quote character, a second quote character is interpreted as a right quote character. This means that nested quotes cannot occur.
Examples:
1: |"def ghi"|. Quote characters |""| and white-space as a delimiter. The initial quote character '"' disables the interpretation of the blank in the string as a token delimiter. The string returned in the token includes the quote characters.
2: |"ab cd""ef gh"|. Quote characters |""| and white-space as a delimiter. This string would be returned in its entirety, including the quote characters.

Grouping expressions:
In this case, the left and right quote characters are different. Nesting of groups can occur.
Example:
|f1 (x, f2 (y,z)), f3(x)|. Quote characters |()|, |,| as a delimiter. The commas occurring inside the parentheses do not serve as delimiters. Also, nesting is observed, so that only the second |)| character matches the first |(|. The string returned is |f1 (x, f2 (y,z))|.

Notes:
Unmatched quote characters are not reported. The reporting of such errors is left to whatever routine interprets the token strings.

This routine always returns a Token string, even if it is of zero length. If the input is zero length, or all white-space (for WSFlag!= 0), such an input string may be interpreted as having no token string. To enforce such an interpretation, the calling routine should check for this case.

There is no escape mechanism to allow quote characters to appear in a string without acting as quote characters. Note however the behaviour described above, viz. once in quotes, quotes from other than the active quote pair are treated as ordinary characters.

A typical program snippet for using this routine is

  p = string;
  while (p != NULL) {
    p = STfindToken (p, Delims, Quotes, Token, WSFlag, Maxchar);
    ... process Token ...
  }

Parameters

<- char *STFindToken
Pointer to the character after the delimiter. This pointer is set to NULL if all tokens have been parsed from the input string. If not NULL, this value is suitable for feeding back into this routine to extract the next token string.
-> const char String[]
Input character string. If String is the NULL pointer, this routine returns NULL and sets Token to be the empty string.
-> const char Delims[]
Character string specifying delimiter characters. White-space may also act as a delimiter depending on the setting of the WSFlag flag. The Delims string can be zero length, indicating either that no delimiters are to be recognized, or that only white-space is to be recognized as a delimiter (if WSFlag is set appropriately).
-> const char Quotes[]
Character string specifying pairs of quote characters (the left and right quote characters). In the part of the input string between a matched pair of quote characters, any other characters, including quote characters other than from the active pair, are treated as ordinary characters. Up to 5 pairs of quote characters can be specified. A zero length string indicates that quote characters should not to be recognized.
<- char Token[]
Output token string. This string has at most Maxchar characters, not including the terminating null character. This string is always null terminated. The token string may have leading and trailing white-space stripped off if WSFlag is set appropriately. If the actual token string is more than Maxchar characters long, only Maxchar characters are written to Token, and a warning message is printed. Token can be occupy the same string space as String. In that case Token overwrites the beginning part of String.
-> int WSFlag
Flag controlling the interpretation of white-space characters.
  0 - White-space characters are treated as ordinary characters
  1 - Leading and trailing white-space in the token string are
      stripped off.
  2 - White-space serves as an additional delimiter
-> int Maxchar
Maximum number of characters (not including the trailing null character) to be placed in Token.

Author / revision

P. Kabal / Revision 1.36 2003/05/09

See Also

STkeyMatch, STkeyXpar


Main Index libtsp