Whitespace Functions

By default, all whitespace functions treat space, tab, carriage return, and line feed characters as whitespace. You can pass your own whitespace character set to any of them to override the default. All of the functions below use a fast $O(n+m)$ algorithm to identify tokens, so passing in a large whitespace set is not a performance issue.
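The $O(n+m)$ bound can be illustrated with a lookup table, reading n as the length of the string and m as the size of the whitespace set (an assumption; the text does not define them). The sketch below is not Str's actual implementation, and count_tokens_sketch is a hypothetical free function working on plain C strings. It only shows why a large whitespace set costs no more per character than a small one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Str: counts the tokens in `text`,
// treating every character in `whitespace_set` as a separator.
// Passing 0 selects the documented default set: space, tab, CR, LF.
inline std::size_t count_tokens_sketch(const char* text,
                                       const char* whitespace_set = 0) {
    assert(text != 0);
    if (!whitespace_set) whitespace_set = " \t\r\n";

    // O(m): mark each whitespace character in a 256-entry table.
    bool is_ws[256] = {false};
    for (const char* p = whitespace_set; *p; ++p)
        is_ws[static_cast<unsigned char>(*p)] = true;

    // O(n): one pass over the text, one table lookup per character.
    std::size_t count = 0;
    bool in_token = false;
    for (const char* p = text; *p; ++p) {
        const bool ws = is_ws[static_cast<unsigned char>(*p)];
        if (!ws && !in_token) ++count;   // a new token starts here
        in_token = !ws;
    }
    return count;
}
```

Because the table is built once, the per-character test is a single array lookup regardless of how many characters the set contains.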

lstrip, rstrip, strip

inline void  strip(const char* whitespace_set=0);
inline void lstrip(const char* whitespace_set=0);
inline void rstrip(const char* whitespace_set=0);

These three functions remove whitespace from the left, right, and both sides of a string. For performance, none of these functions reallocate the string buffer. Use compress() to reduce the size of the string buffer, if this is important.

Str x = "   Hello There   ";
Str y = x;
y.strip();      // y = "Hello There"
y = x;
y.lstrip();     // y = "Hello There   "
y = x;
y.rstrip();     // y = "   Hello There"
y = x;
y.strip(" He"); // y = "llo Ther"
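To make "strip without reallocating" concrete, here is a minimal sketch on a raw character buffer. It is an illustration only: strip_in_place is a hypothetical function, not part of Str, and Str's internals may differ. The surviving characters are moved to the front of the same buffer, so the buffer's capacity is untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of Str: strips characters found in
// `whitespace_set` from both ends of the NUL-terminated buffer `buf`
// and returns the new length. No memory is allocated or freed.
// std::strchr is used for brevity; it is not the O(n+m) approach.
inline std::size_t strip_in_place(char* buf, const char* whitespace_set = 0) {
    assert(buf != 0);
    if (!whitespace_set) whitespace_set = " \t\r\n";

    std::size_t begin = 0;
    std::size_t end = std::strlen(buf);

    // Skip leading characters that belong to the set.
    while (begin < end && std::strchr(whitespace_set, buf[begin])) ++begin;
    // Skip trailing characters that belong to the set.
    while (end > begin && std::strchr(whitespace_set, buf[end - 1])) --end;

    // Slide the surviving characters to the front of the same buffer.
    std::memmove(buf, buf + begin, end - begin);
    buf[end - begin] = '\0';
    return end - begin;
}
```

Called on a buffer holding "   Hello There   " with the set " He", this leaves "llo Ther" in place, matching the last strip() example above.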

countTokens



The countTokens() function returns the number of tokens contained in a string. Tokens are defined as substrings that were originally separated by whitespace.

Str x = "   Hello There   ";
x.countTokens();     // returns 2 ("Hello" and "There" are tokens)
x.countTokens("T");  // returns 2 ("   Hello " and "here   " are tokens)
x.countTokens("e "); // returns 4 ("H", "llo", "Th", and "r" are tokens)

copyToken

long next_idx = copyToken(const Str& str, long start_idx=0, 
                          const char* white_space_set=0);

The copyToken() function copies a token from str into the string that calls the function. The token copied is the first token found in str at or after start_idx. The return value is the index of the character in str immediately after the copied token. copyToken() is designed to be called iteratively, passing each return value directly into the next call. When there are no more tokens, copyToken() returns -1.

Str x = "  How   are you?  ";
Str y;
long idx = 0;
idx = y.copyToken(x, idx);  // y = "How",  idx = 5
idx = y.copyToken(x, idx);  // y = "are",  idx = 11
idx = y.copyToken(x, idx);  // y = "you?", idx = 16
idx = y.copyToken(x, idx);  // y = "",     idx = -1
idx = y.copyToken(x, idx);  // y = "",     idx = -1
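The index bookkeeping can be checked with a small stand-in written against std::string. This is a sketch under assumptions, not Str's code: copy_token_sketch and replay_documented_example are hypothetical names, and returning the string length when a token runs to the end of the string is a guess at behavior the text does not spell out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Str::copyToken(), using std::string.
// Copies the first token found in `src` at or after `start_idx` into `out`
// and returns the index just past that token, or -1 if no token remains.
inline long copy_token_sketch(std::string& out, const std::string& src,
                              long start_idx = 0,
                              const char* whitespace_set = 0) {
    typedef std::string::size_type size_type;
    if (!whitespace_set) whitespace_set = " \t\r\n";
    out.clear();
    if (start_idx < 0) return -1;   // an earlier call already hit the end

    // Find where the next token begins.
    const size_type first =
        src.find_first_not_of(whitespace_set, static_cast<size_type>(start_idx));
    if (first == std::string::npos) return -1;

    // Find where it ends (assumed: end of string if no whitespace follows).
    size_type last = src.find_first_of(whitespace_set, first);
    if (last == std::string::npos) last = src.size();

    out.assign(src, first, last - first);
    return static_cast<long>(last);
}

// Replays the example above and checks the documented values.
inline void replay_documented_example() {
    const std::string x = "  How   are you?  ";
    std::string y;
    long idx = 0;
    idx = copy_token_sketch(y, x, idx); assert(y == "How"  && idx == 5);
    idx = copy_token_sketch(y, x, idx); assert(y == "are"  && idx == 11);
    idx = copy_token_sketch(y, x, idx); assert(y == "you?" && idx == 16);
    idx = copy_token_sketch(y, x, idx); assert(y.empty()   && idx == -1);
    idx = copy_token_sketch(y, x, idx); assert(y.empty()   && idx == -1);
}
```

Treating a negative start index as "already exhausted" is what makes the repeated call in the last line harmless.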

As the example shows, it is safe to call copyToken() even after all available tokens have been exhausted. This allows you to safely write code like this:

// Get the third token from str2, if it exists
long str_idx = 0;  // copyToken() takes and returns a long index
for (int idx=0; idx < 3; ++idx) { str_idx = str.copyToken(str2, str_idx); }
if (str.length()) {
    // There was a third token
}

getAllTokens

Str* token_array = getAllTokens(unsigned int& token_count, 
                                const char* white_space_set=0) const;

If you wanted an array of every token in a string, you might write code like this:

Str x = "Hello, how are you?";

unsigned long token_count = x.countTokens();
Str* y = new Str[token_count];
long str_idx = 0;  // copyToken() takes and returns a long index
for (unsigned long idx = 0; idx < token_count; ++idx) {
    str_idx = y[idx].copyToken(x, str_idx);
}

// work with y...

delete[] y;

The getAllTokens() function does the same work with fewer variables and less typing on your part. As an added bonus, getAllTokens() also executes slightly faster. Here is the above example rewritten:

Str x = "Hello, how are you?";

unsigned int token_count;  // matches the unsigned int& parameter
Str* y = x.getAllTokens(token_count);

// work with y...

delete[] y;

Note that, in the example above, getAllTokens() sets the value of token_count. This is why it is fine to pass in token_count uninitialized.
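Since the caller owns the returned array and must release it with delete[], one option in C++11 or later is to hold it in a std::unique_ptr so the cleanup happens automatically. The sketch below is an illustration under assumptions: get_all_tokens_sketch is a hypothetical std::string stand-in that mimics the documented contract (a new[]-allocated array plus a count set by reference), not Str's implementation, and demo_owned_tokens is likewise invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for Str::getAllTokens(), using std::string.
// Returns a new[]-allocated array and writes its length to `token_count`.
inline std::string* get_all_tokens_sketch(const std::string& src,
                                          unsigned int& token_count,
                                          const char* whitespace_set = 0) {
    typedef std::string::size_type size_type;
    if (!whitespace_set) whitespace_set = " \t\r\n";

    // Pass 1: count the tokens so the array can be sized exactly.
    token_count = 0;
    for (size_type pos = src.find_first_not_of(whitespace_set);
         pos != std::string::npos;
         pos = src.find_first_not_of(whitespace_set, pos)) {
        ++token_count;
        pos = src.find_first_of(whitespace_set, pos);
        if (pos == std::string::npos) break;   // token ran to end of string
    }

    // Pass 2: copy each token into its slot.
    std::string* tokens = new std::string[token_count];
    size_type pos = 0;
    for (unsigned int i = 0; i < token_count; ++i) {
        const size_type first = src.find_first_not_of(whitespace_set, pos);
        size_type last = src.find_first_of(whitespace_set, first);
        if (last == std::string::npos) last = src.size();
        tokens[i].assign(src, first, last - first);
        pos = last;
    }
    return tokens;
}

// Usage: the unique_ptr takes ownership, so no explicit delete[] is needed.
inline unsigned int demo_owned_tokens() {
    unsigned int token_count;   // set by the call below
    std::unique_ptr<std::string[]> y(
        get_all_tokens_sketch("Hello, how are you?", token_count));
    assert(token_count == 4);
    assert(y[0] == "Hello," && y[3] == "you?");
    return token_count;
}   // y's destructor runs delete[] here
```

The same wrapping should apply to the Str* returned by getAllTokens(), provided the array really is allocated with new[] as the delete[] in the example above implies.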


2007-05-05