By default, all whitespace functions treat space, tab, carriage return, and line feed characters as whitespace. You can pass your own whitespace character set to any of these functions to override the default. All whitespace functions below use a fast algorithm to identify tokens, so passing in a large whitespace set is not a performance issue.
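The text does not say which algorithm is used. One common way to keep membership tests independent of the size of the whitespace set is a 256-entry lookup table built once per call. The sketch below assumes that approach; `WhitespaceTable` and `contains` are hypothetical names and are not part of the Str API.

```cpp
#include <array>

// Hypothetical helper, not part of Str: records which byte values count as whitespace.
struct WhitespaceTable {
    std::array<bool, 256> is_ws{};   // value-initialized, so every entry starts false

    // A null set falls back to the documented defaults: space, tab, CR, LF.
    explicit WhitespaceTable(const char* whitespace_set = nullptr) {
        const char* set = whitespace_set ? whitespace_set : " \t\r\n";
        for (const char* p = set; *p != '\0'; ++p) {
            is_ws[static_cast<unsigned char>(*p)] = true;
        }
    }

    // One array lookup per character, however many characters the set contained.
    bool contains(char c) const {
        return is_ws[static_cast<unsigned char>(c)];
    }
};
```

With a table like this, scanning a string costs one lookup per character, so a set of fifty characters is no slower to test against than a set of four.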
    inline void strip(const char* whitespace_set=0);
    inline void lstrip(const char* whitespace_set=0);
    inline void rstrip(const char* whitespace_set=0);
These three functions remove whitespace from a string: strip() from both sides, lstrip() from the left side only, and rstrip() from the right side only. For performance, none of these functions reallocates the string buffer. If the size of the buffer matters, call compress() afterwards to reduce it.
    Str x = " Hello There ";
    Str y = x;
    y.strip();        // y = "Hello There"
    y = x;
    y.lstrip();       // y = "Hello There "
    y = x;
    y.rstrip();       // y = " Hello There"
    y = x;
    y.strip(" He");   // y = "llo Ther"
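For readers without the Str class at hand, the same behaviour can be approximated on `std::string`. This is only a sketch under that assumption: the free functions `strip`, `lstrip`, and `rstrip` below are illustrative stand-ins, not the library's implementation.

```cpp
#include <string>

// Hypothetical free functions that mimic the documented behaviour on std::string.
const char* const kDefaultWhitespace = " \t\r\n";

void lstrip(std::string& s, const char* whitespace_set = nullptr) {
    const char* ws = whitespace_set ? whitespace_set : kDefaultWhitespace;
    // find_first_not_of returns npos when every character is whitespace;
    // erase(0, npos) then clears the whole string.
    s.erase(0, s.find_first_not_of(ws));
}

void rstrip(std::string& s, const char* whitespace_set = nullptr) {
    const char* ws = whitespace_set ? whitespace_set : kDefaultWhitespace;
    // find_last_not_of returns npos when nothing survives; npos + 1 wraps to 0,
    // so erase(0) clears the whole string.
    s.erase(s.find_last_not_of(ws) + 1);
}

void strip(std::string& s, const char* whitespace_set = nullptr) {
    rstrip(s, whitespace_set);
    lstrip(s, whitespace_set);
}
```

In common implementations `std::string::erase` also leaves the capacity unchanged, and `shrink_to_fit()` plays a role comparable to compress(), although it is only a non-binding request.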
The countTokens() function returns the number of tokens contained in a string. Tokens are defined as substrings that were originally separated by whitespace.
    Str x = " Hello There ";
    x.countTokens();       // returns 2 ("Hello" and "There" are tokens)
    x.countTokens("T");    // returns 2 (" Hello " and "here " are tokens)
    x.countTokens("e ");   // returns 4 ("H", "llo", "Th", and "r" are tokens)
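The counting rule can be sketched with `std::string` search functions. The function `count_tokens` below is a hypothetical stand-in written for illustration; it is not how Str implements countTokens(), and its return type is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for Str::countTokens(), written against std::string.
unsigned long count_tokens(const std::string& s, const char* whitespace_set = nullptr) {
    const char* ws = whitespace_set ? whitespace_set : " \t\r\n";
    unsigned long count = 0;
    std::size_t pos = s.find_first_not_of(ws);     // start of the first token
    while (pos != std::string::npos) {
        ++count;
        pos = s.find_first_of(ws, pos);            // first whitespace after the token
        if (pos == std::string::npos) break;       // token ran to the end of the string
        pos = s.find_first_not_of(ws, pos);        // start of the next token, if any
    }
    return count;
}
```

Runs of consecutive whitespace characters count as a single separator, which is why `"e "` splits `" Hello There "` into four tokens rather than producing empty ones.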
long next_idx = copyToken(const Str& str, long start_idx=0, const char* white_space_set=0);
The copyToken() function copies a token from str into the string calling the function. The token copied is the first token found in str at or after start_idx. The return value of copyToken() is the index of the next character in str after the copied token. The copyToken() function is designed to be called iteratively, passing the return value directly into the next call. When there are no more tokens, copyToken() returns -1.
    Str x = "  How   are you?  ";
    Str y;
    long idx = 0;
    idx = y.copyToken(x, idx);   // y = "How",  idx = 5
    idx = y.copyToken(x, idx);   // y = "are",  idx = 11
    idx = y.copyToken(x, idx);   // y = "you?", idx = 16
    idx = y.copyToken(x, idx);   // y = "",     idx = -1
    idx = y.copyToken(x, idx);   // y = "",     idx = -1
As the example shows, it is safe to call copyToken() even after exhausting all available tokens. This allows you to safely write code like this:
    // Get the third token from str2, if it exists
    long str_idx = 0;   // long, to match copyToken()'s return type (it may be -1)
    for (int idx = 0; idx < 3; ++idx) {
        str_idx = str.copyToken(str2, str_idx);
    }
    if (str.length()) {
        // There was a third token
    }
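To make the return-value contract concrete, here is a minimal `std::string` sketch of the same idea. The function `copy_token` is hypothetical and only approximates the documented behaviour. In particular, treating a negative or out-of-range start index as "no tokens left" is an assumption based on the examples above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for Str::copyToken(): copies the first token found in src
// at or after start_idx into out, and returns the index just past that token,
// or -1 when no token is left.
long copy_token(std::string& out, const std::string& src, long start_idx = 0,
                const char* whitespace_set = nullptr) {
    const char* ws = whitespace_set ? whitespace_set : " \t\r\n";
    out.clear();
    // Assumption: a negative or out-of-range index means an earlier call ran out of tokens.
    if (start_idx < 0 || static_cast<std::size_t>(start_idx) >= src.size()) return -1;

    const std::size_t begin = src.find_first_not_of(ws, static_cast<std::size_t>(start_idx));
    if (begin == std::string::npos) return -1;     // only whitespace remains

    std::size_t end = src.find_first_of(ws, begin);
    if (end == std::string::npos) end = src.size(); // token runs to the end of src
    out.assign(src, begin, end - begin);
    return static_cast<long>(end);
}
```

Because the function clears `out` and returns -1 once the input is exhausted, a fixed-count loop such as the "third token" loop can feed the result straight back in without checking it on every iteration.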
Str* token_array = getAllTokens(unsigned int& token_count, const char* white_space_set=0) const;
If you wanted an array of every token in a string, you might write code like this:
    Str x = "Hello, how are you?";
    unsigned long token_count = x.countTokens();
    Str* y = new Str[token_count];
    long str_idx = 0;   // long, to match copyToken()'s return type
    for (unsigned long idx = 0; idx < token_count; ++idx) {
        str_idx = y[idx].copyToken(x, str_idx);
    }
    // work with y...
    delete[] y;
The getAllTokens() function does the same work with fewer variables and less typing on your part. As an added bonus, getAllTokens() will execute slightly faster too. Here is the above example rewritten:
    Str x = "Hello, how are you?";
    unsigned int token_count;   // matches the unsigned int& parameter of getAllTokens()
    Str* y = x.getAllTokens(token_count);
    // work with y...
    delete[] y;
Note that, in the example above, getAllTokens() sets the value of token_count. This is why it is fine to pass in token_count uninitialized.
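As a point of comparison, the same "collect every token" step can be sketched with standard containers. The function `get_all_tokens` below is a hypothetical `std::string` analogue, not the Str API. It returns a `std::vector`, so the vector's size stands in for token_count and no `delete[]` is needed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical std::string analogue of getAllTokens().
std::vector<std::string> get_all_tokens(const std::string& s,
                                        const char* whitespace_set = nullptr) {
    const char* ws = whitespace_set ? whitespace_set : " \t\r\n";
    std::vector<std::string> tokens;
    std::size_t begin = s.find_first_not_of(ws);        // start of the first token
    while (begin != std::string::npos) {
        std::size_t end = s.find_first_of(ws, begin);   // first whitespace after it
        if (end == std::string::npos) end = s.size();   // last token reaches the end
        tokens.emplace_back(s, begin, end - begin);     // copy the token's characters
        begin = s.find_first_not_of(ws, end);           // npos once nothing is left
    }
    return tokens;
}
```

A single pass both finds and copies each token, which is one plausible reason a combined function can beat a countTokens() call followed by a copyToken() loop; the text does not state the actual reason.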