Split the given string into a vector of substrings (tokens) delimited by any of the characters in the delimiters string.
Multiple adjacent delimiters are treated like a single one, and delimiters at the beginning and end of the string are ignored. For example, tokenize(" A B C ") yields a vector with the three entries "A", "B", and "C".
auto tokens = tokenize(" Hello world ");
assert(tokens.size() == 2);
assert(tokens[0] == "Hello");
assert(tokens[1] == "world");
StringContainer tokenize(string_view str, string_view delimiters = default_whitespace_characters, ContainerInsertFct insert_fct = detail::emplace_back<StringContainer>)
This function returns a std::vector<std::string> by default, but a compatible container for string/string_view types can be specified via a template parameter:
- Template Parameters
  - StringContainer: A container for strings or string_view-like types, e.g. std::vector<std::string> or std::list<gul14::string_view>.
  - ContainerInsertFct: Type for the insert_fct function parameter.
- Parameters
  - str: The string to be split.
  - delimiters: String with delimiter characters. Any of the characters in this string marks the beginning/end of a token. By default, a wide variety of whitespace and control characters is used.
  - insert_fct: By default, tokenize() calls the emplace_back() member function on the container to insert strings. This parameter may contain a different function pointer or object with the signature void f(StringContainer&, gul14::string_view) that is called instead. This can be useful for containers that do not provide emplace_back() or for other customizations.
- Returns
  - A container holding the individual substrings (tokens).
auto parts1 = tokenize<std::vector<gul14::string_view>>("Hello world");
assert(parts1.size() == 2);
assert(parts1[0] == "Hello");
assert(parts1[1] == "world");
auto parts2 = tokenize<gul14::SmallVector<gul14::string_view, 3>>("a-b-c", "-");
assert(parts2.size() == 3);
assert(parts2[0] == "a");
assert(parts2[1] == "b");
assert(parts2[2] == "c");
using WeirdContainer = std::queue<std::string>;
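// std::queue does not provide emplace_back(), so a custom inserter with the
// signature void f(StringContainer&, gul14::string_view) is passed instead.
// (This lambda is an illustrative sketch, not part of the library interface.)
auto inserter = [](WeirdContainer& c, gul14::string_view sv) { c.emplace(sv.data(), sv.size()); };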
auto parts3 = tokenize<WeirdContainer>("a.b", ".", inserter);
assert(parts3.size() == 2);
assert(parts3.front() == "a");
assert(parts3.back() == "b");
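As a further sketch (not taken from the library documentation), node-based containers such as the std::list<gul14::string_view> mentioned under StringContainer also work with the default inserter, since std::list provides emplace_back():

auto parts4 = tokenize<std::list<gul14::string_view>>("x y z");
assert(parts4.size() == 3);
assert(parts4.front() == "x");
assert(parts4.back() == "z");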
- Note
  - tokenize() does not assume a specific encoding for its input strings, but operates on individual chars. This can have surprising effects in code such as this:

auto words = tokenize("Hörgeräteakkustiker hätten es gewußt", "ä");
assert(words.size() == 3);  // the result depends on how "ä" is encoded in the source file
- See also
  - gul14::tokenize_sv() returns a vector<string_view> by default.
  - gul14::split() uses a different approach to string splitting.
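A minimal sketch of the tokenize_sv() variant, assuming it accepts the same arguments as tokenize() and differs only in returning std::vector<gul14::string_view> by default:

auto views = gul14::tokenize_sv("Hello world");
assert(views.size() == 2);
assert(views[0] == "Hello");  // a gul14::string_view referring into the original buffer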
- Since
  - GUL version 2.5, the return type of tokenize() can be specified as a template parameter and a custom inserter can be supplied (it always returned std::vector<std::string> before).