General Utility Library for C++14  2.11
gul14/tokenize.h

Detailed Description

Split a string into substrings.

Functions

template<typename StringContainer = std::vector<std::string>, typename ContainerInsertFct = void (*)(StringContainer&, string_view)>
StringContainer gul14::tokenize (string_view str, string_view delimiters=default_whitespace_characters, ContainerInsertFct insert_fct=detail::emplace_back< StringContainer >)
Split the given string into a vector of substrings (tokens) delimited by any of the characters in the delimiters string.
 
template<typename StringContainer = std::vector<string_view>, typename ContainerInsertFct = void (*)(StringContainer&, string_view)>
StringContainer gul14::tokenize_sv (string_view str, string_view delimiters=default_whitespace_characters, ContainerInsertFct insert_fct=detail::emplace_back< StringContainer >)
Split the given string into a vector of substrings (tokens) delimited by any of the characters in the delimiters string, returning string_views instead of strings by default.
 

Function Documentation

◆ tokenize()

template<typename StringContainer = std::vector<std::string>, typename ContainerInsertFct = void (*)(StringContainer&, string_view)>
StringContainer gul14::tokenize ( string_view  str,
string_view  delimiters = default_whitespace_characters,
ContainerInsertFct  insert_fct = detail::emplace_back<StringContainer> 
)
inline

Split the given string into a vector of substrings (tokens) delimited by any of the characters in the delimiters string.

Multiple adjacent delimiters are treated like a single one, and delimiters at the beginning and end of the string are ignored. For example, tokenize(" A B C ") yields a vector with the three entries "A", "B", and "C".

// Default return type std::vector<std::string>
auto tokens = tokenize(" Hello world ");
assert(tokens.size() == 2);
assert(tokens[0] == "Hello");
assert(tokens[1] == "world");
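The splitting rule described above (skip every run of delimiters, emit each maximal run of non-delimiter characters as one token) can be sketched in a few lines of standard C++. This is a minimal illustration, not GUL's actual implementation: the name tokenize_sketch is hypothetical, and it takes the delimiters explicitly instead of defaulting to GUL's whitespace set.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the documented splitting rule; not gul14::tokenize().
// Adjacent delimiters collapse into one, leading/trailing delimiters are ignored.
std::vector<std::string> tokenize_sketch(const std::string& str,
                                         const std::string& delimiters)
{
    std::vector<std::string> tokens;

    // Skip any delimiters at the start of the string
    std::string::size_type start = str.find_first_not_of(delimiters);

    while (start != std::string::npos)
    {
        // The token extends up to the next delimiter (or the end of the string)
        std::string::size_type end = str.find_first_of(delimiters, start);
        if (end == std::string::npos)
            end = str.size();

        tokens.emplace_back(str.substr(start, end - start));

        // Jump over the whole run of delimiters that follows the token
        start = str.find_first_not_of(delimiters, end);
    }

    return tokens;
}
```

With this sketch, tokenize_sketch(" A B C ", " ") yields "A", "B", and "C", and a string that consists only of delimiters yields an empty vector.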

This function returns a std::vector<std::string> by default, but a different container for string or string_view types can be selected via the StringContainer template parameter.

Template Parameters
StringContainer: A container for strings or string_view-like types, e.g. std::vector<std::string> or std::list<gul14::string_view>.
ContainerInsertFct: Type of the insert_fct function parameter.
Parameters
str: The string to be split.
delimiters: String with delimiter characters. Any of the characters in this string marks the beginning or end of a token. By default, a wide variety of whitespace and control characters is used.
insert_fct: By default, tokenize() calls the emplace_back() member function of the container to insert each token. This parameter may instead hold a function pointer or function object with the signature void f(StringContainer&, gul14::string_view), which is then called for each token. This is useful for containers that do not provide emplace_back() or for other customizations.
Returns
A container with the individual substrings (tokens).
// Return string_views instead of strings (like tokenize_sv())
auto parts1 = tokenize<std::vector<gul14::string_view>>("Hello world");
assert(parts1.size() == 2);
assert(parts1[0] == "Hello");
assert(parts1[1] == "world");
// Use a different container that provides emplace_back()
auto parts2 = tokenize<gul14::SmallVector<gul14::string_view, 3>>("a-b-c", "-");
assert(parts2.size() == 3);
assert(parts2[0] == "a");
assert(parts2[1] == "b");
assert(parts2[2] == "c");
// Use a different container with a custom inserter function
using WeirdContainer = std::queue<std::string>;
auto inserter = [](WeirdContainer& c, gul14::string_view sv) { c.emplace(sv); };
auto parts3 = tokenize<WeirdContainer>("a.b", ".", inserter);
assert(parts3.size() == 2);
assert(parts3.front() == "a");
assert(parts3.back() == "b");
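The interplay of the container and inserter template parameters can be sketched as follows. This is an illustration under stated assumptions, not GUL's implementation: the names tokenize_generic and emplace_back_inserter are hypothetical, tokens are passed as const std::string& rather than gul14::string_view, and the delimiters must be given explicitly. The default inserter is a plain function pointer that calls emplace_back(); any callable with a matching signature can replace it, in which case its type is deduced from the argument.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical default inserter: appends a token via emplace_back(), mirroring
// the default behavior documented for gul14::tokenize().
template <typename Container>
void emplace_back_inserter(Container& c, const std::string& token)
{
    c.emplace_back(token);
}

// Hypothetical generic tokenizer. InsertFct defaults to a function pointer type,
// but is deduced from the argument when a lambda or other callable is passed.
template <typename StringContainer = std::vector<std::string>,
          typename InsertFct = void (*)(StringContainer&, const std::string&)>
StringContainer tokenize_generic(const std::string& str, const std::string& delimiters,
                                 InsertFct insert_fct = emplace_back_inserter<StringContainer>)
{
    StringContainer tokens;
    std::string::size_type start = str.find_first_not_of(delimiters);

    while (start != std::string::npos)
    {
        std::string::size_type end = str.find_first_of(delimiters, start);
        if (end == std::string::npos)
            end = str.size();

        // Delegate the actual insertion to the (default or custom) inserter
        insert_fct(tokens, str.substr(start, end - start));
        start = str.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

For a container without emplace_back(), such as std::queue, a custom inserter does the job, e.g. tokenize_generic<std::queue<std::string>>("a.b", ".", [](std::queue<std::string>& q, const std::string& s) { q.push(s); }) (this additionally requires <queue>).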
Note
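tokenize() does not assume a specific encoding for its input strings, but operates on individual chars. With a multibyte encoding such as UTF-8, a non-ASCII delimiter character therefore acts as several independent single-byte delimiters, which can have surprising effects in code such as this: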
auto words = tokenize("Hörgeräteakkustiker hätten es gewußt", "ä");
assert(words.size() == 3); // Might fail or succeed depending on the encoding
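The byte-wise behavior can be reproduced with a small self-contained sketch. The helper count_tokens is hypothetical and applies the same splitting rule as tokenize(); the example strings use explicit byte escapes so that the result does not depend on the encoding of the source file. In UTF-8, "ä" is the two-byte sequence 0xC3 0xA4, and because 0xC3 is also the first byte of "ö" and "ß", those characters split the text as well. In ISO-8859-1, "ä" is the single byte 0xE4.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts tokens using the same rule as tokenize(),
// i.e. every single char in `delimiters` acts as a separator on its own.
std::size_t count_tokens(const std::string& str, const std::string& delimiters)
{
    std::size_t count = 0;
    std::string::size_type start = str.find_first_not_of(delimiters);
    while (start != std::string::npos)
    {
        ++count;
        std::string::size_type end = str.find_first_of(delimiters, start);
        if (end == std::string::npos)
            break;
        start = str.find_first_not_of(delimiters, end);
    }
    return count;
}

// "Hörgeräteakkustiker hätten es gewußt" encoded as UTF-8 and as ISO-8859-1
const std::string utf8 = "H\xC3\xB6rger\xC3\xA4teakkustiker h\xC3\xA4tten es gewu\xC3\x9Ft";
const std::string latin1 = "H\xF6rger\xE4teakkustiker h\xE4tten es gewu\xDFt";

// count_tokens(latin1, "\xE4")   == 3: "H\xF6rger", "teakkustiker h", "tten es gewu\xDFt"
// count_tokens(utf8, "\xC3\xA4") == 5: byte 0xC3 also occurs in "ö" and "ß"
```

This is why an assertion expecting three tokens holds for ISO-8859-1 input but fails for UTF-8 input.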
See also
gul14::tokenize_sv() returns a vector<string_view> by default; gul14::split() uses a different approach to string splitting.
Since
GUL version 2.5, the return type of tokenize() can be specified as a template parameter and a custom inserter can be specified (it always returned std::vector<std::string> before).

◆ tokenize_sv()

template<typename StringContainer = std::vector<string_view>, typename ContainerInsertFct = void (*)(StringContainer&, string_view)>
StringContainer gul14::tokenize_sv ( string_view  str,
string_view  delimiters = default_whitespace_characters,
ContainerInsertFct  insert_fct = detail::emplace_back<StringContainer> 
)
inline

Split the given string into a vector of substrings (tokens) delimited by any of the characters in the delimiters string.

This function is identical to tokenize(string_view, string_view, ContainerInsertFct) except that it returns a std::vector of string_views instead of strings by default:

auto tokens = tokenize_sv("hello world", " "); // Return type is std::vector<gul14::string_view>
assert(tokens.size() == 2);
assert(tokens[0] == "hello");
assert(tokens[1] == "world");
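Because tokenize_sv() returns string_views by default, the tokens do not own their characters but refer into the input string, so that string must outlive the returned container. A minimal sketch of this idea, assuming C++17 std::string_view in place of gul14::string_view (the function name tokenize_sv_sketch is hypothetical and not part of GUL):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: same splitting rule as tokenize(), but the tokens are
// non-owning views into `str` instead of copies.
std::vector<std::string_view> tokenize_sv_sketch(std::string_view str,
                                                 std::string_view delimiters)
{
    std::vector<std::string_view> tokens;
    std::string_view::size_type start = str.find_first_not_of(delimiters);

    while (start != std::string_view::npos)
    {
        std::string_view::size_type end = str.find_first_of(delimiters, start);
        if (end == std::string_view::npos)
            end = str.size();

        // View into str; the characters themselves are not copied
        tokens.push_back(str.substr(start, end - start));
        start = str.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Calling such a function on a temporary, e.g. tokenize_sv_sketch(std::string("a b"), " "), leaves the views dangling once the full expression ends; keep the input string alive for as long as the tokens are used.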
See also
gul14::tokenize() returns a vector<string> by default; gul14::split() uses a different approach to string splitting.
Since
GUL version 2.5, the return type of tokenize_sv() can be specified as a template parameter and a custom inserter can be specified (it always returned std::vector<gul14::string_view> before).