C++17: std::string_view

C++17 shipped with a very useful non-owning read-only string handling class template: std::basic_string_view<CharT, CharTraits> with several instantiations mirroring the std::basic_string class template: std::string_view, std::wstring_view, std::u8string_view, std::u16string_view, and std::u32string_view.

A std::string_view instance contains only a pointer to the first character in a char array and the length of the string. The std::string_view does not own such array and thus, it is not responsible for the lifetime of the text it refers to.

Since it contains the length of the string it represents, the string does not need to end with a '\0' character. This feature (the non-‘\0’ terminated string) enables a lot of interesting features like creating substring views by far faster than its std::string counterpart (since creating a substring means just updating the pointer and the length of the new std::string_view instance).

How a std::string_view is instantiated?

std::string_view today = "Monday";  // string_view constructed from a const char*
std::string aux = "Tuesday";
std::string_view tomorrow = aux;  // string_view constructed from a std::string
std::string_view day_after_tomorrow = "Wednesday"sv;  // string_view constructed from a std::string_view
constexpr std::string_view day_before_friday = "Thursday"; // compile-time string_view

How can I use a std::string_view?

A lot of classes from the C++ standard library have been updated to use the std::string_view as a first class citizen, so, for example, to print out a std::string_view instance, you could simply use the std::cout as usual:

std::string_view month = "September";
std::cout << "This month is " << month << "\n";

As I mentioned earlier, you can use the std::string_view to execute several actions in the underlying text with a better performance, for example, to create substrings:

#include <string_view>
#include <iostream>

int main()
{

    std::string_view fullName = "García Márquez, Gabriel";

    // I need to separate first name and last name

    // I find the index of the ","
    auto pos = fullName.find(",");
    if (pos == std::string_view::npos) // NOT FOUND
    {
        std::cerr << "Malformed full name\n";
        return -1;
    }

    std::string_view lastName = fullName.substr(0, pos);
    std::string_view firstName = fullName.substr(pos + 2);

    std::cout << "FIRST NAME: " << firstName;
    std::cout << "\nLAST NAME:  " << lastName << "\n";

    return 0;
}

To verify if a text starts or ends with something (starts_with() and ends_with() were introduced in C++20):

std::string_view price = "125$";
if (price.endsWith("$"))
{
    price.remove_suffix(1); // removes 1 char at the end of the string_view
    std::cout << price << " dollar\n";
}

It exposes a lot of functionality similar to the std::string class, so its usage is almost the same.

How can I get the content of a std::string_view?

  • If you want to iterate the elements, you can use operator[], at() or iterators in the same way that you do it in strings.
  • If you want to get the underlying char array, you can access to it through the data() method. Be aware the underlying char array can miss the null character as mentioned above.
  • If you want to get a valid std::string from a std::string_view, there is no automatic conversion from one to other, so you need to use a explicit std::string constructor:
std::string_view msg = "This will be a string";
std::string msg_as_string { msg };

std::cout << msg_as_string << "\n";

Handle with care

And though probably you are already imagining the scenarios a string_view can be used into your system, take into account that it could bring some problems because of its non-owning nature, for example this code:

std::string_view say_hi()
{
  std::string_view hi = "Guten Tag";
  return hi;
}

Is a completely valid and correct usage of a std::string_view.

But this slightly different version:

std::string_view say_hi()
{
  std::string_view hi = "Guten Tag"s;
  return hi;
}

Returns a corrupt string and the behavior for this is undefined. Notice the “s” at the end of the literal. That means “Guten Tag” will be a std::string and not a const char*. Being a std::string means a local instance will be created in that function and the std::string_view refers to the underlying char array in that instance. BUT, since it is a local instance, it will be destroyed when it goes out of scope, destroying the char array and returning a dangling pointer inside the std::string_view.

Sutile, but probably hard to find and debug.

To read more about std::string_view, consult the ultimate reference:

https://en.cppreference.com/w/cpp/string/basic_string_view

Advertisement

C++20: {fmt} and std::format

In this post I will show a very nice open source library that lets the programmers create formatted text in a very simple, type-safe and translatable way: {fmt}

{fmt} is also the implementation of reference for the C++20 standard std::format (the {fmt}‘s author [https://www.zverovich.net/] submitted his paper to the standard committee) but until this moment (June, 2021) only Microsoft Visual Studio implements it. So I will describe the standalone library for now.

These are the main features we can find in {fmt}:

  • We can specify the format and the variables to be formatted, similar to C printf family.
  • The {fmt} functions do not require the programmer to match the type of the format specifier to the actual values the are being formatted.
  • With {fmt}, programmers can optionally specify the order of values to be formatted. This feature is very very useful when internationalizing texts where the order of the parameters is not always the same as in the original language.
  • Some format validations can be done in compile time.
  • It is based on variadic templates but there are lower-level functions on this library that fail over to C varargs.
  • Its performance its way ahead to C++ std::ostream family.

To test {fmt}, I used Godbolt’s Compiler Explorer.

Hello world

This is a very simple “Hello world” program using {fmt}:

#include <fmt/core.h>

int main()
{
    fmt::print("Hello world using {{fmt}}\n");
}

fmt::print() is a helper function that prints formatted text using the standard output (similar to what std::cout does).

As you can see, I did not specify any variables to be formatted BUT I wrote {{fmt}} instead of {fmt} because {{ and }} are, for the {fmt} functions, escape sequences for { and } respectively, which when used as individual characters, contain a format specifier, an order sequence, or simply mean that will be replaced by a parameterized value.

Using simple variable replacements

For example I want to print out two values, I can do something like:

int age = 16;
std::string name = "Ariana";

fmt::print("Hello, my name is {} and I'm {} years old", name, age);

This will print:

Hello, my name is Ariana and I'm 16 years old.

fmt::print() replaced the {} with the respective variable values. Notice also that I used std::string and it worked seamlessly. Using printf(), the programmers should get the const char* representation of the string to be displayed.

If the programmers specify less variables than {}, the compiler (or runtime) returns an error. More on this below.

If the programmers specify more variables than {}, the extra ones are simply ignored by fmt::print() or fmt::format().

Defining arguments order

The programmers can also specify the order the parameters will be replaced:

void show_numbers_name(bool ascending)
{
    std::string_view format_spec = ascending ? "{0}, {1}, {2}\n" : "{2}, {1}, {0}\n";
    
    auto one = "one";
    auto two = "two";
    auto three = "three";

    fmt::print(format_spec, one, two, three);
}

int main()
{
    show_numbers_name(true);
    show_numbers_name(false);
}

As you can see, the {} can contain a number that tells {fmt} the number of the argument that will be used in such position when formatting the output.

Formatting containers

Containers can easily printed out including #include <fmt/ranges.h>

std::vector<int> my_vec = { 10, 20, 30, 40, 50 };
fmt::print("{}\n", my_vec);

Format errors

There are two ways that {fmt} uses to process errors:

If there is an error in the format, a runtime exception is thrown, for example:

int main()
{
    try
    {
        fmt::print("{}, {}, {}\n", 1, 2);
    }
    catch (const fmt::format_error& ex)
    {
        fmt::print("Exception: {}\n", ex.what());
    }
}

In the example above, I say that I have three parameters but only provided two variables, so a fmt::format_error exception is thrown.

But if the format specifier is always constant, we can specify the format correctness in runtime in this way:

#include <fmt/core.h>
#include <fmt/compile.h>

int main()
{
    fmt::print(FMT_COMPILE("{}, {}, {}\n"), 1, 2);
}

FMT_COMPILE is a macro found in <fmt/compile.h> that performs the format validation in compile-time, thus in this case, a compile-time error is produced.

Custom types

To format your custom types, you must to create a template specialization of the class template fmt::formatter and implement the methods parse() and format(), as in this example:

// My custom type
struct person
{
    std::string first_name;
    std::string last_name;
    size_t social_id;
};

// fmt::formatter full template specialization
template <>
struct fmt::formatter<person>
{
    // Parses the format specifier, if needed (in my case, only return an iterator to the context)
    constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }

    // Actual formatting. The second parameter is the format specifier and the next parameters are the actual values from my custom type
    template <typename FormatContext>
    auto format(const person& p, FormatContext& ctx) {
        return format_to(
            ctx.out(), 
            "[{}] {}, {}", p.social_id, p.last_name, p.first_name);
    }
};

int main()
{
    fmt::print("User: {}\n", person { "Juana", "Azurduy", 23423421 });
}

Neat, huh?

Links

{fmt} GitHub page: https://github.com/fmtlib/fmt

{fmt} API reference: https://fmt.dev/latest/api.html

Compiler explorer: https://godbolt.org/

C++17: std::optional

The C++17 standard library ships with a very interesting class template: std::optional<T>.

The idea behind it is to make explicit the fact that a variable can hold or not an actual value.

Before the existence of std::optional<T>, the only way to implement such semantics was through pointers or tagged unions (read about C++17 std::variant here).

For example, if I want to declare a struct person that stores a person’s first name, last name and nickname; and since not all people have or not a nickname, I would have to implement that (in older C++) in this way:

struct person
{
  std::string first_name;
  std::string last_name;
  std::string* nickname; //no nickname if null
};

To make explicit that the nickname will be optional, I need to write a comment stating that “null” represents “no nickname” in this scenario.

And it works, but:

  • It is error prone because the user can easily do something like: p.nickname->length(); and run into an unexpected behavior when the nickname is null.
  • Since the nickname will be stored as a pointer, the instance needs to be created in heap, adding one indirection level and one additional dynamic allocation/deallocation only to support the desired behavior (or the programmers need to have the nickname manually handled by them and set a pointer to that nickname into this struct).
  • Because of the last reason, it is not at all obvious if the instance pointed to by said pointer should be explicitly released by the programmer or it will be released automatically by the struct itself.
  • The “optionalness” here is not explicit at all at code level.

std::optional<T> provides safeties for all these things:

  • Its instances can be created at stack level, so there will not be extra allocation, deallocation or null-references: RAII will take care of them (though this depends on the actual Standard Library implementation).
  • The “optionalness” of the attribute is completely explicit when used: Nothing is more explicit than marking as “optional” to something… optional, isn’t it?
  • Instances of std::optional<T> hide the direct access to the object, so to access its actual value they force the programmer to do extra checks.
  • If we try to get the actual value of an instance that is not storing anything, a known exception is thrown instead of unexpected behavior.

Refactoring my code, it will look like this one:

#include <optional>
#include <string>

struct person
{
  std::string first_name;
  std::string last_name;
  std::optional<std::string> nickname;
};

The code is pretty explicit and no need to further explanation or documentation about the optional character of “nickname”.

So let’s create two people, one with nickname and the other one with no nickname:

int main()
{
  person p1 { "John", Doe", std::nullopt };
  person p2 { "Robert", "Balboa", "Rocky" };
}

In the first instance, I have used “std::nullopt” which represents an std::optional<T> instance with no value (i.e. : an “absence of value”).

In the second case, I am implicitly invoking to the std::optional<T> constructor that receives an actual value.

The verbose alternative would be:

int main()
{
    person p1 { "John", "Doe", std::optional<std::string> { } };
    person p2 { "Robert", "Balboa", std::optional<std::string> {"Rocky"} };
}

The parameterless constructor represents an absence of value (std::nullopt) and the other constructor represents an instance storing an actual value.

Next I will overload the operator<< to work with my struct person, keeping in mind that if the person has a nickname, I want to print it out.

This could be a valid implementation:

std::ostream& operator<<(std::ostream& os, const person& p)
{
    os << p.last_name << ", " << p.first_name;
    
    if (p.nickname.has_value())
    {
        os << " (" << p.nickname.value() << ")";
    }
    
    return os;
}

The has_value() method returns true if the optional<T> instance is storing an actual value. The value can be retrieved using the value() method.

There is an overload for the operator bool that does the same thing that the has_value() method does: Verifying if the instance stores an actual value or not.

Also there are overloads for operator* and operator-> to access the actual values.

So, a less verbose implementation of my operator<< shown above would be:

std::ostream& operator<<(std::ostream& os, const person& p)
{
    os << p.last_name << ", " << p.first_name;
    
    if (p.nickname)
    {
        os << " (" << *(p.nickname) << ")";
    }
    
    return os;
}

Other way to retrieve the stored value, OR return an alternative value would be using the method “value_or()” method.

void print_nickname(const person& p)
{
    std::cout << p.first_name << " " << p.last_name << "'s nickname: "
              << p.nickname.value_or("[no nickname]") << "\n";
}

For this example, if the nickname variable stores an actual value, it will return it, otherwise, the value returned will be, as I coded: “[no nickname]”.

What will happen if I try to access to the optional<T>::value() when no value is actually stored? A std::bad_optional_access exception will be thrown:

try
{
    std::optional<int> op {};
    std::cout << op.value() << "\n";
}
catch (const std::bad_optional_access& e)
{
    std::cerr << e.what() << "\n";
}

Notice I have used the value() method instead of operator*. When I use operator* instead of value(), the exception is not thrown and the user runs into an unexpected behavior.

So, use std::optional<T> in these scenarios:

  • You have some attributes or function arguments that may have no value and are, therefore, optional. std::optional<T> makes that decision explicit at the code level.
  • You have functions that may OR may not return something. For example, what will be the minimum integer found in an empty list of ints? So instead of returning an int with its minimum value (std::numeric_limits<int>::min()), it would be more accurate to return an std::optional<int>.

Note that std::optional<T> does not support reference types (i.e. std::optional<T&>) so if you want to store an optional reference, probably you want to use a std::reference_wrapper<T> instead of type T (i.e. std::optional<std::reference_wrapper<T>>).