C++: Move semantics

This example involves a class A and a container called List<T>. As the code below shows, the container is essentially a wrapper around a pointer to a std::vector.

There is also a function called getNObjects that returns a list containing N instances of class A.

#include <vector>
#include <string>
#include <iostream>
 
class A
{
public:
    A() = default;
    ~A() = default;
    A(const A& a) { std::cout << "copy ctor" << std::endl ; }
     
    A& operator=(const A&)
    {
        std::cout << "operator=" << std::endl;
        return *this;
    }
};
 
template <typename T>
class List
{
private:
    std::vector<T>* _vec;
     
public:
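    List() : _vec(new std::vector<T>()) { }
    ~List() { delete _vec; }

    List(const List<T>& list)
        : _vec(new std::vector<T>(*(list._vec)))
    {
    }

    List<T>& operator=(const List<T>& list)
    {
        if (this != &list) // guard against self-assignment
        {
            // copy first, so *this is left intact if the allocation throws
            std::vector<T>* copy = new std::vector<T>(*(list._vec));
            delete _vec;
            _vec = copy;
        }
        return *this;
    }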
     
    void add(const T& a)
    {
        _vec->push_back(a);
    }
     
    int getCount() const
    {
        return static_cast<int>(_vec->size());
    }
     
    const T& operator[](int i) const
    {
        return (*_vec)[i];
    }
};
  
List<A> getNObjects(int n)
{
    List<A> list;
    A a;
    for (int i = 0; i< n; i++)
        list.add(a);
 
    std::cout << "Before returning: ********" << std::endl;
    return list;
}
 
int main()
{
    List<A> list1;
    list1 = getNObjects(10);    
    return 0;
}

When this code runs, it will produce an output like this:

...
...
copy ctor
copy ctor
Before returning: ********
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor
copy ctor

After the “Before returning” line, the number of calls to the copy constructor equals the number of objects in the list!

Why?

Because List<T> only has a copy assignment operator, assigning the list returned by getNObjects() to list1 copies all its attributes: operator= allocates a new internal vector and copies every element into it (one copy constructor call per object). Then the temporary list returned by the function is destroyed, which triggers the destructor for each object in it. Though this is logically correct, it results in poor performance due to many unnecessary copies and destructions.

Starting from C++11, a new type of reference is available to address this problem: rvalue references. An rvalue reference binds to a temporary object (rvalue), which is typically the result of an expression that is not bound to a variable. Rvalue references are denoted using the symbol &&.
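To see which kind of reference an expression binds to, here is a minimal sketch with a hypothetical describe() function that has two overloads (describe, makeName and classify are illustrative names, not from the original post). The compiler picks the && overload for temporaries and the const & overload for named variables.

```cpp
#include <string>

// Two overloads: the first binds to lvalues, the second only to rvalues (temporaries).
std::string describe(const std::string&) { return "lvalue"; }
std::string describe(std::string&&)      { return "rvalue"; }

// Hypothetical helper: its result is a temporary, i.e. an rvalue.
std::string makeName() { return "temp"; }

std::string classify()
{
    std::string name = "named";
    std::string result;
    result += describe(name);              // a named variable is an lvalue
    result += ",";
    result += describe(makeName());        // a returned temporary is an rvalue
    result += ",";
    result += describe(std::string("x"));  // an unnamed temporary is an rvalue
    return result;                         // "lvalue,rvalue,rvalue"
}
```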

With rvalue references, programmers can create move constructors and move assignment operators, which improve performance when returning or copying objects in cases like this example.

How does this work?

In the example, the List<T> class contains a pointer to a vector. What happens if, instead of copying every object in the std::vector, the programmers “move” the vector pointer from the local list inside the function to the list that receives the result? This would save a lot of processing time by avoiding unnecessary copies and destructions.

Thus, the move constructor and move assignment operator work as follows: they receive an rvalue reference to the list being moved, “steal” its data, and take ownership of it. Taking ownership means that the receiving object becomes responsible for releasing the resources originally managed by the moved-from object. The moved-from object gives up that responsibility by setting its pointer to nullptr, so its destructor (deleting a null pointer does nothing) will not free the stolen vector.

Here’s how the move constructor and move assignment operator can be implemented for the List<T> class (both go inside the class definition):

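List(List<T>&& list) noexcept //move constructor
    : _vec(list._vec)
{
    list._vec = nullptr; //releasing ownership (a moved-from List may only be destroyed or assigned to)
}

List<T>& operator=(List<T>&& list) noexcept //move assignment operator
{
    if (this != &list) // guard against self-move-assignment
    {
        delete _vec;
        _vec = list._vec;
        list._vec = nullptr; //releasing ownership
    }
    return *this;
}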

With these changes, the output would be:

...
...
copy ctor
copy ctor
Before returning: ********

All the copy constructor calls after the “Before returning” line are avoided.

Isn’t that great?
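To check this without reading the console, here is a self-contained sketch (a reconstruction under assumptions, not the original program) in which A counts its copies instead of printing “copy ctor”. The global copiesBeforeReturn and the helper copiesAfterReturn() are hypothetical: the snapshot is taken where the original code prints “Before returning”.

```cpp
#include <vector>

// A counts its copies instead of printing "copy ctor".
struct A
{
    static int copies;
    A() = default;
    A(const A&) { ++copies; }
    A& operator=(const A&) { ++copies; return *this; }
};
int A::copies = 0;

template <typename T>
class List
{
  private:
    std::vector<T>* _vec;

  public:
    List() : _vec(new std::vector<T>()) { }
    ~List() { delete _vec; }

    List(const List<T>& list) : _vec(new std::vector<T>(*(list._vec))) { }

    List<T>& operator=(const List<T>& list)
    {
        if (this != &list)
        {
            std::vector<T>* copy = new std::vector<T>(*(list._vec));
            delete _vec;
            _vec = copy;
        }
        return *this;
    }

    // Move operations: take the source's vector and leave the source holding nullptr.
    List(List<T>&& list) noexcept : _vec(list._vec) { list._vec = nullptr; }

    List<T>& operator=(List<T>&& list) noexcept
    {
        if (this != &list)
        {
            delete _vec;
            _vec = list._vec;
            list._vec = nullptr;
        }
        return *this;
    }

    void add(const T& item) { _vec->push_back(item); }
    int getCount() const { return static_cast<int>(_vec->size()); }
};

int copiesBeforeReturn = 0; // hypothetical snapshot at the "Before returning" point

List<A> getNObjects(int n)
{
    List<A> list;
    A a;
    for (int i = 0; i < n; i++)
        list.add(a);
    copiesBeforeReturn = A::copies;
    return list; // moved (or elided), never copied
}

// Number of A copies made after getNObjects() finished filling its list.
int copiesAfterReturn(int n)
{
    A::copies = 0;
    List<A> list1;
    list1 = getNObjects(n); // move assignment from the returned temporary
    return A::copies - copiesBeforeReturn;
}
```

Since a moved-from List holds nullptr, it should only be destroyed or given a new value; calling getCount() on it would dereference a null pointer.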

What other uses do rvalue references have?

Here’s an example of a simple swap function for integers:

void my_swap(int& a, int& b)
{
  int c = a;
  a = b;
  b = c;
}

Straightforward enough. But what if, instead of swapping two integers, we needed to swap two large objects (such as vectors, linked lists, or other complex types)?

template <typename T>
void my_swap(T& a, T& b)
{
  T c = a;
  a = b;
  b = c;
}

If the copy constructor of class T is slow (like the std::vector copy constructor), this version of my_swap can have very poor performance.

Here’s an example demonstrating the issue:

#include <iostream>
#include <string>
  
class B
{
    private: int _x;
    public:
        B(int x) : _x(x) { std::cout << "ctor" << std::endl; }
         
        B(const B& b) : _x(b._x)
        {
            std::cout << "copy ctor" << std::endl;
        }
         
        B& operator=(const B& b)
        {
            _x = b._x;
            std::cout << "operator=" << std::endl;
            return *this;
        }
 
        friend std::ostream& operator<<(std::ostream& os, const B& b)
        {
            os << b._x;
            return os;
        }
         
};
 
template <typename T>
void my_swap(T& a, T& b)
{
    T c = a; //copy ctor, possibly slow
    a = b;   //operator=, possibly slow
    b = c;   //operator=, possibly slow
}
 
int main()
{
    B a(1);
    B b(2);
    my_swap(a, b);
    std::cout << a << "; " << b << std::endl;
    return 0;
}

The output is:

ctor
ctor
copy ctor
operator=
operator=
2; 1

The class B is simple, but if the copy constructor and assignment operator are slow, my_swap’s performance will suffer.

To add move semantics to class B, a move constructor and a move assignment operator must be implemented:

B(B&& b)  : _x(b._x)
{
    std::cout << "move ctor" << std::endl;
}
         
B& operator=(B&& b)
{
    _x = b._x;
    std::cout << "move operator=" << std::endl;
    return *this;
}

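However, the move constructor and move assignment operator will not be invoked automatically here. Inside my_swap, a, b and c are named variables, that is, lvalues, so overload resolution selects the copy constructor and copy assignment operator. The compiler cannot assume it is safe to “steal” the contents of a named object that may still be used afterwards.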

This problem can be fixed by explicitly telling the compiler to use move semantics with the function template std::move() (declared in <utility>):

template <typename T>
void my_swap(T& a, T& b)
{
    T c = std::move(a); //move ctor, fast
    a = std::move(b);   //move operator=, fast
    b = std::move(c);   //move operator=, fast
}

The std::move function does not move anything by itself: it simply casts its argument to an rvalue reference. That cast tells the compiler the object may be “stolen” from, so overload resolution picks the move constructor and move assignment operator.

The updated output is:

ctor
ctor
move ctor
move operator=
move operator=
2; 1
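To illustrate that std::move is only a cast, here is a small sketch. my_move is a hypothetical hand-written equivalent of the cast (not the standard library's source), and castAloneMovesNothing() and moveInto() are illustrative helpers.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move applied to an lvalue std::string has type std::string&&: it is just a cast.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// Hypothetical hand-written equivalent of the cast that std::move performs.
template <typename T>
typename std::remove_reference<T>::type&& my_move(T&& t) noexcept
{
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

// Binding the result of std::move to a reference moves nothing: s keeps its value.
bool castAloneMovesNothing()
{
    std::string s = "hello";
    std::string&& ref = std::move(s);
    return ref == "hello" && s == "hello";
}

// The actual move happens here, inside std::string's move constructor.
std::string moveInto(std::string& source)
{
    std::string target = my_move(source);
    return target;
}
```

After moveInto() returns, source is in a valid but unspecified state, so it should be reassigned before its value is used again.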

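The standard library has been updated to support move semantics as well: since C++11, types such as std::vector and std::string have move constructors and move assignment operators, and std::swap exchanges values using moves instead of copies.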
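As a quick check of the std::swap point, here is a sketch with a hypothetical Tracked class that counts its copies and moves. With the mainstream standard library implementations, std::swap makes no copies of it.

```cpp
#include <utility>

// Hypothetical type that counts how often it is copied and moved.
struct Tracked
{
    static int copies;
    static int moves;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++copies; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++moves; }
    Tracked& operator=(const Tracked& o) { value = o.value; ++copies; return *this; }
    Tracked& operator=(Tracked&& o) noexcept { value = o.value; ++moves; return *this; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Swaps a and b with std::swap and returns how many copies were made.
int copiesMadeByStdSwap(Tracked& a, Tracked& b)
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::swap(a, b);
    return Tracked::copies;
}
```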

Perfect forwarding is another feature built on top of rvalue references.
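A minimal sketch of perfect forwarding, using hypothetical kind(), relay() and relayWithoutForward() functions. Inside a template, T&& is a forwarding reference, and std::forward<T> (from <utility>) passes the argument on with its original value category.

```cpp
#include <string>
#include <utility>

// Overloads that report which kind of reference they received.
std::string kind(const std::string&) { return "lvalue"; }
std::string kind(std::string&&)      { return "rvalue"; }

// T&& is a forwarding reference here; std::forward<T> restores the
// argument's original value category when passing it on.
template <typename T>
std::string relay(T&& arg)
{
    return kind(std::forward<T>(arg));
}

// Without std::forward, the named parameter is always an lvalue.
template <typename T>
std::string relayWithoutForward(T&& arg)
{
    return kind(arg);
}
```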

C++: Lambda expressions

Consider this Person class:

class Person
{
  private:
    std::string firstName;
    std::string lastName;
    int id;

  public:
    Person(const std::string& fn, const std::string& ln, int i)
    : firstName{fn}
    , lastName{ln}
    , id{i}
    {
    }

    const std::string& getFirstName() const { return firstName; }
    const std::string& getLastName() const { return lastName; }
    int getID() const { return id; }
};

The programmers need to store several instances of this class in a vector:

std::vector<Person> people;
people.push_back(Person{"Davor", "Loayza", 62341});
people.push_back(Person{"Eva", "Lopez", 12345});
people.push_back(Person{"Julio", "Sanchez", 54321});
people.push_back(Person{"Adan", "Ramones", 70000});

If they want to sort this vector by person ID, they must implement a comparator such as PersonComparator and pass it to the std::sort algorithm from the standard library:

class PersonComparator
{
  public:
     bool operator()(const Person& p1, const Person& p2) const
     {
        return p1.getID() < p2.getID();
     }
};

PersonComparator pc;
std::sort(people.begin(), people.end(), pc);

Before C++11, the programmers needed to create a separate class (or alternatively a function) to customize the sort algorithm (in fact, to pass custom behavior to any standard library algorithm).

C++11 introduced “lambda expressions”, which let programmers define that function object inline, exactly where it is passed to the algorithm. So, instead of defining the PersonComparator class shown above, the same functionality can be written like this:

std::sort(people.begin(), people.end(), [](const Person& p1, const Person& p2)
{
  return p1.getID() < p2.getID();
});

Quite simple and easier to read. The “[]” square brackets (the capture clause) list the variables from the enclosing scope that will be used inside the lambda. “[]” means: “I do not want my lambda function to capture anything”; “[=]” means: “capture by value every local variable the lambda uses” (thanks Jeff for your clarification on this!!); “[&]” means: “capture by reference every local variable the lambda uses”.
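The difference between capturing by value and by reference can be seen in this small sketch (countAbove and byValueMinusByReference are hypothetical helpers). Individual variables can also be named in the capture clause instead of using the blanket [=] or [&].

```cpp
#include <algorithm>
#include <vector>

// The lambda captures `threshold` by value and `count` by reference.
int countAbove(const std::vector<int>& values, int threshold)
{
    int count = 0;
    std::for_each(values.begin(), values.end(), [threshold, &count](int v)
    {
        if (v > threshold)
            ++count;  // updates the enclosing function's `count` through the reference
    });
    return count;
}

// [=] copies x when the lambda is created, so later changes are not seen;
// [&] refers to the original x, so it sees the latest value.
int byValueMinusByReference()
{
    int x = 1;
    auto byValue = [=]() { return x; };
    auto byReference = [&]() { return x; };
    x = 10;
    return byValue() - byReference();  // 1 - 10
}
```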

Given the vector declared above, what if the programmers want to show all instances inside it? They could do this before C++11:

std::ostream& operator<<(std::ostream& os, const Person& p)
{
    os << "(" << p.getID() << ") " << p.getLastName() << "; " << p.getFirstName();
    return os;
}

class show_all
{
public:
    void operator()(const Person& p) const
    { 
        std::cout << p << std::endl;
    }
};

show_all sa;
std::for_each(people.begin(), people.end(), sa);

And with lambdas the example could be implemented in this way:

std::for_each(people.begin(), people.end(), [](const Person& p)
{
    std::cout << p << std::endl;
});
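Putting the pieces together, here is a self-contained sketch that sorts the people by ID and prints them, using a lambda for each step. The sortAndShow() function and its std::ostream parameter are assumptions added so the output can be checked; the Person class and operator<< are the ones from above.

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class Person
{
  private:
    std::string firstName;
    std::string lastName;
    int id;

  public:
    Person(const std::string& fn, const std::string& ln, int i)
    : firstName{fn}, lastName{ln}, id{i} {}

    const std::string& getFirstName() const { return firstName; }
    const std::string& getLastName() const { return lastName; }
    int getID() const { return id; }
};

std::ostream& operator<<(std::ostream& os, const Person& p)
{
    os << "(" << p.getID() << ") " << p.getLastName() << "; " << p.getFirstName();
    return os;
}

// Hypothetical helper: sorts by ID, then writes one person per line to `out`.
void sortAndShow(std::vector<Person>& people, std::ostream& out)
{
    std::sort(people.begin(), people.end(), [](const Person& p1, const Person& p2)
    {
        return p1.getID() < p2.getID();
    });

    // `out` is captured by reference so every call writes to the same stream.
    std::for_each(people.begin(), people.end(), [&out](const Person& p)
    {
        out << p << '\n';
    });
}
```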

C++: ‘auto’

Starting with C++11, C++ contains a lot of improvements to the core language as well as a lot of additions to the standard library.

The aims for this new version, according to Bjarne Stroustrup, were making C++ a better language for systems programming and library building, and making it easier to learn and teach.

auto is an already existing keyword inherited from C that was used to mark a variable as automatic (a variable that is automatically allocated and deallocated when it goes out of scope). All local variables in C and C++ were auto by default, so this keyword was rarely used explicitly. Recycling it with new semantics in C++ was a very pragmatic decision: introducing a brand-new keyword could have broken old code that already used that word as an identifier.

Since C++11, auto is used to infer the type of the variable that is being declared and initialized. For example:

int a = 8;
double pi = 3.141592654;
float x = 2.14f;

In C++11 and later, the declarations above can be written this way:

auto a = 8;
auto pi = 3.141592654;
auto x = 2.14f;

In the last example, the compiler will infer that a is an integer, pi is a double and x is a float.
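The deduced types can be verified at compile time with static_assert and std::is_same, as in this sketch. The extra variable s is an added illustration showing that deduction is not always what a reader might expect: a string literal gives const char*, not std::string.

```cpp
#include <type_traits>

auto a = 8;
auto pi = 3.141592654;
auto x = 2.14f;
auto s = "hello";   // not std::string: the string literal decays to const char*

// Compile-time checks: the program does not compile if a deduction differs.
static_assert(std::is_same<decltype(a), int>::value, "a is an int");
static_assert(std::is_same<decltype(pi), double>::value, "pi is a double");
static_assert(std::is_same<decltype(x), float>::value, "x is a float");
static_assert(std::is_same<decltype(s), const char*>::value, "s is a const char*");
```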

Someone could say: “come on, I do not see any advantage in this, because it is obvious that ‘x’ is a float, and it is easy for me to infer the type instead of letting the compiler do that”. And yes, the programmer can always work out the type, but doing so is not always as evident or easy as it seems. For example, consider iterating over a std::vector in a template function, as in the code below:

template <typename T>
void show(const std::vector<T>& vec)
{
    for (auto i = vec.begin(); i != vec.end(); ++i) // notice the 'auto' here
        std::cout << *i << std::endl;
}

the following declaration:

auto i = vec.begin();

in ‘old’ C++ would have to be written as:

typename std::vector<T>::const_iterator i = vec.begin();

So, in this case, the auto keyword is very useful and makes the code even easier to read and more intuitive.
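Both spellings side by side, as a runnable sketch. The std::ostream parameter and the showOld name are assumptions added so the two versions can be compared; note that the pre-C++11 version needs the typename keyword because std::vector<T>::const_iterator depends on T.

```cpp
#include <ostream>
#include <sstream>
#include <vector>

// C++11 and later: the iterator type is deduced.
template <typename T>
void show(const std::vector<T>& vec, std::ostream& out)
{
    for (auto i = vec.begin(); i != vec.end(); ++i)
        out << *i << '\n';
}

// Pre-C++11 spelling: the dependent type must be written out with `typename`.
template <typename T>
void showOld(const std::vector<T>& vec, std::ostream& out)
{
    for (typename std::vector<T>::const_iterator i = vec.begin(); i != vec.end(); ++i)
        out << *i << '\n';
}
```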

Anyway, there are restrictions on its usage:

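  • Before C++20, it is not possible to use auto as the type of a function or method parameter.
  • All the variables declared in a single auto declaration must be deduced to the same type:
auto x = 2, b = true, c = "hello"; // invalid auto usage: int, bool and const char* are different types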

Starting with C++14, the return type of functions can be deduced automatically, and auto can also be used for the parameters of lambda expressions (generic lambdas). Starting with C++20, the parameters of ordinary functions can also be auto.
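A short sketch of the two C++14 features (average and twice are hypothetical names). The C++20 form is left as a comment so the block also compiles as C++14 or C++17.

```cpp
#include <string>
#include <type_traits>

// C++14: the return type is deduced from the return statement.
auto average(int a, int b)
{
    return (a + b) / 2.0;   // int divided by double gives double
}
static_assert(std::is_same<decltype(average(1, 2)), double>::value, "deduced as double");

// C++14 generic lambda: the auto parameter turns operator() into a template.
auto twice = [](auto value) { return value + value; };

// Since C++20 an ordinary function may also take auto parameters, e.g.
//   auto triple(auto value) { return value + value + value; }
```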


C++: Pimpl

Imagine we have this class defined in the reader.dll DLL:

class DLLEXPORT Reader
{
  public:
    Reader(const std::string& filename);
    ~Reader();

    std::string readLine() const;
    bool isEndOfFile() const;

  private:
    FILE* file;
};

This class allows its user to read from a file only once.

What if the user wants to use this same class to read from a file multiple times? It can be modified as follows:

class DLLEXPORT Reader
{
  public:
    Reader(const std::string& filename);
    ~Reader();

    std::string readLine() const;
    bool isEndOfFile() const;
    void restart();

  private:
    std::string filename;
    FILE* file;
};

What has been done here is adding the file name as a class attribute, allowing the file to be opened multiple times. Additionally, a ‘restart’ method was introduced.

Below is a function that uses the first version of the reader.dll DLL.

void showFile(const std::string& file)
{
  Reader reader(file);
  while (!reader.isEndOfFile())
  {
    std::cout << reader.readLine() << std::endl;
  }
}

The problem arises when a program compiled against the header of the first version runs with the second version of reader.dll without being recompiled. The program may malfunction, crash, or fail entirely. Why?

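Although the API of the second version is compatible with the first (the calling code compiles without changes), the ABIs are not. The ABI, or ‘Application Binary Interface’, defines how compiled binaries interact: object sizes, member offsets, calling conventions, and so on. Why are the ABIs incompatible? Because the ‘filename’ attribute was added before the ‘file’ attribute, the memory layout of Reader changed. The invoker, compiled against the first version, reserves space for a Reader that contains only a FILE* (for example, on the stack in showFile), while the second version of the DLL constructs a std::string and a FILE* in that space, writing past its end. Any code compiled with the old layout that accesses ‘file’ (for example, an inline member function) will now ‘binarily’ point to the address where ‘filename’ is located. Since these are different types, the program will behave unpredictably.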

This issue occurs because the class header explicitly declares class attributes, which is a well-known encapsulation problem in C++. A similar problem can occur even without adding or removing methods if, for instance, private attributes are replaced (e.g., changing FILE* to std::fstream).
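The size change can be made concrete with two hypothetical structs that mirror only the data members of the two Reader versions (ReaderV1Layout and ReaderV2Layout are illustrative names, not part of the DLL):

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins that mirror only the data members of the two versions.
struct ReaderV1Layout
{
    FILE* file;
};

struct ReaderV2Layout
{
    std::string filename;   // new member placed before `file`
    FILE* file;
};

// A client compiled against version 1 reserves v1Size bytes per Reader,
// but the version 2 constructor initializes v2Size bytes.
constexpr std::size_t v1Size = sizeof(ReaderV1Layout);
constexpr std::size_t v2Size = sizeof(ReaderV2Layout);
static_assert(v2Size > v1Size, "version 2 objects are larger than version 1 objects");
```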

The ‘pimpl idiom’ (also known as the ‘opaque pointer’ or ‘cheshire cat’ idiom) is a C++ technique to avoid this problem. The idea is to include a pointer to a struct in the class interface (.h) to store the class attributes, but define the struct inside the .cpp file, keeping it hidden from the interface. Doing this resolves several issues:

  • ABI compatibility is maintained because the class attributes are not exposed in the .h file and are used only internally within the DLL.
  • It provides better encapsulation (the .h files only expose what the user needs to know).
  • The sizeof(Reader) (in this example) remains the same, regardless of how many attributes the class has, as they are hidden within the pimpl. This is crucial because the memory layout seen by the users does not shift when the implementation changes.
  • If only the implementation changes, the project using our .h does not need to be recompiled since the .h remains unchanged.

So, how would the example look?

VERSION 1: Interface: “Reader.h”

#include <string>

struct ReaderImpl; // forward declaration (same class-key as the definition in Reader.cpp)

class DLLEXPORT Reader
{
  public:
    Reader(const std::string& filename);
    ~Reader();

    std::string readLine() const;
    bool isEndOfFile() const;

  private:
    ReaderImpl* pImpl; // pointer to the class attrs
};

Implementation: “Reader.cpp”

#include "Reader.h"
#include <cstdio>
#include <string>

//Here we define the struct to use
struct ReaderImpl
{
  FILE* file;
};

Reader::Reader(const std::string& n)
{
  pImpl = new ReaderImpl{};
  pImpl->file = fopen(n.c_str(), "r");
}

Reader::~Reader()
{
  if (pImpl->file)        // fopen may have failed
    fclose(pImpl->file);
  delete pImpl;
}

std::string Reader::readLine() const
{
  char aux[256];
  if (!pImpl->file || !fgets(aux, sizeof aux, pImpl->file))
    return {};            // nothing was read, so aux must not be used
  return {aux};
}

bool Reader::isEndOfFile() const
{
  return !pImpl->file || feof(pImpl->file) != 0;
}

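VERSION 2: Interface: “Reader.h”

#include <string>

struct ReaderImpl; // forward declaration

class DLLEXPORT Reader
{
  public:
    Reader(const std::string& filename);
    ~Reader();

    std::string readLine() const;
    bool isEndOfFile() const;
    void restart(); // new method for version 2

  private:
    ReaderImpl* pImpl; // unchanged: same size and layout as in version 1
};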

Implementation: “Reader.cpp”

#include "Reader.h"
#include <cstdio>
#include <string>

struct ReaderImpl
{
  std::string filename; //new attribute for version 2
  FILE* file;
};

Reader::Reader(const std::string& n)
{
  pImpl = new ReaderImpl{};
  pImpl->filename = n;
  pImpl->file = fopen(n.c_str(), "r");
}

Reader::~Reader()
{
  if (pImpl->file)        // fopen may have failed
    fclose(pImpl->file);
  delete pImpl;
}

std::string Reader::readLine() const
{
  char aux[256];
  if (!pImpl->file || !fgets(aux, sizeof aux, pImpl->file))
    return {};            // nothing was read, so aux must not be used
  return {aux};
}

bool Reader::isEndOfFile() const
{
  return !pImpl->file || feof(pImpl->file) != 0;
}

void Reader::restart()
{
  if (pImpl->file)
    fclose(pImpl->file);
  pImpl->file = fopen(pImpl->filename.c_str(), "r");
}

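If the programmers of reader.dll had used the ‘pimpl idiom’ from the beginning, the new version of reader.dll would not have affected its consumers at all, because it would have maintained both API and ABI backward compatibility: sizeof(Reader) and the layout of its only data member (the pImpl pointer) stay the same, and adding a non-virtual method such as restart() changes neither.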
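In modern C++ the same idiom is often written with std::unique_ptr instead of a raw pointer, so the implementation object is released automatically. Here is a sketch of that variant (a swapped-in technique, not the Reader code above) using a hypothetical Counter class; both “files” are shown in one block for brevity. The destructor and move operations are only declared in the header part and defaulted in the implementation part, where CounterImpl is a complete type, because std::unique_ptr needs the complete type to delete it.

```cpp
#include <memory>
#include <utility>

// ---- Counter.h (public interface) ----
class Counter
{
  public:
    Counter();
    ~Counter();                                  // defined where CounterImpl is complete
    Counter(Counter&&) noexcept;
    Counter& operator=(Counter&&) noexcept;

    void increment();
    int value() const;

  private:
    struct CounterImpl;                          // forward declaration only
    std::unique_ptr<CounterImpl> pImpl;
};

// ---- Counter.cpp (implementation, hidden from users) ----
struct Counter::CounterImpl
{
    int count = 0;   // new data members can be added here without touching Counter.h
};

Counter::Counter() : pImpl(new CounterImpl()) {}
Counter::~Counter() = default;                   // CounterImpl is complete here
Counter::Counter(Counter&&) noexcept = default;
Counter& Counter::operator=(Counter&&) noexcept = default;

void Counter::increment() { ++pImpl->count; }
int Counter::value() const { return pImpl->count; }
```

Copying is disabled in this sketch, because std::unique_ptr is move-only; a copyable version would need a user-written copy constructor that clones the CounterImpl.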

C++: First message

#include <iostream>
#include <string>
 
int main()
{
  std::string greeting = "Ifmmp!xpsme\"";
   
  for (std::string::const_iterator i = greeting.begin(); i != greeting.end(); ++i)
       std::cout << static_cast<char>((*i) - 1);
   
  std::cout << std::endl;
  return 0;
}

Hello,

In this blog, I want to publish interesting information about C++ and related topics (object-oriented programming, generic programming, etc.). The goal is to provide useful and accurate information. Since I am passionate about these topics but far from being an expert, I would appreciate your help in improving the accuracy of each post I write. Your comments and suggestions will be greatly appreciated and will help me enhance the content of this blog day by day.

Welcome, and thank you for reading!