Deleaker, part 0: Intro

I am testing this nice desktop tool called “Softanics Deleaker” (https://www.deleaker.com/). It was written by Artem Razin and, as you can deduce by its name, it is an application that helps the programmers to find memory leaks on C++, Delphi and .NET applications.

Starting this post, I will post several blog entries about my experiences using it and the features it exposes.

I installed it and installed the Visual Studio extension that ships with the installer. For my tests, I am using Visual Studio 2019 16.4 preview.

In Visual Studio I created a C++ console application and wrote this very simple and correct application:

int main()
{
    std::cout << "Hello World!\n";
    return 0;
}

When I run the local debugger, and since I have installed the Deleaker VS extension, the leaker will load all libraries and symbols of my application and will open a window similar to this one:

I still do not know what all those options mean, but the important thing here is the “No leaks found” message. The filter containing the “266 hidden” items refers to known leaks that Deleaker knows that exist in the Microsoft C Runtime Library.

Now, I will create a very small program too containing a small memory leak:

int main()
{
    for (int i = 0; i < 10; i++)
    {
        char* s = new char{'a'};
        std::cout << *s << "\n";
    }

    return 0;
}

As obviously observed, I am allocating dynamically one byte to contain a character and I am forgetting to delete it. When I debug it, I get this interesting Deleaker window:

Now Deleaker detected my forgotten allocation and says that to me: “ConsoleApplication2.exe!main Line 7”.

As you can see, the “Hit Count” says that the allocation occurred 10 times (because my loop) and it says that 370 bytes leaked on this problem. Though that seems weird because I allocated only 1 byte 10 times, the 370 bytes appear because I compiled my code in Debug Mode and the compiler adds a lot of extra info per allocation. When I changed my compilation to Release Mode, I got the actual 10 bytes in the Size column.

When you click into the information table in the row containing the memory leak information, the Visual Studio editor highlights that line and moves the caret to such position (the new char{‘a’} line , so you realize where you allocated memory that was not released.

And that is it for now.

In next blog entries I will explore how to “Deleak” not so obvious things, how Deleaker behaves with shared pointers, COM objects, shared libraries, templates, virtual destructors and so on :)

Happy 2020!

 

C++ “Hello world”

Ok, the most famous first program on any programming language is the “Hello world” program, so I will explain how to create one in this post.

In my example I will use “g++” in a Linux environment, but “clang++” works exactly the same.

To create a “Hello world” in C++, you need to create an empty file and give it a name with an extension (any name, for example HelloWorld.cpp); “cpp”, “cxx” or “cc” are well-known C++ file name extensions.

The compiler does not verify if the filename is equal to the name of the “class” or anything inside the file; the file can be stored on any folder too, you do not need to create it inside a special folder containing all stuff of a given “package” (à la Java), for example.

So, after creating an empty HelloWorld.cpp, you can open it using any text editor and start to write the following lines of code:

#include <iostream>

int main()
{
  std::cout << "Hello world\n";
}

Save it, open a terminal, go to the folder where your file is located and there, enter this:

g++ HelloWorld.cpp -o HelloWorld

If after entering that command line you do not get any message, BE HAPPY! Your program compiled properly. Otherwise, you did some error in your code and you need to fix it and compile it again.

After compiling it properly, you need to execute the program. In a Linux/Unix environment, you do that writing the name of the program after a ./ :

./HelloWorld

And the program should show:

Hello world

Understanding how all of this worked

The compilation process in C++ has basically three steps:

  1. Passing your program through the "preprocessor"; an entity that performs several text transformations on your code before being compiled.
  2. The actual compilation process, that turns all your code into machine code with several calls to functions that are located in other libraries.
  3. The linking process, that binds the function calls with the actual functions in the libraries the program uses. If you do not specify anything (as in our case), your program will be linked only to the Standard Library that ships with any C++ compiler.

The parts of the program

#include

#include <iostream>

All things that start with "#" are called "preprocessor directives", that are instructions the preprocessor understands and executes.

#include tells to the preprocessor to look for the file named inside quotes or less-than and greater-than and put its content in the place the #include directive was invoked.

If the filename is inside less-than and greater-than characters (as in our case), the preprocessor will look for the file in a previously defined folder the compiler knows about. So, it will look for the file iostream in that folder. In a Linux environment, those files are generally in a path similar to this one (I am using g++ 8.2):

/usr/include/c++/8

If the filename is declared between double quotes, it means the file will be in the current folder or in a folder explictly mentioned while compiling the program.

iostream is the file that contains a lot of code that allows our programs to have data input and output. In our “Hello World”, we will need “std::cout” that is defined in this file.

main function
int main()

When you invoke your program, the operating system needs to know what piece of code it needs to execute. Such piece of code lives inside the function main.

All functions must return something, for example, if you call a function sum that sums two numbers, it must return a value containing the result of the sum. So, the function sum must return an integer value (an int). Some old compilers used to allow the function main() to return “void” (that means: “return nothing”) but the C++ standard specifies the main() function must return an int value.

Anyway, though this function is declared returning an int, if you do not return anything, the compiler does not complain about that and returns a 0. Notice that this behavior is exceptional and it is only allowed for the function main().

The return value of function main() specified if an error occurred or not. 0 means that no error occurred during the program executed; and a non-zero value means that an error occurred. The specific value being returned is completely depending on the programmers design and error mechanisms defined by them.

The program will be executed while the function main() is being executed. When its execution ends, the program automatically ends returning the return value to the Operating System.

The body of any function is declared inside two curly braces.

std::cout
std::cout << "Hello world\n";

std::cout is a pre-existing object that represents the command line output. The “<<" is an operator that basically does, in this case, sends the text "Hello world\n" to the std::cout object, producing an output of such text in the terminal.

The \n character sequence means an end of line.

g++

g++ is the most popular C++ compiler for Unix platforms. These days clang has a lot of popularity and you can replace one to other because clang parameters are completely compatible to the g++ ones.

When you say something like:

g++ HelloWorld.cpp

You are instructing to the g++ compiler to go through all the compilation process for the file “HelloWorld.cpp”. “Go through all the compilation process” in this case means: Running the preprocessor on the file, compiling it, linking it and producing an executable.

Since in this command line in my example above I did not mention the name of the executable file, the g++ command generates a file called “a.out” in the current folder.

To specify the name of the file to be generated, you must invoke g++ with the “-o” option and then the name of the executable file to be generated.

C++: variant

Let’s suppose I have a system that handles students, teachers and crew of a school.

To model that in an object oriented style, I would have a class hierarchy similar to this one:

class person
{
	std::string name;
	
public:
	template <typename String>
	person(String&& name) : name { forward<String>(name) }
	{
	}
	
    virtual ~person() { }
    const string& get_name() const { return name; }
	virtual void do_something() = 0;
};

class student : public person
{
public:
	using person::person;
	
	void do_homework()
	{
		cout << "Need access to Stack Overflow\n";
	}
	
	void do_something() override
	{
		cout << "I am doing something the students do\n";
	}
};

class teacher : public person
{
public:		
	using person::person;

	void teach()
	{
		cout << "This is the unique truth\n";
	}
	
	void do_something() override
	{
		cout << "I am doing something the teachers do\n";
	}
};

class crew : public person
{
public:
	using person::person;
	
	void help_team()
	{
		cout << "I am helping teachers and students\n";
	}
	
	void do_something() override
	{
		cout << "I am doing something crew do\n";
	}
};

And my collection would be defined like this:

map<size_t, person*> people;

where the size_t ID would be the key of the map.

Since I do not want to deal with raw pointers, this would be a better definition:

map<size_t, unique_ptr<person>> people;

Now, I will insert some elements to my collection:

people.insert(make_pair(14, make_unique<student>("Phil Collins")));
people.insert(make_pair(25, make_unique<teacher>("Peter Gabriel")));
people.insert(make_pair(32, make_unique<crew>("Justin Bieber")));

To get the name of person 14, I should do something like:

people.find(14)->second->get_name(); //being 100% sure that person with ID 14 exists

And to do something specific implemented in a derived class, I need to downcast:

static_cast<crew&>(*people.find(32)->second).help_team();

Since C++11, the language has been evolving to a more generic and more template metaprogramming-like paradigm and has been getting away from the classical OOP design where inheritance and polymorphism are amongst the most important tools.

So, how could I implement something similar to the thing shown above without inheritance and polymorphism?

Let me introduce std::variant ! :)

C++17 introduced variant, that is basically a template class where you specify the possible types of the values that the variant instance can store, so, for my example, I could define something like:

using person = std::variant<student, teacher, crew>;

In this line, I am defining an alias person that represents a variant value that can store a student, a teacher or a crew (think on variant to be something like a typesafe union).

So, my map would be defined in this way:

map<size_t, person> people;

And my classes student, teacher, and crew could be defined as follows:

class student
{
	std::string name;
public:
	template <typename String>
	student(String&& name) : name { forward<String>(name) }
	{
	}
	
	const string& get_name() const { return name; }
	
	void do_homework()
	{
		cout << "Need access to Stack Overflow\n";
	}
	
	void do_something()
	{
		cout << "I am doing something the students do\n";
	}
};

class teacher
{
	std::string name;
public:		
	template <typename String>
	teacher(String&& name) : name { forward<String>(name) }
	{
	}
	
	const string& get_name() const { return name; }

	void teach()
	{
		cout << "This is the unique truth\n";
	}
	
	void do_something()
	{
		cout << "I am doing something the teachers do\n";
	}
};

class crew
{
	std::string name;
	
public:
	template <typename String>
	crew(String&& name) : name { forward<String>(name) }
	{
	}
	
	const string& get_name() const { return name; }
	
	void help_team()
	{
		cout << "I am helping teachers and students\n";
	}
	
	void do_something()
	{
		cout << "I am doing something crew do\n";
	}
};

To make my example clean and to demonstrate that I do not need inheritance and polymorphism, notice I am not defining a base class nor I am defining virtual methods at all. Anyway. in real production code the coder could create a base class with no virtual methods and inherit from such class to avoid code duplication.

Notice also I am not using any pointer (raw or smart), so the map will contain actual values, removing one level of indirection and letting the compiler optimize based on that knowledge.

So, let me add some objects to the map:

people.insert(make_pair(14, student { "Phil Collins" }));
people.insert(make_pair(25, teacher { "Peter Gabriel" }));
people.insert(make_pair(32, crew { "Justin Bieber" }));

To get the person with id 14:

auto& the_variant = people.find(14)->second;

To get the “student” inside that variant object, I need to use the function get:

auto& the_student = get(the_variant);
cout << the_student.get_name() <<  "\n";

If I try to get an object that is not of the type stored in the variant, the system will throw a std::bad_variant_access exception, for example if I try to do this with the variant from the example above:

auto& the_student = get<teacher>(the_variant);

To execute a specific method of a given class, I do not need to do any downcasting because I already have the object of the given type, so, instead of:

static_cast<crew&>(*people.find(32)->second).help_team();

I would do:

get<crew>(people.find(32)->second).help_team();

that is by far straight and cleaner.

Now, given I have a method called “do_something” in all my classes, I would want to be able to invoke it no matter the type of the object stored in the variant.

So, I need to do something like this in the polymorphic world:

for (auto& p : people)
{
	p.second->do_something();
}

To do this, there is a function called: std::visit.

What visit does is accessing the variant object and invoke the method passed as argument with the object stored in the variant. So, given my example, I could do something like:

auto& the_variant = people.find(14)->second;
visit([](auto& s)
{
	s.do_something();
}, the_variant);

The magic is in the “auto” part here. When you “visit” a variant, the compiler generates one method for each type specified in the variant declaration, in my case 3 (one for student, one for crew and one for teacher), and executes the specific method depending on the type of the value stored in the variant. So, to execute do_something() for all objects in the variant, I need to do something like:

for (auto& p : people)
{
    visit([](auto& s)
	{
	    s.do_something();
    }, p.second);
}

It is beautiful, isn’t it? Polymorphic-like behavior with no overhead that polymorphism brings.

C++: “auto” return type deduction

Before C++14, when implementing a function template you did not know the return type of your functions, you had to do something like this:

template <typename A, typename B>
auto do_something(const A& a, const B& b) -> decltype(a.do_something(b))
{
  return a.do_something(b);
}

You had to use “decltype” in order to say the compiler: “The return type of this method is the return type of method do_something of object a”. The “auto” keyword used to say the compiler: “The return type of this function is declared at the end”.

Since C++14, you can do something by far simpler:

template <typename A, typename B>
auto do_something(const A& a, const B& b)
{
  return a.do_something(b);
}

In C++14, the compiler deduces the return type of the methods that have “auto” as return type.

Restrictions:

All returned values must be of the same type. My example below does not compile because I am returning an “int” or a “double”.

auto f(int n)
{
	if (n == 1)
		return 1;

	return 2.0;
}

For recursive functions, a return value must be returned before the recursive call in order to let the compiler to know what will be the type of the value to return, as in this example:

auto accumulator(int n)
{
	if (n == 0)
		return 0;

	return n + accumulator(n - 1);
}

C++: Smart pointers, part 5: weak_ptr

This is the last of several posts I wrote related to smart pointers:

  1. Smart pointers
  2. unique_ptr
  3. More on unique_ptr
  4. shared_ptr
  5. weak_ptr

In modern C++ applications (C++11 and later), you can replace almost all your naked pointers to shared_ptr and unique_ptr in order to have automatic resource administration in a deterministic way so you will not need (almost, again) to release the memory manually.

The “almost” means that there is one scenario where the smart pointers, specifically, the shared_ptr instances, will not work: When you have circular references. In this scenario, since every shared_ptr is pointing to the other one, the memory will never be released.

Continue reading “C++: Smart pointers, part 5: weak_ptr”

C++11: Perfect forwarding

Consider this function template invoke that invokes the function/functor/lambda expression passed as argument passing it the two extra arguments given:

#include <iostream>
#include <string>

using namespace std;

void sum(int a, int b)
{
    cout << a + b << endl;
}

void concat(const string& a, const string& b)
{
    cout << a + b << endl;
}

template <typename PROC, typename A, typename B>
void invoke(PROC p, const A& a, const B& b)
{
    p(a, b);
}

int main()
{
    invoke(sum, 10, 20);
    invoke(concat, "Hello ", "world");
    return 0;
}

Nice, it works as expected and the result is:

30
Hello world

Continue reading “C++11: Perfect forwarding”

C++: Smart pointers, part 4: shared_ptr

This is the fourth post of several posts I wrote related to smart pointers:

  1. Smart pointers
  2. unique_ptr
  3. More on unique_ptr
  4. shared_ptr
  5. weak_ptr

As I mentioned in other posts, C++11 brings a new set of smart pointers into C++. The most useful smart pointer is shared_ptr: Its memory management policy consists in counting the number of shared_ptr instances that refer to the same object in the heap.

Continue reading “C++: Smart pointers, part 4: shared_ptr”

C++: Smart pointers, part 3: More on unique_ptr

This is the third post of several posts I wrote related to smart pointers:

  1. Smart pointers
  2. unique_ptr
  3. More on unique_ptr
  4. shared_ptr
  5. weak_ptr

Ok, here I am going to write about two other features that unique_ptr has that I did not mention in my last post.

unique_ptr default behavior consists on take ownership of a pointer created with new and that would normally be released with delete.

Continue reading “C++: Smart pointers, part 3: More on unique_ptr”

C++11: std::future and std::async

C++11 introduces support for asynchronous calls in a very easy way.

An asynchronous call is a method invocation that will be executed in a separate thread (or core or processor); so, the caller of the method does not wait for the result of the execution and continue doing what is next; in this way, the compiler/processor/operating system can optimise the execution of the program and execute several routines at the same time (given the now common multicore systems we all have at home and in our pockets!). The standard library provides the mechanisms to perform those asynchronous calls and store the results until the caller will actually need them.

Continue reading “C++11: std::future and std::async”

C++11: std::thread

The standard library that ships with the new C++11 contains a set of classes to use threads. Before this, we needed to use the OS specific thread facilities each OS provides making our programs hard to port to other platforms.

Anyway, as today (November 16th, 2012), I tried threads using g++ 4.7 in Linux, Windows (through mingw), Mac and NetBSD and I just had success in Linux, Windows and Mac do not implement the thread features and NetBSD misses some details on the implementation (the this_thread::sleep_for() method, for example). Microsoft Visual Studio 2012 ships with good thread support.

To define a thread, we need to use the template class std::thread and we need to pass it a function pointer, a lambda expression or a functor. Look at this example:

Continue reading “C++11: std::thread”

C++11: enable_if

std::enable_if is another feature taken from the Boost C++ library that now ships with every C++11 compliant compiler.

As its name says, the template struct enable_if, enables a function only if a condition (passed as a type trait) is true; otherwise, the function is undefined. In this way, you can declare several “overloads” of a method and enable or disable them depending on some requirements you need. The nice part of this is that the disabled functions will not be part of your binary code because the compiler simply will ignore them.

Continue reading “C++11: enable_if”

C++11: unordered maps

The STL ships with a sorted map template class that is commonly implemented as a balanced binary tree.

The good thing on this is the fast search average time ( O(log2N) if implemented as a balanced binary tree, where N is the number of elements added into the map) and that when the map is iterated, the elements are retrieved following an established order.

C++11 introduces an unordered map implemented as a hash table. The good thing on this is the constant access time for each element ( O(1) ) but the bad things are that the elements are not retrieved in order and the memory print of the whole container can (I am just speculating here) be greater that the map’s one.

Continue reading “C++11: unordered maps”

C++: Smart pointers, part 1

This is the first of several posts I wrote related to smart pointers:

  1. Smart pointers
  2. unique_ptr
  3. More on unique_ptr
  4. shared_ptr
  5. weak_ptr

Memory management in C is too error prone because keeping track of each bunch of bytes allocated and deallocated can be really confusing and stressing.

Although C++ has the same manual memory management than C, it provides us some additional features that let us to do this management easier:

  • When an object is instantiated in the stack (e.g. Object o;); the C++ runtime ensure the destructor of such object is invoked when the object goes out of scope (the end of the enclosing block is reached, a premature ‘return’ is found or an exception is thrown); thus, releasing all memory and resources allocated for such object.
  • (Ab)using the feature of operator overloading, we can create classes that simulate the behavior of the pointers. Such classes are called: Smart pointers.

Continue reading “C++: Smart pointers, part 1”

C++: Primitive types

A primitive type is a data type where the values that it can represent have a very simple nature (a number, a character or a truth-value); the primitive types are the most basic building blocks for any programming language and are the base for more complex data types. These are the primitive types available in C++:

bool

It is stored internally in one byte and the values that a variable of this type can represent are true or false. All boolean operations return a value of this type. This type was not available in early C, so, a lot of operations that return integer values instead of boolean ones can be used as boolean expressions (in that case, the compiler assumes that 0 represents false and any value different than 0 represents true). For example, the following two code excerpts have the same semantics:

int a = 2;
if (a != 0) //a != 0 evaluates to a boolean value. In this case, it evaluates to true
  printf("a is different than 0\n");

and

int a = 2;
if (a) //a is an "int", but since it is different than 0, the compiler evaluates it as true
  printf("a is different than 0\n");

char

It is stored internally as a byte and represents a character. When this data type was created, there was not immediate need of international character support in the language, so, it was completely useful to store all the set of characters needed to write anything on English. Anyway, when the use of computers was evolving, extending and becoming world-wide available, support for international characters was needed and evident and then new character encoding standards were defined. When these new encoding standards were available, a new character data type was needed, because one byte was not enough to support all symbols used in all human languages (Chinese glyphs, for example, are more than 40000). Although of this, char is still used as the standard character data type and a lot of legacy code still uses character strings based on char; some encoding algorithms exist that can store international characters on sequences of char characters, for example, UTF-8 that stores Unicode characters on 1, 2, 3 or 4 char characters.

wchar_t

It is is a wide-character type, represents a character but it is stored internally using 16 or 32 bits instead of the 8 bits of the char type. How many bits a value of this type uses, depends on the computer architecture, the operating system and the C++ compiler being used. Commonly, Windows uses 16-bit characters and the UNIXes use 32-bit characters. The encoding used to represent a wchar_t character is not defined by the standard and the decision of what encoding to use was deliberately left to the compiler. Both types, char and wchar_t can be treated as integer data types and the programmer is able to perform arithmetical operations using values of these types. When created, wchar_t was not a built-in data type but it was just a type alias (typedef); the current compilers use it as a built-in data type by default but the user can tell the compiler to treat it as a typedef (this is needed to support legacy code as well).

short

It is a “short integer” representing an integer that has less precision than a “full-blown int”. Though generally short represents a signed integer with 16-bit precision (that means that it can represent values between -32768 and 32767), the decision of what precision to use was left to the compiler implementor. unsigned short is the unsigned version of this 16-bit precision integer, but the values that it represents are between 0 and 65535.

int

It is the most common integer data type and it was used to represent a processor “word”; so, in 16-bit platforms, it used to be a 16-bit precision integer number and in 32-bit platforms, it is a 32-bit precision number. This “rule” was broken when 64-bit hardware became available and the int data type has still a 32-bit precision; that means that it can store numbers between −2147483648 to 2147483647 or between 0 and 4294967295 if using the unsigned int version instead.

long and unsigned long

They represent “long integer numbers” and their precisions depend on the compiler and the OS. For 16-bit OSes, they used to represent 32-bit precision integer numbers; for 32-bit hardware, they also represent 32-bit precision numbers and for 64-bit OSes, they have a 32-bit precision on Windows and 64-bit precision on UNIXes.

long long and unsigned long long

They represent 64-bit integers.

float

It represents a single-precision floating point number. It is stored in 32-bits (as defined in IEEE 754-2008) and it can represent numbers between 1.18(10^−38) and 3.4(10^38) with around 7 digits of mantissa.

double

It represents a double-precision floating point number It is stored in 64-bits and it can represent numbers between 2.2250738585072009(10^-308) and 1.7976931348623157(10^308) with approximately 16 digits of precision.

C99 exact-width integer types

C99 also introduced a set of exact-width integer types that represent signed and unsigned integer numbers with precision of 8, 16, 32 and 64-bit independently from the compiler, OS or processor architecture. They are:

  • 8-bit precision: int8_t and uint8_t
  • 16-bit precision: int16_t and uint16_t
  • 32-bit precision: int32_t and uint32_t
  • 64-bit precision: int64_t and uint64_t

These exact-width integer types are not built-in types; they are just aliases (typedefs) of the primitive types described above. They are still not supported for all compilers (for example, Microsoft introduced the stdint.h [that is the library header that declares them] just for Visual Studio 2010).