C++ “Hello world”

Ok, the most famous first program in any programming language is the “Hello world” program, so I will explain how to create one in this post.

In my example I will use “g++” in a Linux environment, but “clang++” works the same way for everything shown here.

To create a “Hello world” in C++, you need to create an empty file and give it a name with an extension (any name, for example HelloWorld.cpp); “cpp”, “cxx” or “cc” are well-known C++ file name extensions.

The compiler does not check whether the filename matches the name of a “class” or anything else inside the file. The file can also be stored in any folder; you do not need to create it inside a special folder containing everything that belongs to a given “package” (à la Java), for example.

So, after creating an empty HelloWorld.cpp, you can open it using any text editor and write the following lines of code:

#include <iostream>

int main()
{
  std::cout << "Hello world\n";
}

Save it, open a terminal, go to the folder where your file is located and there, enter this:

g++ HelloWorld.cpp -o HelloWorld

If you do not get any message after entering that command line, BE HAPPY! Your program compiled properly. Otherwise, you made an error in your code, and you need to fix it and compile again.

After compiling it properly, you need to execute the program. In a Linux/Unix environment, you do that by writing the name of the program after a ./ :

./HelloWorld

And the program should show:

Hello world

Understanding how all of this worked

The compilation process in C++ has basically three steps:

  1. Passing your program through the "preprocessor", an entity that performs several text transformations on your code before it is compiled.
  2. The actual compilation, which turns your code into machine code containing calls to functions that are located in other libraries.
  3. The linking process, which binds those function calls to the actual functions in the libraries the program uses. If you do not specify anything (as in our case), your program will be linked only to the Standard Library that ships with any C++ compiler.

The parts of the program

#include

#include <iostream>

All lines that start with "#" are called "preprocessor directives"; they are instructions that the preprocessor understands and executes.

#include tells the preprocessor to look for the file named inside double quotes or angle brackets (the less-than and greater-than characters) and to put its content in the place where the #include directive appears.

If the filename is inside less-than and greater-than characters (as in our case), the preprocessor will look for the file in a previously defined folder the compiler knows about. So, it will look for the file iostream in that folder. In a Linux environment, those files are generally in a path similar to this one (I am using g++ 8.2):

/usr/include/c++/8

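If the filename is between double quotes, the preprocessor looks for the file first in the folder of the file being compiled, then in any folder explicitly mentioned while compiling the program (with g++, through the -I option), and finally in the same system folders described above.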

iostream is the header that declares the facilities our programs need for data input and output. In our “Hello World”, we need “std::cout”, which is declared in this file.

main function

int main()

When you invoke your program, the operating system needs to know which piece of code to execute. That piece of code lives inside the function main.

Every function declares the type of the value it returns. For example, a function sum that adds two integers must return a value containing the result, so it is declared as returning an integer value (an int). Some old compilers used to allow the function main() to return “void” (which means “returns nothing”), but the C++ standard specifies that main() must return an int value.

Although main() is declared as returning an int, the compiler does not complain if you do not return anything: reaching the end of main() has the same effect as returning 0. Notice that this behavior is exceptional and is only allowed for the function main().

The return value of main() specifies whether an error occurred. 0 means that no error occurred while the program was running, and a non-zero value means that an error occurred. The specific non-zero value depends entirely on the program's design and on the error conventions its programmers define.

The program runs for as long as the function main() is running. When main() ends, the program ends too, handing the return value back to the operating system.

The body of any function is enclosed in a pair of curly braces.

std::cout

std::cout << "Hello world\n";

std::cout is a pre-existing object that represents the standard output, which is normally the terminal. “<<” is an operator that, in this case, sends the text "Hello world\n" to the std::cout object, which makes that text appear in the terminal.

The \n character sequence represents a newline, that is, the end of the line.

g++

g++ is the most popular C++ compiler on Unix platforms. These days clang is also very popular, and you can usually swap one for the other because most of clang's command-line options are compatible with the g++ ones.

When you say something like:

g++ HelloWorld.cpp

You are instructing the g++ compiler to go through the whole compilation process for the file “HelloWorld.cpp”. In this case that means running the preprocessor on the file, compiling it, linking it and producing an executable.

Since this command line does not mention the name of the executable file, g++ generates a file called “a.out” in the current folder.

To specify the name of the file to be generated, invoke g++ with the “-o” option followed by the name of the executable file, as in the first command shown in this post.

C++: “auto” on anonymous functions

C++14 introduced an improvement to the way we can declare anonymous functions (a.k.a. lambda expressions).

For example, before C++14, if you wanted to pass an anonymous function to the sort algorithm, you had to do something like this:


C++: Primitive types

A primitive type is a data type where the values that it can represent have a very simple nature (a number, a character or a truth-value); the primitive types are the most basic building blocks for any programming language and are the base for more complex data types. These are the primitive types available in C++:

bool

It is typically stored in one byte, and a variable of this type can hold the values true or false. All boolean operations return a value of this type. This type was not available in early C, so many operations that return integer values instead of boolean ones can be used as boolean expressions (in that case, the compiler treats 0 as false and any other value as true). For example, the following two code excerpts have the same semantics:

int a = 2;
if (a != 0) //a != 0 evaluates to a boolean value. In this case, it evaluates to true
  printf("a is different than 0\n");

and

int a = 2;
if (a) //a is an "int", but since it is different than 0, the compiler evaluates it as true
  printf("a is different than 0\n");

char

It is stored internally as one byte and represents a character. When this data type was created there was no immediate need for international character support in the language, so it was sufficient for storing the whole set of characters needed to write anything in English.

As computers spread and became available world-wide, support for international characters became necessary and new character encoding standards were defined. These new encodings needed a new character data type, because one byte is not enough to represent all the symbols used in all human languages (there are more than 40000 Chinese glyphs, for example).

Despite this, char is still the standard character data type, and a lot of legacy code still uses character strings based on char. Some encodings can store international characters as sequences of char values; UTF-8, for example, stores each Unicode character in 1, 2, 3 or 4 char values.

wchar_t

It is a wide-character type: it represents a character, but it is stored internally using 16 or 32 bits instead of the 8 bits of the char type. How many bits a value of this type uses depends on the computer architecture, the operating system and the C++ compiler being used. Commonly, Windows uses 16-bit wide characters and the UNIXes use 32-bit ones. The encoding used to represent a wchar_t character is not defined by the standard; the choice was deliberately left to the compiler.

Both char and wchar_t can be treated as integer data types, so the programmer can perform arithmetic operations on values of these types.

When it was created, wchar_t was not a built-in data type but just a type alias (typedef). Current compilers treat it as a built-in data type by default, but some let the user ask for the typedef behavior instead (this is needed to support legacy code).

short

It is a “short integer”: an integer with a smaller range than a “full-blown int”. short generally represents a signed 16-bit integer (that is, it can represent values between -32768 and 32767), but the exact width was left to the compiler implementor. unsigned short is the unsigned version of this integer; with 16 bits, it represents values between 0 and 65535.

int

It is the most common integer data type and was meant to represent a processor “word”: on 16-bit platforms it used to be a 16-bit integer, and on 32-bit platforms it is a 32-bit integer. This “rule” was broken when 64-bit hardware became available, where int generally still has 32 bits; that means it can store numbers between −2147483648 and 2147483647, or between 0 and 4294967295 when using the unsigned int version instead.

long and unsigned long

They represent “long integer numbers” and their width depends on the compiler and the OS. On 16-bit OSes they used to be 32-bit integers; on 32-bit hardware they are also 32-bit integers; and on 64-bit OSes they have 32 bits on Windows and 64 bits on the UNIXes.

long long and unsigned long long

They represent integers of at least 64 bits; on current mainstream platforms they are exactly 64 bits wide.

float

It represents a single-precision floating point number. It is stored in 32 bits (as defined in IEEE 754-2008) and its normalized positive values range from about 1.18 × 10^−38 to about 3.4 × 10^38, with around 7 significant decimal digits of precision.

double

It represents a double-precision floating point number. It is stored in 64 bits and its normalized positive values range from 2.2250738585072014 × 10^−308 to 1.7976931348623157 × 10^308, with approximately 16 significant decimal digits of precision.

C99 exact-width integer types

C99 also introduced a set of exact-width integer types that represent signed and unsigned integers with a width of 8, 16, 32 and 64 bits, independent of the compiler, OS or processor architecture. They are:

  • 8-bit precision: int8_t and uint8_t
  • 16-bit precision: int16_t and uint16_t
  • 32-bit precision: int32_t and uint32_t
  • 64-bit precision: int64_t and uint64_t

These exact-width integer types are not built-in types; they are just aliases (typedefs) of the primitive types described above. Older compilers may not support them (for example, Microsoft only introduced stdint.h, the header that declares them, in Visual Studio 2010).