Taming C++, episode 1: motivation

C++ is pretty much the standard language in the software industry for any project where performance matters. I have used it a fair bit in the past - during my IOI time and for a university course - but I used it only as “C with STL”. I have also reviewed it for my job interview a couple of years ago, but I have not touched it since then. Given that I am at the early stages of my career as a software developer and that I am interested in performance, it’s about time I learn modern C++!

Motivation

As much as I like programming and learning new stuff, this is something that requires some effort. And it turns out that keeping a copy of A Tour of C++ on my desk and looking at its cover from time to time does not provide sufficient motivation to take up this new task.

Starting a new project would be a good way to start playing with a new language, but at the moment I would rather keep working on my ongoing projects than start a new one. Writing a (series of) blog post(s) where I share my experience with the rest of the world seemed like a good alternative, but I needed some extra push to get started.

Finally, a few weeks ago I have discovered Bert Hubert’s series of posts Modern C++ for C programmers. The first post in that series showed an example that got my attention: for a task as simple as sorting a list of integers, using their respective standard libraries, C++ outperforms C by a significant margin!

Performance

To get started with modern C++, I decided to repeat Bert’s experiment. You can find the code I wrote in this repository.

To sort a list of a hundred million integers, in C we can use qsort() from stdlib.h (see sort.c):

qsort(a, ARRAYSIZE, sizeof(int), compar);

Where compar() is a comparison function such as

int compar(const void *x, const void *y) {
    return *(int *)x - *(int *)y;
}

In C++ we can use sort() from algorithm (see sort.cpp):

std::sort(a, a+ARRAYSIZE,
          [](const int &x, const int &y) { return x < y; });

Here we use a lambda expression instead of a comparison function.

Modern C++ also offers parallelized version of common algorithms as part of the standard library, so if we want to completely humiliate poor C we can use this (see sort_parallel.cpp):

std::sort(std::execution::par, a, a+ARRAYSIZE,
          [](const int &x, const int &y) { return x < y; });

Parallelizing stuff like this looks almost like cheating, but it wasn’t without some struggle - see the “Gotchas” section below.

So, what is the result? These are the times I get on my Debian 12 desktop (AMD Ryzen 7 7700):

C time for 100000000 numbers: 9.815525s
C++ time for 100000000 numbers: 4.87299s
C++ time for 100000000 numbers (parallel): 0.488956s

Even without parallelization, C++ is twice as fast as C!

You might think that C++ is somehow optimizing for integer sorting. But this is not the case, as demonstrated by a similar experiment with pairs of integers (see sort.c, sort.cpp and sort_parallel.cpp - I used a non-standard mixed lexicographic order to be reasonably sure the compiler does not come up with any ad-hoc optimization):

C time for 100000000 pairs: 12.383896s
C++ time for 100000000 pairs: 6.50033s
C++ time for 100000000 pairs (parallel): 0.713616s

Complexity is not for nothing

As explained by Bert in his post, the reason the C++ version is faster is that by using templates the compiler can inline the call to the comparison function. C is not as sophisticated, so this is not possible: the only way a standard library function can call your custom code is via a function pointer. Regardless of possible performance issues with pointer dereferencing, calling a function without inlining it always causes some overhead. This is completely negligible for larger tasks, but for a small function that is called millions of times it makes a big difference!

And all of this without even considering the elephant in the room, that is the fact that with virtually zero extra effort - but again, see the “Gotchas” section below - one can parallelize C++ STL algorithms and make them an order of magnitude faster on modern hardware! In C I would have needed to write a parallel sort on my own with something like pthreads.

Lesson learned for a C developer: sometimes the extra complexity of other languages does bring some benefit.

Gotchas

I mentioned above that I had some trouble compiling and running the parallel version of my code. In fact it took me almost two hours to make it work! This was for a couple of reasons.

The first reason is that, at least on Linux with GCC + GLIBC, the C++ standard library requires TBB. I figured this out rather early, and it did not take me long to learn that I had to add an -ltbb option to my command line either. What took me an incredibly long time to understand is that -ltbb had to be put at the end of the command line options! To make it clear, something like this:

$ g++ -O3 -std=c++20 -ltbb -o sort sort_parallel.cpp

does not work, you have to write

$ g++ -O3 -std=c++20 -o sort sort_parallel.cpp -ltbb

Another problem I had was caused by my use of macros. You can see in the source files that I am using an ARRAYSIZE macro for the size of the array, and I provide a value for it at compile time. Originally I had called this macro N, but this clashed with some internal name used in TBB, and I got all sorts of walls of text of weird template errors - something C++ is infamous for.

The last issue I had was that I was stupid and tried to compile my C++ code with a C compiler. Indeed, while debugging the issue with -ltbb I tried switching to clang, because I sometimes get better error messages with it. But instead of using clang++ I used clang, which only compiles C.

Until next time?

Now that I have broken the ice with C++ I think it will be easier to continue studying it. I may or may not keep writing about this here: turning this into another blog series may be a good way to force myself to do this regularly, but on the other hand it may not be interesting for my readers. We’ll see!