Make Your Code Slower With Multithreading

With the performance of modern CPU cores plateauing recently, the main performance gains are with multiple cores and multithreaded applications. Typically, a fast GPU is only so mind-bogglingly quick because thousands of cores operate in parallel on the same set of tasks. So, it would seem prudent for our applications to try to code in a multithreaded fashion to take advantage of this parallelism. Or so it would seem, but as [Marc Brooker] illustrates, it’s not as simple as one would assume, and it’s very easy to end up with far worse overall performance and no easy way to fix it.

[Marc] was rerunning an old experiment to calculate the expected number of birthdays in a shared group of people using brute force. The experiment was essentially a tight loop running a pseudorandom number generator, the standard libc rand() function. [Marc] profiled the code for single-thread and multithreaded versions and noted the runtime dramatically increased beyond two threads. Something fishy was going on. Running perf, [Marc] noted that there were significant L1 cache misses, but the real killer for performance was the increase in expensive context switches.  Perf indicated that for four threads, the was an overhead of nearly 50% servicing spin locks. There were no locks in the code, so after more perf magic, the syscalls taking all the time were identified.  Something in there was using a futex (or fast userspace mutex) a whole lot.

Continue reading “Make Your Code Slower With Multithreading”

Processes, Threads, And… Fibers?

You’ve probably heard of multithreaded programs where a single process can have multiple threads of execution. But here is yet another layer of creating multitasking programs known as a fiber. [A Graphics Guy] lays it out in a lengthy but well-done post. There are examples for both x64 and arm64, although the post mainly focuses on x64 for Windows. However, the ideas will apply anywhere.

In the old days, there was a CPU and when your program ran on it, it was in control. But that’s wasteful, so software quickly moved to where many programs could share the CPU simultaneously. Then, as that got overloaded, computers got more CPUs. Most operating systems have the idea of a process, which is a program that thinks it is in complete control, but it is really sharing the CPU with other processes. The problem arises when you want to have multiple “little” programs that cooperate. Processes are not really supposed to know about one another and, if they do, there’s usually some heavy-weight communication mechanism allowing them to talk.

Continue reading “Processes, Threads, And… Fibers?”

Minecraft Finally Gets Multi-Threaded Servers

Minecraft servers are famously single-threaded and those who host servers for large player bases often pay handsomely for a server that has gobs of memory and ripping fast single-core performance. Previous attempts to break Minecraft into separate threads haven’t ended successfully, but it seems like the folks over at [PaperMC] have finally cracked it with Folia.

Minecraft is one of (if not the most) hacked and modded games in history. Mods have been around since the early days, made possible by a dedicated group who painstakingly decompiled the Java bytecode and reverse-engineered it. Bukkit was a server mod back in the Alpha days that tried to support plugins and extend the default Minecraft. From Bukkit, Spitgot was forked. From Spitgot, Paper was forked, which focused on performance and gameplay mechanics. And now from Paper, Folia is a new fork focused on multi-threading.