fork_union

Low-latency OpenMP-style NUMA-aware cross-platform fine-grained parallelism library

28 stable releases, including:

2.3.0 Oct 9, 2025
2.2.9 Sep 16, 2025
2.2.0 Jul 17, 2025
1.0.6 Jun 15, 2025
0.3.3 May 22, 2025

License: Apache-2.0

Fork Union 🍴

Fork Union is arguably the lowest-latency OpenMP-style, NUMA-aware, minimalistic scoped thread pool designed for fork-join parallelism in C++, C, and Rust. On the hot path it avoids mutexes and system calls, dynamic memory allocations, CAS (compare-and-swap) primitives, and false sharing of CPU cache lines 🍴
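To make those hot-path properties concrete, here is a toy fork-join pool. It is a hypothetical sketch, not Fork Union's API or implementation. `toy_fork_join`, `for_threads`, and `fork_join_sum` are invented names, the code assumes 64-byte cache lines, and idle workers simply spin with `std::this_thread::yield`. Work is published by bumping an atomic generation counter and joined with `fetch_add` on a completion counter, so dispatch takes no mutex, allocates nothing, and runs no compare-and-swap loop. The hot atomics sit on separate cache lines.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical toy fork-join pool; NOT Fork Union's API or implementation.
// Assumes 64-byte cache lines. Idle workers busy-wait with yield().
class toy_fork_join {
  public:
    explicit toy_fork_join(std::size_t threads) : threads_(threads) {
        assert(threads >= 1 && "the calling thread counts as thread 0");
        // Workers are spawned once; later dispatches create no threads.
        for (std::size_t i = 1; i < threads_; ++i)
            workers_.emplace_back([this, i] { worker_loop(i); });
    }

    ~toy_fork_join() {
        stop_.store(true, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release); // wake workers so they exit
        for (auto &worker : workers_) worker.join();
    }

    std::size_t size() const { return threads_; }

    // Fork: run fn(thread_index) on every thread, including the caller.
    // Join: return only after all threads have finished.
    template <typename F> void for_threads(F &fn) {
        context_ = &fn; // no copy, no heap allocation
        trampoline_ = [](void *ctx, std::size_t i) { (*static_cast<F *>(ctx))(i); };
        finished_.store(0, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release); // publish the job
        fn(0);                                                // the caller works too
        while (finished_.load(std::memory_order_acquire) != threads_ - 1)
            std::this_thread::yield();
    }

  private:
    void worker_loop(std::size_t index) {
        std::size_t seen = 0;
        for (;;) {
            std::size_t current;
            while ((current = generation_.load(std::memory_order_acquire)) == seen)
                std::this_thread::yield();
            seen = current;
            if (stop_.load(std::memory_order_relaxed)) return;
            trampoline_(context_, index);
            finished_.fetch_add(1, std::memory_order_release); // plain increment, no CAS loop
        }
    }

    std::size_t threads_;
    std::vector<std::thread> workers_;
    void *context_ = nullptr;
    void (*trampoline_)(void *, std::size_t) = nullptr;
    // Each hot atomic gets its own cache line to avoid false sharing.
    alignas(64) std::atomic<std::size_t> generation_{0};
    alignas(64) std::atomic<std::size_t> finished_{0};
    alignas(64) std::atomic<bool> stop_{false};
};

// Hypothetical usage helper: sums 0..n-1, each thread taking a strided slice.
long fork_join_sum(std::size_t threads, std::size_t n) {
    toy_fork_join pool(threads);
    std::vector<long> partial(pool.size(), 0);
    auto body = [&](std::size_t t) {
        long local = 0; // accumulate privately, then write the shared vector once
        for (std::size_t i = t; i < n; i += pool.size()) local += static_cast<long>(i);
        partial[t] = local;
    };
    pool.for_threads(body);
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

Compared with a mutex-guarded task queue, each `for_threads` call touches only a few atomics. The trade-off in this toy version is that idle workers keep spinning and consume CPU time while they wait.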

Motivation

Most "thread pools" are not, in fact, thread pools. They are "task queues": structures designed to synchronize a concurrent, dynamically growing list of heap-allocated, globally accessible shared objects. In C++ terms, think of a std::queue<std::function<void()>> protected by a std::mutex. Each thread waits for the next task to become available and then executes it on whatever core the OS scheduler happens to choose. All of that is slow. Every task involves a lock, a heap-allocated object, and a core picked by the OS rather than by the pool, and the same pattern shows up across C++, C, and Rust projects. Short of OpenMP, practically every other solution has high dispatch latency and noticeable memory overhead. OpenMP, however, is not ideal for fine-grained parallelism and is less portable than the C++ and Rust standard libraries.
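The "task queue" design described above can be sketched in a few dozen lines. This is a hypothetical illustration of the pattern being criticized, not code from Fork Union or any particular library, and `task_queue_pool` and `queue_sum` are invented names. Each `submit` takes the mutex and stores a `std::function`, which may heap-allocate. `notify_one` may end up in a system call to wake a sleeping worker, and the woken worker runs wherever the OS scheduler placed it.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical "task queue" of the kind described above; not from any library.
class task_queue_pool {
  public:
    explicit task_queue_pool(std::size_t threads) {
        assert(threads >= 1);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~task_queue_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto &worker : workers_) worker.join(); // workers drain the queue first
    }

    // Every submission takes the lock and builds a std::function,
    // which may heap-allocate for larger callables.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one(); // waking a sleeping worker may cost a system call
    }

  private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return; // only reached once stopping_ is set
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // runs on whichever core the OS scheduler picked
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical usage helper: one tiny task per index, summed atomically.
long queue_sum(std::size_t threads, long n) {
    std::atomic<long> total{0};
    {
        task_queue_pool pool(threads);
        for (long i = 0; i < n; ++i)
            pool.submit([&total, i] { total.fetch_add(i, std::memory_order_relaxed); });
    } // the destructor lets workers drain the queue, then joins them
    return total.load();
}
```

`queue_sum(4, 1000)` returns 499500. The result is correct, but every one of those tiny tasks pays the locking, allocation, and wake-up costs listed above, which is exactly the overhead that dominates fine-grained workloads.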