2 unstable releases
Uses new Rust 2024
| new 0.2.0 | Feb 4, 2026 |
|---|---|
| 0.1.0 | Feb 3, 2026 |
#381 in Biology
Used in quad-fm
1MB
4K
SLoC
QuadRank: a High Throughput Rank
This repo implements BiRank, QuadRank, and QuadFm, two fast rank data
structures for binary and DNA input, and a simple count-only FM-index that all use batching and prefetching of queries.
BiRank and QuadRank need only a single cache-miss per query, making them up to 2x faster than other methods in high-throughput settings. QuadFm is up to 4x faster than genedex (https://github.com/feldroop/genedex), which seems to be the fastest Rust-based FM-index currently.
NOTE: Currently only AVX2 is supported!
Usage
BiRank and QuadRank are constructed from (little-endian) packed data, given as
a &[u64] slice.
For binary input, this means the first bit is the lowest-order bit of the first
value. For QuadRank, each u64 contains 32 2-bit values.
The main data structures are BiRank and QuadRank, which implement
binary::RankerT and quad::RankerT:
trait binary::RankerT {
fn new_packed(seq: &[u64]) -> Self;
/// Number of 1 bits before pos `pos`. Requires `0 <= pos <= len`.
unsafe fn rank_unchecked(&self, pos: usize) -> u64;
/// Prefetch the cache lines needed to answer `rank(pos)`.
fn prefetch(&self, pos: usize);
}
trait quad::RankerT {
/// ASCII values `ACTG` get converted to `0123`.
fn new_ascii_dna(seq: &[u8]) -> Self;
fn new_packed(seq: &[u64]) -> Self;
/// Number of times `0 <= c < 4` occurs before pos `pos`. Requires `0 <= pos <= len`.
unsafe fn rank1_unchecked(&self, pos: usize, c: u8) -> u64;
/// Number of times each character occurs before pos `pos`. Requires `0 <= pos <= len`.
unsafe fn rank4_unchecked(&self, pos: usize, c: u8) -> [u64; 4];
/// Prefetch the cache lines needed to answer `rank1(pos, c)`.
fn prefetch1(&self, pos: usize, c: u8);
/// Prefetch the cache lines needed to answer `rank4(pos)`.
fn prefetch4(&self, pos: usize);
}
Example:
let packed = [u64::MAX, u64::MAX];
let rank = quadrank::BiRank::new(packed);
unsafe {
assert_eq!(rank.rank_unchecked(0), 0);
assert_eq!(rank.rank_unchecked(1), 1);
assert_eq!(rank.rank_unchecked(128), 128);
}
let dna = b"ACGCGCGACTTACGCAT";
let n = dna.len(); // 17
let rank = quadrank::QuadRank::new_ascii_dna(dna);
unsafe {
assert_eq!(rank.rank1_unchecked(0, 0), 0);
assert_eq!(rank.rank4_unchecked(0), [0; 4]);
assert_eq!(rank.rank1_unchecked(n, 0), 4);
assert_eq!(rank.rank4_unchecked(n), [4, 6, 3, 4]);
}
Results
BiRank
Space-time trade-off of binary rank data structures. On the x-axis is the space overhead compared to the input, and on the y-ax the ns per query. Top/middle/bottom are 1/6/12 threads, and left/middle/right are latency, throughput in a for loop, and throughput in a loop with additional prefetching.