Probabilistic data types

Learn how to use Bloom filter set membership and HyperLogLog approximate cardinality with redis-rs.

Redis supports several probabilistic data types that let you calculate values approximately rather than exactly. The redis-rs high-level command traits include support for Bloom filter set membership and HyperLogLog cardinality estimation.

Note:
This page covers Bloom filters and HyperLogLog because redis-rs provides dedicated high-level methods for them. Other probabilistic data types, such as Cuckoo filters, Count-min sketch, t-digest, and Top-K, can still be called with low-level Redis commands, but they don't currently have dedicated high-level redis-rs methods.

Set membership

A Bloom filter lets you track whether or not a particular item has been added to a set. Instead of storing the items themselves, like a set, a Bloom filter records the presence or absence of the hash value of each item. This gives a very compact representation of the set's membership with a fixed memory size, regardless of how many items you add. Note that there is an asymmetry between presence and absence of items in the set: if an item is reported as absent, then it is definitely absent, but if it is reported as present, then there is a small chance it may really be absent.

The redis-rs command traits provide high-level bf_add, bf_madd, bf_exists, and bf_mexists methods, among others. The following example adds some names to a Bloom filter representing a list of users and then checks for the presence or absence of users in the list.

Set membership: Use Bloom filter to efficiently track item presence with minimal memory overhead
        let res1: Vec<bool> = r
            .bf_madd(
                "recorded_users",
                &["andy", "cameron", "david", "michelle"],
            )
            .expect("Failed to add users to Bloom filter");
        println!("{res1:?}"); // >>> [true, true, true, true]

        let res2: bool = r
            .bf_exists("recorded_users", "cameron")
            .expect("Failed to check Bloom filter");
        println!("{res2}"); // >>> true

        let res3: bool = r
            .bf_exists("recorded_users", "kaitlyn")
            .expect("Failed to check Bloom filter");
        println!("{res3}"); // >>> false
        let res1: Vec<bool> = r
            .bf_madd(
                "recorded_users",
                &["andy", "cameron", "david", "michelle"],
            )
            .await
            .expect("Failed to add users to Bloom filter");
        println!("{res1:?}"); // >>> [true, true, true, true]

        let res2: bool = r
            .bf_exists("recorded_users", "cameron")
            .await
            .expect("Failed to check Bloom filter");
        println!("{res2}"); // >>> true

        let res3: bool = r
            .bf_exists("recorded_users", "kaitlyn")
            .await
            .expect("Failed to check Bloom filter");
        println!("{res3}"); // >>> false

Set cardinality

A HyperLogLog object calculates the approximate cardinality of a set. As you add items, the HyperLogLog tracks the number of distinct set members, but it doesn't let you retrieve those members or test whether a specific item was added.

You can also merge two or more HyperLogLogs to find the approximate cardinality of the union of the sets they represent.

Set cardinality: Estimate distinct item count using HyperLogLog with minimal memory usage
        let group1_added: bool = r
            .pfadd("group:1", &["andy", "cameron", "david"])
            .expect("Failed to add items to group:1");
        println!("{group1_added}"); // >>> true

        let group1: usize = r.pfcount("group:1").expect("Failed to count group:1");
        println!("{group1}"); // >>> 3

        let group2_added: bool = r
            .pfadd("group:2", &["kaitlyn", "michelle", "paolo", "rachel"])
            .expect("Failed to add items to group:2");
        println!("{group2_added}"); // >>> true

        let group2: usize = r.pfcount("group:2").expect("Failed to count group:2");
        println!("{group2}"); // >>> 4

        let _: () = r
            .pfmerge("both_groups", &["group:1", "group:2"])
            .expect("Failed to merge HyperLogLogs");
        println!("OK"); // >>> OK

        let both_groups: usize = r
            .pfcount("both_groups")
            .expect("Failed to count both_groups");
        println!("{both_groups}"); // >>> 7

        let group1_added: bool = r
            .pfadd("group:1", &["andy", "cameron", "david"])
            .await
            .expect("Failed to add items to group:1");
        println!("{group1_added}"); // >>> true

        let group1: usize = r.pfcount("group:1").await.expect("Failed to count group:1");
        println!("{group1}"); // >>> 3

        let group2_added: bool = r
            .pfadd("group:2", &["kaitlyn", "michelle", "paolo", "rachel"])
            .await
            .expect("Failed to add items to group:2");
        println!("{group2_added}"); // >>> true

        let group2: usize = r.pfcount("group:2").await.expect("Failed to count group:2");
        println!("{group2}"); // >>> 4

        let _: () = r
            .pfmerge("both_groups", &["group:1", "group:2"])
            .await
            .expect("Failed to merge HyperLogLogs");
        println!("OK"); // >>> OK

        let both_groups: usize = r
            .pfcount("both_groups")
            .await
            .expect("Failed to count both_groups");
        println!("{both_groups}"); // >>> 7

The main benefit that HyperLogLogs offer is their very low memory usage. They can count up to 2^64 items with less than 1% standard error using a maximum 12KB of memory.

More information

See the following pages to learn more:

RATE THIS PAGE
Back to top ↑