bloomfilters

package module
v1.0.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 15, 2026 License: MIT Imports: 10 Imported by: 0

README

Bloom Filters

Checks

A basic / generic and/or concurrent Bloom filter implementation in Go.

Installation

go get github.com/daanv2/go-bloom-filters

Features

  • Standard Bloom filter — basic, high-performance implementation
  • Concurrent Bloom filter — safe for concurrent reads and writes using a spinlock
  • Generic wrapper — use any type with a custom serializer via GenericBloomFilter[T]
  • Configurable hash functions — ships with FNV, CRC-64, SHA, and MD5; bring your own with WithHashFunctions
  • Functional options — clean builder pattern with WithSize, WithDefaultHashFunctions, etc.

Quick Start

package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
)

func main() {
	bf, err := bloomfilters.NewBloomFilter(
		bloomfilters.WithSize(1024),
		bloomfilters.WithDefaultHashFunctions(),
	)
	if err != nil {
		panic(err)
	}

	bf.Add([]byte("hello"))
	bf.Add([]byte("world"))

	fmt.Println(bf.Test([]byte("hello"))) // true
	fmt.Println(bf.Test([]byte("world"))) // true
	fmt.Println(bf.Test([]byte("foo")))   // false (probably)
}
Concurrent Bloom Filter
bf, err := bloomfilters.NewConcurrentBloomFilter(
	bloomfilters.WithSize(1024),
	bloomfilters.WithDefaultHashFunctions(),
)
if err != nil {
	panic(err)
}

// Safe to call Add / Test from multiple goroutines
bf.Add([]byte("hello"))
fmt.Println(bf.Test([]byte("hello"))) // true
Generic Bloom Filter

Wrap any IBloomFilter to work with a custom type:

bf, _ := bloomfilters.NewBloomFilter(
	bloomfilters.WithSize(1024),
	bloomfilters.WithDefaultHashFunctions(),
)

sbf := bloomfilters.NewGenericBloomFilter(bf, func(s string) []byte {
	return []byte(s)
})

sbf.Add("hello")
fmt.Println(sbf.Test("hello")) // true
Bloom Settings

The pkg/bloomsettings package provides helper functions for tuning your filter:

  • OptimalHashFunctions(m, n) — returns the optimal number of hash functions for m bits and n expected elements
  • FalsePositiveRate(m, n, k) — calculates the expected false-positive rate for m bits, n elements, and k hash functions
import "github.com/daanv2/go-bloom-filters/pkg/bloomsettings"

k := bloomsettings.OptimalHashFunctions(1024, 100)   // optimal k
fp := bloomsettings.FalsePositiveRate(1024, 100, k)   // expected false-positive rate

Documentation

Overview

Package bloomfilters provides a basic, generic and async implementations of Bloom filters in Go.

Index

Examples

Constants

This section is empty.

Variables

View Source
var (
	ErrInvalidSize          = errors.New("invalid size")
	ErrRequiredHashFunction = errors.New("at least one hash function is required")
	ErrHashIsNil            = errors.New("hash functions cannot be nil")
)

Functions

This section is empty.

Types

type Bits

type Bits struct {
	// contains filtered or unexported fields
}

Bits is a structure that represents a bit array for use in bloom filters. It provides methods to set and get bits, as well as to marshal and unmarshal the data for storage or transmission.

Example
package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
)

func main() {
	bits := bloomfilters.NewBits(128) // Create a Bits structure with 128 bits
	bits.Setbit(5)                    // Set the bit at index 5
	bits.Setbit(10)                   // Set the bit at index 10

	// Check if specific bits are set
	fmt.Println(bits.Getbit(5))
	fmt.Println(bits.Getbit(10))
	fmt.Println(bits.Getbit(0))
	fmt.Println(bits.Getbit(127))

}
Output:
true
true
false
false

func NewBits

func NewBits(size uint64) Bits

NewBits creates a new Bits structure with the specified size in bits. It calculates the necessary number of uint64 words to accommodate the specified size.

func (*Bits) BitsCount

func (b *Bits) BitsCount() uint64

BitsCount returns the total number of bits that are set to 1 in the bloom filter.

func (*Bits) Copy

func (b *Bits) Copy() Bits

Copy returns a deep copy of the Bits structure.

func (*Bits) Equals

func (b *Bits) Equals(other *Bits) bool

Equals returns true if this Bits structure is equal to the other Bits structure.

func (*Bits) GetHash added in v1.0.2

func (b *Bits) GetHash(hash uint64) bool

GetHash checks if the bit at the index corresponding to the given hash value is set to 1.

func (*Bits) Getbit

func (b *Bits) Getbit(index uint64) bool

Getbit returns the value of the bit at the specified index (true if set, false otherwise).

func (*Bits) MarshalBinary

func (b *Bits) MarshalBinary() (data []byte, err error)

MarshalBinary implements encoding.BinaryMarshaler.

func (*Bits) MarshalJSON

func (b *Bits) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler.

func (*Bits) MarshalText

func (b *Bits) MarshalText() (text []byte, err error)

MarshalText implements encoding.TextMarshaler.

func (*Bits) SetHash added in v1.0.2

func (b *Bits) SetHash(hash uint64)

SetHash sets the bit at the index corresponding to the given hash value to 1.

func (*Bits) Setbit

func (b *Bits) Setbit(index uint64)

Setbit sets the bit at the specified index to 1. If the index is out of bounds (greater than or equal to the size of the Bits), it will not set any bit.

func (*Bits) Size

func (b *Bits) Size() uint64

Size returns the total number of bits this storage can hold.

func (*Bits) UnmarshalBinary

func (b *Bits) UnmarshalBinary(data []byte) error

UnmarshalBinary implements encoding.BinaryUnmarshaler.

func (*Bits) UnmarshalJSON

func (b *Bits) UnmarshalJSON(d []byte) error

UnmarshalJSON implements json.Unmarshaler.

func (*Bits) UnmarshalText

func (b *Bits) UnmarshalText(text []byte) error

UnmarshalText implements encoding.TextUnmarshaler.

func (*Bits) Words

func (b *Bits) Words() []uint64

Words returns a copy slice of uint64 words representing the bits in the bloom filter.

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is a probabilistic data structure that tests whether an element is a member of a set. False positive matches are possible, but false negatives are not.

Example

Example of basic BloomFilter usage

package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
	"github.com/daanv2/go-bloom-filters/pkg/bloomhashes"
)

func main() {
	// Create a bloom filter with size 1024 and FNV hash function
	bf, _ := bloomfilters.NewBloomFilter(
		bloomfilters.WithSize(1024),
		bloomfilters.WithHashFunctions([]bloomhashes.HashFunction{bloomhashes.Fnv1_64}),
	)

	// Add some data
	bf.Add([]byte("apple"))
	bf.Add([]byte("banana"))
	bf.Add([]byte("orange"))

	// Test if data is in the filter
	fmt.Println(bf.Test([]byte("apple")))
	fmt.Println(bf.Test([]byte("banana")))
	fmt.Println(bf.Test([]byte("grape")))

}
Output:
true
true
false
Example (MultipleHashes)

Example demonstrating bloom filter with multiple hash functions

package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
	"github.com/daanv2/go-bloom-filters/pkg/bloomhashes"
)

func main() {
	// Create a bloom filter with multiple hash functions for better accuracy
	bf, _ := bloomfilters.NewBloomFilter(
		bloomfilters.WithSize(2048),
		bloomfilters.WithHashFunctions([]bloomhashes.HashFunction{
			bloomhashes.Fnv1_64,
			bloomhashes.Fnv1_64a,
			bloomhashes.MD5,
		}),
	)

	// Add email addresses
	bf.Add([]byte("user@example.com"))
	bf.Add([]byte("admin@example.com"))

	// Check if emails exist
	fmt.Println(bf.Test([]byte("user@example.com")))
	fmt.Println(bf.Test([]byte("admin@example.com")))
	fmt.Println(bf.Test([]byte("guest@example.com")))

}
Output:
true
true
false

func NewBloomFilter

func NewBloomFilter(opts ...BloomFilterOptions) (*BloomFilter, error)

NewBloomFilter creates a new bloom filter with the given options. It returns an error if the size of the bloom filter is invalid or if any of the hash functions are nil.

func (*BloomFilter) Add

func (bf *BloomFilter) Add(data []byte)

Add adds the given data to the bloom filter by applying each hash function to the data and setting the corresponding bits in the filter.

func (*BloomFilter) Bits

func (bf *BloomFilter) Bits() Bits

Bits returns a copy of the Bits struct representing the bit array of the bloom filter. Modifying the returned Bits will not affect the internal state of the bloom filter.

func (*BloomFilter) BitsCount

func (bf *BloomFilter) BitsCount() uint64

BitsCount returns the total number of bits that are set to 1 in the bloom filter.

func (*BloomFilter) GetHash

func (bf *BloomFilter) GetHash(hash uint64) bool

Get checks if the bit at the index corresponding to the given hash value is set to 1.

func (*BloomFilter) SetHash

func (bf *BloomFilter) SetHash(hash uint64)

Set sets the bit at the index corresponding to the given hash value to 1.

func (*BloomFilter) Test

func (bf *BloomFilter) Test(data []byte) bool

Test checks if the given data is likely to be in the bloom filter by applying each hash function to the data and checking if the corresponding bits in the filter are set. It returns true if all bits are set, indicating that the data is likely to be in the filter, and false otherwise.

func (*BloomFilter) Words

func (bf *BloomFilter) Words() []uint64

Words returns a copy slice of uint64 words representing the bits in the bloom filter.

type BloomFilterOptions

type BloomFilterOptions interface {
	// contains filtered or unexported methods
}

func WithAllHashFunctions

func WithAllHashFunctions() BloomFilterOptions

WithAllHashFunctions sets all available hash functions to be used by the bloom filter. It retrieves all hash functions from the bloomhashes package and assigns them to the bloom filter's hashes field.

func WithAppendHashFunctions

func WithAppendHashFunctions(hashFunctions []bloomhashes.HashFunction) BloomFilterOptions

WithAppendHashFunctions appends additional hash functions to the existing list of hash functions used by the bloom filter. It takes a slice of HashFunction and appends it to the current list of hash functions.

func WithBits

func WithBits(bits Bits) BloomFilterOptions

WithBits sets the bits of the bloom filter using a Bits structure. It directly assigns the provided Bits to the bloom filter, allowing for more flexible manipulation of the bit array.

func WithDefaultHashFunctions

func WithDefaultHashFunctions() BloomFilterOptions

WithDefaultHashFunctions sets the default hash functions to be used by the bloom filter. It retrieves the default hash functions from the bloomhashes package and assigns them to the bloom filter's hashes field.

func WithHashFunctions

func WithHashFunctions(hashFunctions []bloomhashes.HashFunction) BloomFilterOptions

WithHashFunctions sets the hash functions to be used by the bloom filter. It replaces any existing hash functions with the provided slice of HashFunction.

func WithSize

func WithSize(size uint64) BloomFilterOptions

WithSize sets the size of the bloom filter in bits. It calculates the necessary number of uint64 words to accommodate the specified size and initializes the Bits structure accordingly.

func WithWords

func WithWords(words []uint64) BloomFilterOptions

WithWords sets the bits of the bloom filter using a slice of uint64 words. It initializes the Bits structure with the provided words, allowing for direct manipulation of the bloom filter's bit array.

type ConcurrentBloomFilter

type ConcurrentBloomFilter struct {
	// contains filtered or unexported fields
}

ConcurrentBloomFilter is a thread-safe bloom filter that uses a spinlock for concurrent access. It is safe to call Add and Test methods from multiple goroutines.

Example

Example of basic ConcurrentBloomFilter usage

package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
	"github.com/daanv2/go-bloom-filters/pkg/bloomhashes"
)

func main() {
	// Create a bloom filter with size 1024 and FNV hash function
	bf, _ := bloomfilters.NewConcurrentBloomFilter(
		bloomfilters.WithSize(1024),
		bloomfilters.WithHashFunctions([]bloomhashes.HashFunction{bloomhashes.Fnv1_64}),
	)

	// Add some data
	bf.Add([]byte("apple"))
	bf.Add([]byte("banana"))
	bf.Add([]byte("orange"))

	// Test if data is in the filter
	fmt.Println(bf.Test([]byte("apple")))
	fmt.Println(bf.Test([]byte("banana")))
	fmt.Println(bf.Test([]byte("grape")))

}
Output:
true
true
false
Example (MultipleHashes)

Example demonstrating concurrent bloom filter with multiple hash functions

package main

import (
	"fmt"

	bloomfilters "github.com/daanv2/go-bloom-filters"
	"github.com/daanv2/go-bloom-filters/pkg/bloomhashes"
)

func main() {
	// Create a bloom filter with multiple hash functions for better accuracy
	bf, _ := bloomfilters.NewConcurrentBloomFilter(
		bloomfilters.WithSize(2048),
		bloomfilters.WithHashFunctions([]bloomhashes.HashFunction{
			bloomhashes.Fnv1_64,
			bloomhashes.Fnv1_64a,
			bloomhashes.MD5,
		}),
	)

	// Add email addresses
	bf.Add([]byte("user@example.com"))
	bf.Add([]byte("admin@example.com"))

	// Check if emails exist
	fmt.Println(bf.Test([]byte("user@example.com")))
	fmt.Println(bf.Test([]byte("admin@example.com")))
	fmt.Println(bf.Test([]byte("guest@example.com")))

}
Output:
true
true
false

func NewConcurrentBloomFilter

func NewConcurrentBloomFilter(opts ...BloomFilterOptions) (*ConcurrentBloomFilter, error)

NewConcurrentBloomFilter creates a new concurrent bloom filter with the given options. It returns an error if the size of the bloom filter is invalid or if any of the hash functions are nil.

func (*ConcurrentBloomFilter) Add

func (bf *ConcurrentBloomFilter) Add(data []byte)

Add adds the given data to the bloom filter by applying each hash function to the data and setting the corresponding bits in the filter. This method is thread-safe.

func (*ConcurrentBloomFilter) Bits

func (bf *ConcurrentBloomFilter) Bits() Bits

Bits returns a copy of the Bits struct representing the bit array of the bloom filter. Modifying the returned Bits will not affect the internal state of the bloom filter.

func (*ConcurrentBloomFilter) BitsCount

func (bf *ConcurrentBloomFilter) BitsCount() uint64

BitsCount returns the total number of bits that are set to 1 in the bloom filter.

func (*ConcurrentBloomFilter) GetHash

func (bf *ConcurrentBloomFilter) GetHash(hash uint64) bool

GetHash checks if the bit at the index corresponding to the given hash value is set to 1. This method is thread-safe.

func (*ConcurrentBloomFilter) SetHash

func (bf *ConcurrentBloomFilter) SetHash(hash uint64)

SetHash sets the bit at the index corresponding to the given hash value to 1. This method is thread-safe.

func (*ConcurrentBloomFilter) Test

func (bf *ConcurrentBloomFilter) Test(data []byte) bool

Test checks if the given data is likely to be in the bloom filter by applying each hash function to the data and checking if the corresponding bits in the filter are set. This method is thread-safe. It returns true if all bits are set, indicating that the data is likely to be in the filter, and false otherwise.

type GenericBloomFilter

type GenericBloomFilter[T any] struct {
	// contains filtered or unexported fields
}

GenericBloomFilter is a wrapper around IBloomFilter that allows using custom types with a serializer function. It converts custom types to byte slices before passing them to the underlying bloom filter.

func NewGenericBloomFilter

func NewGenericBloomFilter[T any](base IBloomFilter, getBytes func(T) []byte) *GenericBloomFilter[T]

NewGenericBloomFilter creates a new generic bloom filter that wraps the provided bloom filter and uses the given serializer function. The serializer function converts custom types to byte slices for hashing.

func (*GenericBloomFilter[T]) Add

func (g *GenericBloomFilter[T]) Add(data T)

Add implements IBloomFilter.

func (*GenericBloomFilter[T]) Bits

func (bf *GenericBloomFilter[T]) Bits() Bits

Bits returns a copy of the Bits struct representing the bit array of the bloom filter. Modifying the returned Bits will not affect the internal state of the bloom filter.

func (*GenericBloomFilter[T]) BitsCount

func (bf *GenericBloomFilter[T]) BitsCount() uint64

BitsCount returns the total number of bits that are set to 1 in the bloom filter.

func (*GenericBloomFilter[T]) GetHash

func (g *GenericBloomFilter[T]) GetHash(hash uint64) bool

GetHash implements IBloomFilter.

func (*GenericBloomFilter[T]) SetHash

func (g *GenericBloomFilter[T]) SetHash(hash uint64)

SetHash implements IBloomFilter.

func (*GenericBloomFilter[T]) Test

func (g *GenericBloomFilter[T]) Test(data T) bool

Test implements IBloomFilter.

type IBloomFilter

type IBloomFilter interface {
	SetHash(hash uint64)      // SetHash sets the bit corresponding to the given hash value in the bloom filter.
	GetHash(hash uint64) bool // GetHash checks if the bit corresponding to the given hash value is set in the bloom filter.
	Add(data []byte)          // Add adds the given data to the bloom filter by applying the hash functions and setting the corresponding bits.
	Test(data []byte) bool    // Test checks if the given data is likely to be in the bloom filter by applying the hash functions and checking the corresponding bits.
	BitsCount() uint64        // BitsCount returns the total number of bits that are set to 1 in the bloom filter.
	Bits() Bits               // Bits returns the underlying bit array of the bloom filter.
}

IBloomFilter defines the interface for bloom filter implementations. It provides methods for adding elements, testing membership, and managing hash values.

Directories

Path Synopsis
pkg
tests

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL