# VectorLite

*A tiny, in-process Rust vector store with built-in embeddings for sub-millisecond semantic search.*

VectorLite is a high-performance, in-memory vector database optimized for AI agent and edge workloads. It co-locates model inference (via [Candle](https://github.com/huggingface/candle)) with a low-latency vector index, making it ideal for session-scoped, single-instance, or privacy-sensitive environments.
## Why VectorLite?

| Feature | Description |
| --- | --- |
| Sub-millisecond search | In-memory HNSW or flat search tuned for real-time agent loops. |
| Built-in embeddings | Runs `all-MiniLM-L6-v2` locally using Candle, or any other model of your choice. No external API calls. |
| Single-binary simplicity | No dependencies, no servers to orchestrate. Start instantly via CLI or Docker. |
| Session-scoped collections | Perfect for ephemeral agent sessions or sidecars. |
| Thread-safe concurrency | `RwLock`-based access and atomic ID generation for multi-threaded workloads. |
| Instant persistence | Save or restore collection snapshots in one call. |
VectorLite trades distributed scalability for deterministic performance, perfect for use cases where latency matters more than millions of vectors.
## When to Use It

| Scenario | Why VectorLite fits |
| --- | --- |
| AI agent sessions | Keep short-lived embeddings per conversation. No network latency. |
| Edge or embedded AI | Run fully offline with model + index in one binary. |
| Realtime search / personalization | Sub-ms search for pre-computed embeddings. |
| Local prototyping & CI | Rust-native, no external services. |
| Single-tenant microservices | Lightweight sidecar for semantic capabilities. |
## Quick Start

### Run from Source

```bash
cargo run --bin vectorlite -- --port 3001

# Start with a preloaded collection
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3001
```
### Run with Docker

With default settings:

```bash
docker build -t vectorlite .
docker run -p 3001:3001 vectorlite
```
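Once the container is running, you can smoke-test it against the health endpoint (documented in the HTTP API overview below):

```bash
curl http://localhost:3001/health
```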
With a different embedding model and memory-optimized HNSW:

```bash
docker build \
  --build-arg MODEL_NAME="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  --build-arg FEATURES="memory-optimized" \
  -t vectorlite-small .
```
## HTTP API Overview

| Operation | Method & Endpoint | Body |
| --- | --- | --- |
| Health | `GET /health` | – |
| List collections | `GET /collections` | – |
| Create collection | `POST /collections` | `{"name": "docs", "index_type": "hnsw"}` |
| Delete collection | `DELETE /collections/{name}` | – |
| Add text | `POST /collections/{name}/text` | `{"text": "Hello world", "metadata": {...}}` |
| Search (text) | `POST /collections/{name}/search/text` | `{"query": "hello", "k": 5}` |
| Get vector | `GET /collections/{name}/vectors/{id}` | – |
| Delete vector | `DELETE /collections/{name}/vectors/{id}` | – |
| Save collection | `POST /collections/{name}/save` | `{"file_path": "./collection.vlc"}` |
| Load collection | `POST /collections/load` | `{"file_path": "./collection.vlc", "collection_name": "restored"}` |
## Index Types

| Index | Search Complexity | Insert | Use Case |
| --- | --- | --- | --- |
| Flat | O(n) | O(1) | Small datasets (<10K) or exact search |
| HNSW | O(log n) | O(log n) | Larger datasets or approximate search |

See [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world).
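For exact search on a small collection, create it with the flat index instead. A sketch, assuming the lowercase value `"flat"` mirrors the `"hnsw"` value shown in the API table:

```bash
curl -X POST http://localhost:3001/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "exact-docs", "index_type": "flat"}'
```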
### Configuration profiles for HNSW

| Profile | Features | Use Case |
| --- | --- | --- |
| `default` | balanced | general workloads |
| `memory-optimized` | reduced precision, smaller graph | constrained devices |
| `high-accuracy` | higher recall, more memory | offline re-ranking or research |

```bash
cargo build --features memory-optimized
```
## Similarity Metrics

- **Cosine**: default for normalized embeddings; scale-invariant
- **Euclidean**: geometric distance; sensitive to vector magnitude
- **Manhattan**: L1 norm; robust to outliers
- **Dot Product**: raw similarity; requires consistent vector scaling
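For intuition, here is a small, self-contained sketch of the four metrics in plain Rust (illustrative only; these helpers are not part of the VectorLite API):

```rust
// Illustrative metric implementations, not VectorLite's internals.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot(a, b) / (norm(a) * norm(b))
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn main() {
    // b is a scaled copy of a: cosine stays at 1.0 (scale-invariant),
    // while dot, euclidean, and manhattan all change with magnitude.
    let (a, b) = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]);
    println!("cosine={}  dot={}", cosine(&a, &b), dot(&a, &b));
    println!("euclidean={}  manhattan={}", euclidean(&a, &b), manhattan(&a, &b));
}
```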
## Rust SDK Example
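Building the example below requires the crate and `serde_json` in your `Cargo.toml`. A minimal sketch of the manifest, assuming VectorLite is consumed as a local path dependency (adjust the source to match your setup):

```toml
[dependencies]
# Hypothetical path; point this at your VectorLite checkout or registry source.
vectorlite = { path = "../vectorlite" }
serde_json = "1"
```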
```rust
use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));
    client.create_collection("quotes", IndexType::HNSW)?;

    let id = client.add_text_to_collection(
        "quotes",
        "I just want to lie on the beach and eat hot dogs",
        Some(json!({
            "author": "Kevin Malone",
            "tags": ["the-office", "s3:e23"],
            "year": 2005
        })),
    )?;

    let results = client.search_text_in_collection(
        "quotes",
        "beach games",
        3,
        SimilarityMetric::Cosine,
    )?;

    for result in &results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }
    Ok(())
}
```
## Testing

Run tests with mock embeddings (CI-friendly, no model files required):

```bash
cargo test --features mock-embeddings
```

Run tests with local models:

```bash
cargo test
```
## Download ML Model

This downloads the BERT-based embedding model files needed for real embedding generation:

```bash
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir models/all-MiniLM-L6-v2
```

The model files must be present in the `./models/{model-name}/` directory with the required files:

- `config.json`
- `pytorch_model.bin`
- `tokenizer.json`
### Using a different model

You can override the default embedding model at compile time using the `custom-model` feature:

```bash
DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo build --features custom-model
DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo run --features custom-model
```
## License
Apache 2.0 License - see LICENSE for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.