5 releases
Uses new Rust 2024
| 0.2.0 | Jan 10, 2026 |
|---|---|
| 0.1.3 | Jan 9, 2026 |
| 0.1.2 | Jan 8, 2026 |
| 0.1.1 | Jan 8, 2026 |
| 0.1.0 | Jan 7, 2026 |
#879 in Database interfaces
170KB
3.5K
SLoC
Datasphere
Named after the AI knowledge network in Dan Simmons' Hyperion Cantos — a vast repository where all information exists and can be accessed.
Background daemon that distills knowledge from Claude Code sessions into a queryable knowledge graph.
What It Does
Datasphere watches your Claude Code sessions, extracts insights via LLM distillation, and embeds them for semantic search. Think of it as long-term memory for your AI coding sessions.
┌─────────────────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
│ Session JSONL │────▶│ Distill │────▶│ Embed │────▶│ LanceDB │
│ (~/.claude/ │ │ (LLM) │ │ │ │ (nodes) │
│ projects/) │ └──────────┘ └─────────┘ └─────────┘
└─────────────────┘
Installation
# From crates.io
cargo install datasphere
# Or from source
cargo install --path .
# Verify
ds --help
Quick Start
# One-shot scan of current project's sessions
ds scan
# Start daemon (watches ALL projects continuously)
ds start
# Query the knowledge graph
ds query "how to chunk large texts"
# Check database stats
ds stats
CLI Commands
| Command | Description |
|---|---|
ds scan |
One-shot distillation of sessions (current project) |
ds start |
Daemon mode - watches all projects, queues jobs |
ds query <text> |
Semantic search of knowledge graph |
ds queue |
Show job queue status |
ds queue pending |
List pending jobs |
ds queue clear |
Clear completed jobs |
ds queue nuke |
Delete all jobs |
ds stats |
Show database statistics |
ds show |
Display stored nodes |
ds add <file> |
Add a text file to the graph (no LLM distillation) |
ds related <id> |
Find nodes similar to a given node |
ds reset |
Delete database and queue (fresh start) |
MCP Server
The mcp/ directory contains a minimal MCP server for Claude Code integration:
# Install dependencies
cd mcp && npm install
# Add to Claude Code
claude mcp add datasphere -s user -- node /path/to/datasphere/mcp/index.js
This exposes datasphere_query and datasphere_related tools that Claude can use to search your knowledge graph during conversations.
How It Works
1. Watching
The daemon watches ~/.claude/projects/ for changes to session transcript files (.jsonl). Events are queued and processed one at a time with rate limiting.
2. Distillation
Each session is chunked and sent to an LLM for knowledge extraction. The LLM identifies:
- Key insights and learnings
- Patterns and approaches
- Decisions and their rationale
3. Embedding
Extracted insights are embedded (1536 dimensions) for semantic search.
4. Change Detection
SimHash is used to detect meaningful changes. Sessions are only re-processed when content changes significantly (Hamming distance > 10 bits).
Storage
~/.datasphere/
├── db/ # LanceDB database
│ ├── nodes.lance/ # Knowledge nodes with embeddings
│ └── processed.lance/ # Processing records (deduplication)
└── queue.jsonl # Persistent job queue
Environment Variables
| Variable | Description |
|---|---|
OPENAI_API_KEY |
Required for embeddings |
ANTHROPIC_API_KEY |
Required for distillation (or use claude CLI auth) |
Configuration
Currently configuration is via code constants:
- Job processing delay: 500ms
- SimHash change threshold: 10 bits
Development
cargo build # Dev build
cargo test # Run tests
cargo build --release # Release build
Architecture
See CLAUDE.md for detailed module structure and design principles.
Dependencies
~178MB
~3M SLoC