4 releases

new 0.1.4	Feb 9, 2026
0.1.3	Feb 9, 2026
0.1.2	Feb 8, 2026
0.1.1	Feb 8, 2026

#1119 in Database interfaces

Used in magic-shq

MIT license

335KB
7.5K SLoC

MAGIC: Multi-Actor Gateway for Invocation Capture

MAGIC is a system for capturing, storing, and querying shell command history with structured output data. It combines automatic capture, content-addressed storage, and powerful SQL queries to give you deep insights into your build and development workflows.

What is MAGIC?

MAGIC = BIRD + shq

┌──────────────────────────────────────────────────────────────────┐
│                          Your Shell                              │
│                                                                  │
│  $ make test                                                     │
│  $ shq show              # show output from last command         │
│  $ shq history           # browse command history                │
│  $ shq sql "SELECT * FROM invocations WHERE exit_code != 0"      │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                     shq (Shell Query)                            │
│                                                                  │
│  • Captures commands automatically via shell hooks               │
│  • Provides CLI for querying: shq show, shq history, shq sql     │
│  • Manages data lifecycle: shq archive, shq compact              │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                  BIRD (Storage Layer)                            │
│                                                                  │
│  • DuckDB-based database with Parquet files                      │
│  • Content-addressed blob storage (70-90% space savings!)        │
│  • Hot/warm/cold tiering for lifecycle management                │
│  • SQL-queryable: invocations, outputs, sessions                 │
└──────────────────────────────────────────────────────────────────┘

Key Features

Queryable Everything

-- Find all failed builds in the last week
SELECT cmd, exit_code, duration_ms, timestamp
FROM invocations
WHERE exit_code != 0 AND date >= CURRENT_DATE - 7;

-- Find flaky commands (sometimes pass, sometimes fail)
SELECT cmd,
       COUNT(*) as runs,
       SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
       SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures
FROM invocations
WHERE cmd LIKE '%test%'
GROUP BY cmd
HAVING failures > 0 AND passes > 0;

-- Slowest commands this week
SELECT cmd, duration_ms, timestamp
FROM invocations
WHERE date >= CURRENT_DATE - 7 AND duration_ms IS NOT NULL
ORDER BY duration_ms DESC
LIMIT 10;

Intelligent Storage (70-90% Savings)

Content-addressed blob storage with automatic deduplication:

100 identical test runs → 1 blob file (99% savings)
CI/CD workflows with repeated outputs → 70-90% reduction
Automatic: no configuration needed
Fast: BLAKE3 hashing at 3GB/s

Structured Output Parsing (Coming in v0.6)

Optional integration with duck_hunt, a DuckDB extension for parsing 90+ log formats:

Compilers: GCC, Clang, MSVC, rustc
Linters: ESLint, pylint, shellcheck
Test runners: pytest, jest, go test
Build tools: Make, Cargo, npm

Note: duck_hunt integration is planned for v0.6. Currently shq captures raw outputs.

Fast Queries

DuckDB columnar storage for analytics
Partitioned by date for fast filtering
Indexed by hash for instant deduplication
Gzip-compressed blobs (DuckDB reads directly)

Hot/Warm/Cold Tiering

Automatic lifecycle management:

Hot (0-14 days): Recent commands, fast SSD
Warm (14-90 days): Compressed Parquet
Cold (>90 days): Archive tier, deduplicated

Quick Start

Installation

# Install shq (includes BIRD)
cargo install --git https://github.com/yourorg/magic shq

# Initialize BIRD database
shq init

# Add shell integration to your shell config
eval "$(shq hook init)"  # Auto-detects zsh/bash

# Or specify shell explicitly
eval "$(shq hook init --shell zsh)"
eval "$(shq hook init --shell bash)"

Basic Usage

# Commands are automatically captured via shell hooks
make test

# View output from the last command
shq show

# View recent commands
shq history

# Run a command with full output capture
shq run make all

# Or use shqr (shell function, captures output while showing it)
shqr make test

# Query with SQL
shq sql "SELECT cmd, exit_code FROM invocations WHERE date = CURRENT_DATE"

# View today's commands
shq sql "SELECT * FROM invocations_today"

# View failed commands
shq sql "SELECT * FROM failed_invocations LIMIT 10"

Advanced Queries

# Find flaky tests (inconsistent exit codes)
shq sql "
  SELECT
    cmd,
    COUNT(DISTINCT exit_code) as exit_codes,
    SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
    SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures
  FROM invocations
  WHERE cmd LIKE '%test%'
  GROUP BY cmd
  HAVING exit_codes > 1
"

# Storage savings report
shq sql "
  SELECT
    COUNT(*) as total_outputs,
    COUNT(DISTINCT content_hash) as unique_blobs,
    ROUND(100.0 * (1 - COUNT(DISTINCT content_hash)::FLOAT / COUNT(*)), 1) as dedup_percent
  FROM outputs
"

# Slowest commands this week
shq sql "
  SELECT cmd, duration_ms, timestamp
  FROM invocations
  WHERE date >= CURRENT_DATE - 7
  ORDER BY duration_ms DESC
  LIMIT 10
"

Architecture

Directory Structure

~/.local/share/bird/              # Default BIRD_ROOT
├── db/
│   └── bird.duckdb               # Main DuckDB database
├── data/
│   ├── recent/                   # Hot tier (0-14 days)
│   │   ├── invocations/
│   │   │   └── date=YYYY-MM-DD/*.parquet
│   │   ├── outputs/
│   │   │   └── date=YYYY-MM-DD/*.parquet
│   │   └── sessions/
│   │       └── date=YYYY-MM-DD/*.parquet
│   └── archive/                  # Cold tier (>14 days)
│       └── ...                   # Same structure
├── blobs/
│   └── content/                  # Content-addressed pool
│       ├── ab/
│       │   └── abc123...def.bin.gz
│       └── ...                   # 256 subdirs (00-ff)
├── extensions/                   # DuckDB extensions
├── sql/                          # Custom SQL files
└── config.toml

Components

1. BIRD (Buffer and Invocation Record Database)

Purpose: Storage layer with DuckDB backend

Features:

DuckDB columnar database for fast analytics
Parquet files for efficient, lock-free storage and querying
Content-addressed blob storage with deduplication
Automatic tiering (hot/warm/cold)
SQL-queryable schema

Schema:

invocations - Command execution metadata
outputs - Stdout/stderr with content hashes
sessions - Shell session information
blob_registry - Tracks deduplicated blobs

Convenience Views:

recent_invocations - Commands from last 7 days
invocations_today - Commands from today
failed_invocations - Commands with non-zero exit code
invocations_with_outputs - Joined view with output metadata
clients - Aggregated client information

2. shq (Shell Query)

Purpose: CLI and shell integration

Features:

Automatic command capture via shell hooks
CLI for common queries (shq show, shq history, shq sql)
Direct SQL access with full DuckDB power
Data lifecycle management (shq archive, shq compact)

Commands:

shq init              # Initialize BIRD database
shq run CMD           # Run and capture command with output
shq save              # Manually save from pipes (used by shell hooks)
shq show              # Show output from the last command
shq show -O           # Show only stdout
shq show -E           # Show only stderr
shq show --head 20    # Show first 20 lines
shq history           # Browse command history
shq sql "QUERY"       # Execute SQL query
shq stats             # Show statistics
shq archive           # Move old data to archive tier
shq compact           # Compact parquet files for better performance
shq hook init         # Generate shell integration code

Shell Functions (provided by hook init):

shqr CMD              # Run command with full output capture, displaying in real-time

3. Content-Addressed Storage

Purpose: Deduplicate identical outputs

How it works:

Hash output with BLAKE3
Check if blob exists (by hash)
If exists: increment reference count (DEDUP HIT!)
If not: write new blob (DEDUP MISS)

Benefits:

70-90% storage savings for CI/CD workloads
Automatic (no configuration)
Fast (2-3ms overhead per command)
Transparent to queries

4. duck_hunt Integration (Coming in v0.6)

duck_hunt is a separate DuckDB extension for parsing structured data from logs. It supports 90+ formats and provides retrospective extraction of errors and warnings without the noise of raw CLI output.

Formats to be supported:

Compilers: GCC, Clang, MSVC, rustc
Linters: ESLint, pylint, shellcheck
Test runners: pytest, jest, go test
Build tools: Make, Cargo, npm

Note: This integration is planned for v0.6. Currently outputs are stored as raw content.

Use Cases

1. Debugging Failed Builds

# What failed? Show output from last command
shq show

# Show only stderr
shq show -E

# Has this failed before?
shq sql "
  SELECT timestamp, cmd, exit_code
  FROM invocations
  WHERE cmd LIKE '%make test%'
  ORDER BY timestamp DESC
  LIMIT 10
"

2. Finding Flaky Tests

shq sql "
  SELECT
    cmd,
    COUNT(*) as runs,
    SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
    SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures,
    ROUND(100.0 * SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) / COUNT(*), 1) as pass_rate
  FROM invocations
  WHERE cmd LIKE '%test%'
  GROUP BY cmd
  HAVING failures > 0
  ORDER BY pass_rate ASC
"

3. Performance Analysis

# Slowest commands today
shq sql "
  SELECT cmd, duration_ms, timestamp
  FROM invocations_today
  WHERE duration_ms IS NOT NULL
  ORDER BY duration_ms DESC
  LIMIT 10
"

# Build time trends
shq sql "
  SELECT
    date,
    AVG(duration_ms) as avg_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration
  FROM invocations
  WHERE cmd LIKE '%make%'
  GROUP BY date
  ORDER BY date DESC
"

4. CI/CD Integration

# In your CI pipeline
shq run make test || {
  echo "Build failed!"
  shq show > build-output.txt
  shq show -E > build-stderr.txt
  shq sql "SELECT * FROM failed_invocations ORDER BY timestamp DESC LIMIT 5"
  exit 1
}

5. Data Lifecycle Management

# Archive old data (moves from recent/ to archive/)
shq archive                  # Archive data older than 14 days
shq archive --days 30        # Archive data older than 30 days
shq archive --dry-run        # Preview what would be archived

# Compact parquet files (merges many small files into fewer large ones)
shq compact                  # Compact all sessions
shq compact -s $SESSION_ID   # Compact specific session only
shq compact --today          # Only compact today's partition
shq compact --dry-run        # Preview what would be compacted

Note: Shell hooks automatically run shq compact -s $session --today -q in the background after each command to keep file counts manageable.

6. Storage Management

# Check storage savings
shq sql "
  SELECT
    storage_tier,
    COUNT(*) as num_blobs,
    SUM(byte_length) / 1024 / 1024 as total_mb,
    SUM(ref_count) as total_references,
    ROUND(AVG(ref_count), 2) as avg_refs_per_blob
  FROM blob_registry
  GROUP BY storage_tier
"

# Find most-reused blobs
shq sql "
  SELECT
    content_hash,
    ref_count,
    byte_length / 1024 as size_kb,
    storage_path
  FROM blob_registry
  ORDER BY ref_count DESC
  LIMIT 10
"

Configuration

config.toml

[storage]
bird_root = "~/.local/share/bird"
hot_days = 14              # Days before archiving
inline_threshold = 1048576 # 1MB - inline vs blob

[capture]
auto_capture = true
exclude_patterns = [
  "^cd ",
  "^ls ",
  "^pwd$"
]

[parsing]
duck_hunt_enabled = true
default_format = "auto"    # auto-detect format

[performance]
max_output_size = 52428800 # 50MB max per output

Performance

Storage Savings

Workload	Commands/Day	Dedup Ratio	Before	After	Savings
CI Tests	10,000	80%	60GB	12GB	48GB (80%)
Builds	1,000	70%	150GB	45GB	105GB (70%)
Local Dev	5,000	60%	15GB	6GB	9GB (60%)

Query Performance

Recent commands: <10ms
Historical queries: <100ms
Full-text search: <500ms
Storage overhead: <3ms per command

Overhead

Hash computation: 1.7ms (5MB @ 3GB/s)
Dedup check: 0.5ms (indexed lookup)
Total per command: 2.7ms

Integration with Existing Tools

blq (Build Log Query)

blq is a separate tool for build log analysis that uses the BIRD schema:

# Use blq for advanced log analysis
make 2>&1 | blq parse --format gcc | shq import

# Or use shq's built-in parsing
shq run --format gcc make

DuckDB CLI

# Direct DuckDB access
duckdb ~/.local/share/bird/db/bird.duckdb

# Use BIRD's views and macros
> SELECT * FROM recent_invocations LIMIT 10;
> SELECT * FROM invocations_today LIMIT 10;

tmux/screen

# Capture in background pane
tmux new-session -d 'shq run make watch'

# Query while it runs
shq sql "SELECT COUNT(*) FROM invocations_today"

Development

Building from Source

git clone https://github.com/yourorg/magic
cd magic

# Build shq
cargo build --release

# Run tests
cargo test

# Install locally
cargo install --path .

Project Structure

magic/
├── shq/                  # CLI and capture logic
│   ├── src/
│   │   ├── capture.rs    # Command capture
│   │   ├── parse.rs      # Output parsing
│   │   ├── query.rs      # SQL queries
│   │   └── main.rs       # CLI
│   └── tests/
├── bird/                 # Storage layer (library)
│   ├── src/
│   │   ├── schema.rs     # DuckDB schema
│   │   ├── storage.rs    # Blob storage
│   │   └── lib.rs
│   └── tests/
├── docs/                 # Documentation
│   ├── bird_spec.md
│   ├── shq_implementation.md
│   └── ...
└── README.md            # This file

Documentation

bird_spec.md - Complete BIRD specification
shq_implementation.md - shq implementation guide
shq_shell_integration.md - Shell hook details
CONTENT_ADDRESSED_BLOBS.md - Storage design
IMPLEMENTATION_GUIDE.md - Step-by-step guide

FAQ

Q: How much storage does MAGIC use?

A: With content-addressed storage, typically 70-90% less than storing each output separately. For a typical developer:

Without dedup: ~60GB/month
With dedup: ~12GB/month
Savings: 48GB (80%)

Q: Does automatic capture slow down my shell?

A: No. Capture happens asynchronously after command completion. Overhead is <3ms per command.

Q: Can I disable capture for specific commands?

A: Yes. Add patterns to config.toml:

[capture]
exclude_patterns = ["^cd ", "^ls ", "^pwd$"]

Or prefix with space ( command) or backslash (\command) to skip capture.

Q: How do I query old data?

A: All data remains queryable regardless of tier:

-- Works seamlessly across hot/warm/cold
SELECT * FROM invocations WHERE date >= '2025-01-01';

Q: What if two processes write the same blob?

A: Atomic rename handles race conditions. Both processes end up using the same blob (by hash). No data loss.

Q: Can I use this in CI/CD?

A: Yes! Common pattern:

shq run make test || {
  shq show > build-output.txt
  shq show -E > build-stderr.txt
  exit 1
}

Q: How do I clean up old data?

A: Use the archive and compact commands:

# Archive data older than 14 days (default)
shq archive

# Archive data older than 30 days
shq archive --days 30

# Compact parquet files (reduces file count, improves query performance)
shq compact

# Dry-run to see what would happen
shq archive --dry-run
shq compact --dry-run

Q: Does this work with multiple machines?

A: Yes! Each machine writes to its own client_id. Query across machines:

SELECT client_id, COUNT(*)
FROM invocations
GROUP BY client_id;

Roadmap

v0.1 - Basic capture + DuckDB storage
v0.2 - Content-addressed blobs + deduplication
v0.3 - Core shq commands (run, save, show, history, sql, stats)
v0.4 - Shell integration (zsh, bash hooks)
v0.5 - Archive and compact commands
v0.6 - duck_hunt integration for parsing and searching
v0.7 - TUI history browser (or integration with existing tools)
v0.8 - tmux integration
v1.0 - Production ready
- Comprehensive test coverage
- Performance benchmarks
- Migration tools
- Documentation complete
v2.0 - Advanced features
- Cross-machine deduplication
- Real-time collaboration
- ML-based anomaly detection

Future Directions

COBE (Claude Output Buffer Extractor): Import Claude Code command history (stored in SQLite) into BIRD for unified querying across human and AI-driven shell sessions.

Contributing

Contributions welcome! Please see CONTRIBUTING.md.

Key Areas

Performance optimizations
Documentation improvements
Bug reports and fixes

License

MIT License - see LICENSE

Credits

DuckDB - Fast analytical database
duck_hunt - Log parsing DuckDB extension
BLAKE3 - Fast cryptographic hashing
Parquet - Efficient columnar storage

Support

GitHub Issues: https://github.com/yourorg/magic/issues
Discussions: https://github.com/yourorg/magic/discussions

Built for developers who want to understand their build workflows

Dependencies

~30–48MB
~618K SLoC