MAGIC: Multi-Actor Gateway for Invocation Capture
MAGIC is a system for capturing, storing, and querying shell command history with structured output data. It combines automatic capture, content-addressed storage, and powerful SQL queries to give you deep insights into your build and development workflows.
What is MAGIC?
MAGIC = BIRD + shq
┌──────────────────────────────────────────────────────────────────┐
│ Your Shell │
│ │
│ $ make test │
│ $ shq show # show output from last command │
│ $ shq history # browse command history │
│ $ shq sql "SELECT * FROM invocations WHERE exit_code != 0" │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ shq (Shell Query) │
│ │
│ • Captures commands automatically via shell hooks │
│ • Provides CLI for querying: shq show, shq history, shq sql │
│ • Manages data lifecycle: shq archive, shq compact │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BIRD (Storage Layer) │
│ │
│ • DuckDB-based database with Parquet files │
│ • Content-addressed blob storage (70-90% space savings!) │
│ • Hot/warm/cold tiering for lifecycle management │
│ • SQL-queryable: invocations, outputs, sessions │
└──────────────────────────────────────────────────────────────────┘
Key Features
Queryable Everything
-- Find all failed builds in the last week
SELECT cmd, exit_code, duration_ms, timestamp
FROM invocations
WHERE exit_code != 0 AND date >= CURRENT_DATE - 7;
-- Find flaky commands (sometimes pass, sometimes fail)
SELECT cmd,
COUNT(*) as runs,
SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures
FROM invocations
WHERE cmd LIKE '%test%'
GROUP BY cmd
HAVING failures > 0 AND passes > 0;
-- Slowest commands this week
SELECT cmd, duration_ms, timestamp
FROM invocations
WHERE date >= CURRENT_DATE - 7 AND duration_ms IS NOT NULL
ORDER BY duration_ms DESC
LIMIT 10;
Intelligent Storage (70-90% Savings)
Content-addressed blob storage with automatic deduplication:
- 100 identical test runs → 1 blob file (99% savings)
- CI/CD workflows with repeated outputs → 70-90% reduction
- Automatic: no configuration needed
- Fast: BLAKE3 hashing at 3GB/s
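The savings arithmetic behind these numbers is simple: storage shrinks in proportion to how many outputs collapse onto shared blobs. A minimal sketch (the function name is illustrative, not part of shq):

```python
def dedup_savings(total_outputs: int, unique_blobs: int) -> float:
    """Percent of storage saved when identical outputs share one blob."""
    if total_outputs == 0:
        return 0.0
    return 100.0 * (1 - unique_blobs / total_outputs)

# 100 identical test runs stored as a single blob:
print(dedup_savings(100, 1))  # 99.0
```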
Structured Output Parsing (Coming in v0.6)
Optional integration with duck_hunt, a DuckDB extension for parsing 90+ log formats:
- Compilers: GCC, Clang, MSVC, rustc
- Linters: ESLint, pylint, shellcheck
- Test runners: pytest, jest, go test
- Build tools: Make, Cargo, npm
Note: duck_hunt integration is planned for v0.6. Currently shq captures raw outputs.
Fast Queries
- DuckDB columnar storage for analytics
- Partitioned by date for fast filtering
- Indexed by hash for instant deduplication
- Gzip-compressed blobs (DuckDB reads directly)
Hot/Warm/Cold Tiering
Automatic lifecycle management:
- Hot (0-14 days): Recent commands, fast SSD
- Warm (14-90 days): Compressed Parquet
- Cold (>90 days): Archive tier, deduplicated
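The tier a record lands in follows directly from its age. A sketch of the classification logic, using the 14-day and 90-day thresholds listed above (the function and its defaults are illustrative, not shq's actual internals):

```python
from datetime import date, timedelta

def storage_tier(record_date: date, today: date,
                 hot_days: int = 14, warm_days: int = 90) -> str:
    """Classify a record as hot/warm/cold by age in days."""
    age = (today - record_date).days
    if age <= hot_days:
        return "hot"
    if age <= warm_days:
        return "warm"
    return "cold"

today = date(2026, 2, 9)
print(storage_tier(today - timedelta(days=3), today))    # hot
print(storage_tier(today - timedelta(days=30), today))   # warm
print(storage_tier(today - timedelta(days=120), today))  # cold
```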
Quick Start
Installation
# Install shq (includes BIRD)
cargo install --git https://github.com/yourorg/magic shq
# Initialize BIRD database
shq init
# Add shell integration to your shell config
eval "$(shq hook init)" # Auto-detects zsh/bash
# Or specify shell explicitly
eval "$(shq hook init --shell zsh)"
eval "$(shq hook init --shell bash)"
Basic Usage
# Commands are automatically captured via shell hooks
make test
# View output from the last command
shq show
# View recent commands
shq history
# Run a command with full output capture
shq run make all
# Or use shqr (shell function, captures output while showing it)
shqr make test
# Query with SQL
shq sql "SELECT cmd, exit_code FROM invocations WHERE date = CURRENT_DATE"
# View today's commands
shq sql "SELECT * FROM invocations_today"
# View failed commands
shq sql "SELECT * FROM failed_invocations LIMIT 10"
Advanced Queries
# Find flaky tests (inconsistent exit codes)
shq sql "
SELECT
cmd,
COUNT(DISTINCT exit_code) as exit_codes,
SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures
FROM invocations
WHERE cmd LIKE '%test%'
GROUP BY cmd
HAVING exit_codes > 1
"
# Storage savings report
shq sql "
SELECT
COUNT(*) as total_outputs,
COUNT(DISTINCT content_hash) as unique_blobs,
ROUND(100.0 * (1 - COUNT(DISTINCT content_hash)::FLOAT / COUNT(*)), 1) as dedup_percent
FROM outputs
"
# Slowest commands this week
shq sql "
SELECT cmd, duration_ms, timestamp
FROM invocations
WHERE date >= CURRENT_DATE - 7
ORDER BY duration_ms DESC
LIMIT 10
"
Architecture
Directory Structure
~/.local/share/bird/ # Default BIRD_ROOT
├── db/
│ └── bird.duckdb # Main DuckDB database
├── data/
│ ├── recent/ # Hot tier (0-14 days)
│ │ ├── invocations/
│ │ │ └── date=YYYY-MM-DD/*.parquet
│ │ ├── outputs/
│ │ │ └── date=YYYY-MM-DD/*.parquet
│ │ └── sessions/
│ │ └── date=YYYY-MM-DD/*.parquet
│ └── archive/ # Cold tier (>14 days)
│ └── ... # Same structure
├── blobs/
│ └── content/ # Content-addressed pool
│ ├── ab/
│ │ └── abc123...def.bin.gz
│ └── ... # 256 subdirs (00-ff)
├── extensions/ # DuckDB extensions
├── sql/ # Custom SQL files
└── config.toml
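The blobs/content pool fans out into 256 subdirectories keyed by the first two hex digits of the content hash. A sketch of that mapping (stdlib blake2b stands in for BLAKE3, which the project actually uses; the function is illustrative):

```python
import hashlib
from pathlib import Path

def blob_path(content: bytes, root: Path = Path("blobs/content")) -> Path:
    # blake2b stands in for BLAKE3 here so the sketch needs no
    # third-party dependency; the sharding scheme is the same idea.
    digest = hashlib.blake2b(content).hexdigest()
    # First two hex chars pick one of 256 subdirs (00-ff).
    return root / digest[:2] / f"{digest}.bin.gz"

p = blob_path(b"hello world")
print(len(p.parent.name))  # 2
```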
Components
1. BIRD (Buffer and Invocation Record Database)
Purpose: Storage layer with DuckDB backend
Features:
- DuckDB columnar database for fast analytics
- Parquet files for efficient, lock-free storage and querying
- Content-addressed blob storage with deduplication
- Automatic tiering (hot/warm/cold)
- SQL-queryable schema
Schema:
- invocations - Command execution metadata
- outputs - Stdout/stderr with content hashes
- sessions - Shell session information
- blob_registry - Tracks deduplicated blobs
Convenience Views:
- recent_invocations - Commands from the last 7 days
- invocations_today - Commands from today
- failed_invocations - Commands with a non-zero exit code
- invocations_with_outputs - Joined view with output metadata
- clients - Aggregated client information
2. shq (Shell Query)
Purpose: CLI and shell integration
Features:
- Automatic command capture via shell hooks
- CLI for common queries (shq show, shq history, shq sql)
- Direct SQL access with full DuckDB power
- Data lifecycle management (shq archive, shq compact)
Commands:
shq init # Initialize BIRD database
shq run CMD # Run and capture command with output
shq save # Manually save from pipes (used by shell hooks)
shq show # Show output from the last command
shq show -O # Show only stdout
shq show -E # Show only stderr
shq show --head 20 # Show first 20 lines
shq history # Browse command history
shq sql "QUERY" # Execute SQL query
shq stats # Show statistics
shq archive # Move old data to archive tier
shq compact # Compact parquet files for better performance
shq hook init # Generate shell integration code
Shell Functions (provided by hook init):
shqr CMD # Run command with full output capture, displaying in real-time
3. Content-Addressed Storage
Purpose: Deduplicate identical outputs
How it works:
- Hash output with BLAKE3
- Check if blob exists (by hash)
- If exists: increment reference count (DEDUP HIT!)
- If not: write new blob (DEDUP MISS)
Benefits:
- 70-90% storage savings for CI/CD workloads
- Automatic (no configuration)
- Fast (2-3ms overhead per command)
- Transparent to queries
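The hit/miss logic above can be sketched in a few lines. This is a minimal in-memory model, not shq's implementation (which persists blobs to disk and the registry to DuckDB); blake2b stands in for BLAKE3:

```python
import hashlib

class BlobStore:
    """Minimal sketch of the write path: hash the output, bump the
    refcount on a dedup hit, store a new blob on a miss."""
    def __init__(self):
        self.blobs = {}       # hash -> content
        self.ref_counts = {}  # hash -> int

    def put(self, output: bytes) -> tuple[str, bool]:
        h = hashlib.blake2b(output).hexdigest()
        hit = h in self.blobs
        if hit:
            self.ref_counts[h] += 1   # DEDUP HIT: reuse existing blob
        else:
            self.blobs[h] = output    # DEDUP MISS: write new blob
            self.ref_counts[h] = 1
        return h, hit

store = BlobStore()
store.put(b"test passed\n")
_, hit = store.put(b"test passed\n")
print(hit)  # True: identical output reuses the first blob
```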
4. duck_hunt Integration (Coming in v0.6)
duck_hunt is a separate DuckDB extension for parsing structured data from logs. It supports 90+ formats and provides retrospective extraction of errors and warnings without the noise of raw CLI output.
Formats to be supported:
- Compilers: GCC, Clang, MSVC, rustc
- Linters: ESLint, pylint, shellcheck
- Test runners: pytest, jest, go test
- Build tools: Make, Cargo, npm
Note: This integration is planned for v0.6. Currently outputs are stored as raw content.
Use Cases
1. Debugging Failed Builds
# What failed? Show output from last command
shq show
# Show only stderr
shq show -E
# Has this failed before?
shq sql "
SELECT timestamp, cmd, exit_code
FROM invocations
WHERE cmd LIKE '%make test%'
ORDER BY timestamp DESC
LIMIT 10
"
2. Finding Flaky Tests
shq sql "
SELECT
cmd,
COUNT(*) as runs,
SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) as passes,
SUM(CASE WHEN exit_code != 0 THEN 1 ELSE 0 END) as failures,
ROUND(100.0 * SUM(CASE WHEN exit_code = 0 THEN 1 ELSE 0 END) / COUNT(*), 1) as pass_rate
FROM invocations
WHERE cmd LIKE '%test%'
GROUP BY cmd
HAVING failures > 0
ORDER BY pass_rate ASC
"
3. Performance Analysis
# Slowest commands today
shq sql "
SELECT cmd, duration_ms, timestamp
FROM invocations_today
WHERE duration_ms IS NOT NULL
ORDER BY duration_ms DESC
LIMIT 10
"
# Build time trends
shq sql "
SELECT
date,
AVG(duration_ms) as avg_duration,
MIN(duration_ms) as min_duration,
MAX(duration_ms) as max_duration
FROM invocations
WHERE cmd LIKE '%make%'
GROUP BY date
ORDER BY date DESC
"
4. CI/CD Integration
# In your CI pipeline
shq run make test || {
echo "Build failed!"
shq show > build-output.txt
shq show -E > build-stderr.txt
shq sql "SELECT * FROM failed_invocations ORDER BY timestamp DESC LIMIT 5"
exit 1
}
5. Data Lifecycle Management
# Archive old data (moves from recent/ to archive/)
shq archive # Archive data older than 14 days
shq archive --days 30 # Archive data older than 30 days
shq archive --dry-run # Preview what would be archived
# Compact parquet files (merges many small files into fewer large ones)
shq compact # Compact all sessions
shq compact -s $SESSION_ID # Compact specific session only
shq compact --today # Only compact today's partition
shq compact --dry-run # Preview what would be compacted
Note: Shell hooks automatically run shq compact -s $session --today -q in the background after each command to keep file counts manageable.
6. Storage Management
# Check storage savings
shq sql "
SELECT
storage_tier,
COUNT(*) as num_blobs,
SUM(byte_length) / 1024 / 1024 as total_mb,
SUM(ref_count) as total_references,
ROUND(AVG(ref_count), 2) as avg_refs_per_blob
FROM blob_registry
GROUP BY storage_tier
"
# Find most-reused blobs
shq sql "
SELECT
content_hash,
ref_count,
byte_length / 1024 as size_kb,
storage_path
FROM blob_registry
ORDER BY ref_count DESC
LIMIT 10
"
Configuration
config.toml
[storage]
bird_root = "~/.local/share/bird"
hot_days = 14 # Days before archiving
inline_threshold = 1048576 # outputs up to 1MB are stored inline; larger ones go to the blob pool
[capture]
auto_capture = true
exclude_patterns = [
"^cd ",
"^ls ",
"^pwd$"
]
[parsing]
duck_hunt_enabled = true
default_format = "auto" # auto-detect format
[performance]
max_output_size = 52428800 # 50MB max per output
Performance
Storage Savings
| Workload | Commands/Day | Dedup Ratio | Before | After | Savings |
|---|---|---|---|---|---|
| CI Tests | 10,000 | 80% | 60GB | 12GB | 48GB (80%) |
| Builds | 1,000 | 70% | 150GB | 45GB | 105GB (70%) |
| Local Dev | 5,000 | 60% | 15GB | 6GB | 9GB (60%) |
Query Performance
- Recent commands: <10ms
- Historical queries: <100ms
- Full-text search: <500ms
- Storage overhead: <3ms per command
Overhead
- Hash computation: 1.7ms (5MB @ 3GB/s)
- Dedup check: 0.5ms (indexed lookup)
- Total per command: 2.7ms
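The hash figure is just throughput arithmetic: a 5MB output at BLAKE3's roughly 3GB/s works out to about 1.7ms. A quick check (numbers taken from the bullets above):

```python
# Sanity-check the hash-time figure: 5 MB at ~3 GB/s.
output_bytes = 5_000_000         # 5 MB output
throughput = 3_000_000_000       # bytes/s, the ~3 GB/s quoted above
hash_ms = output_bytes / throughput * 1000
print(round(hash_ms, 1))  # 1.7
```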
Integration with Existing Tools
blq (Build Log Query)
blq is a separate tool for build log analysis that uses the BIRD schema:
# Use blq for advanced log analysis
make 2>&1 | blq parse --format gcc | shq import
# Or use shq's built-in parsing
shq run --format gcc make
DuckDB CLI
# Direct DuckDB access
duckdb ~/.local/share/bird/db/bird.duckdb
# Use BIRD's views and macros
> SELECT * FROM recent_invocations LIMIT 10;
> SELECT * FROM invocations_today LIMIT 10;
tmux/screen
# Capture in background pane
tmux new-session -d 'shq run make watch'
# Query while it runs
shq sql "SELECT COUNT(*) FROM invocations_today"
Development
Building from Source
git clone https://github.com/yourorg/magic
cd magic
# Build shq
cargo build --release
# Run tests
cargo test
# Install locally
cargo install --path .
Project Structure
magic/
├── shq/ # CLI and capture logic
│ ├── src/
│ │ ├── capture.rs # Command capture
│ │ ├── parse.rs # Output parsing
│ │ ├── query.rs # SQL queries
│ │ └── main.rs # CLI
│ └── tests/
├── bird/ # Storage layer (library)
│ ├── src/
│ │ ├── schema.rs # DuckDB schema
│ │ ├── storage.rs # Blob storage
│ │ └── lib.rs
│ └── tests/
├── docs/ # Documentation
│ ├── bird_spec.md
│ ├── shq_implementation.md
│ └── ...
└── README.md # This file
Documentation
- bird_spec.md - Complete BIRD specification
- shq_implementation.md - shq implementation guide
- shq_shell_integration.md - Shell hook details
- CONTENT_ADDRESSED_BLOBS.md - Storage design
- IMPLEMENTATION_GUIDE.md - Step-by-step guide
FAQ
Q: How much storage does MAGIC use?
A: With content-addressed storage, typically 70-90% less than storing each output separately. For a typical developer:
- Without dedup: ~60GB/month
- With dedup: ~12GB/month
- Savings: 48GB (80%)
Q: Does automatic capture slow down my shell?
A: No. Capture happens asynchronously after command completion. Overhead is <3ms per command.
Q: Can I disable capture for specific commands?
A: Yes. Add patterns to config.toml:
[capture]
exclude_patterns = ["^cd ", "^ls ", "^pwd$"]
Or prefix with space ( command) or backslash (\command) to skip capture.
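The filter presumably works like the sketch below: skip capture when the command starts with a space or backslash, or when any configured pattern matches (regex semantics are assumed here; the function name is illustrative, not shq's API):

```python
import re

EXCLUDE_PATTERNS = [r"^cd ", r"^ls ", r"^pwd$"]

def should_capture(cmd: str) -> bool:
    """Sketch of the exclude filter described above."""
    # A leading space or backslash also skips capture.
    if cmd.startswith(" ") or cmd.startswith("\\"):
        return False
    return not any(re.search(p, cmd) for p in EXCLUDE_PATTERNS)

print(should_capture("make test"))  # True
print(should_capture("cd /tmp"))    # False
print(should_capture(" secret"))    # False
```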
Q: How do I query old data?
A: All data remains queryable regardless of tier:
-- Works seamlessly across hot/warm/cold
SELECT * FROM invocations WHERE date >= '2025-01-01';
Q: What if two processes write the same blob?
A: Atomic rename handles race conditions. Both processes end up using the same blob (by hash). No data loss.
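The usual pattern for this is write-to-temp-then-rename. A sketch (assuming POSIX rename semantics; the actual shq code may differ in detail):

```python
import os
import pathlib
import tempfile

def write_blob_atomic(path: str, content: bytes) -> None:
    """Write to a temp file, then rename into place. Two racing
    writers of the same content both end with one valid blob."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    os.replace(tmp, path)  # atomic on POSIX: last rename wins

blob = os.path.join(tempfile.mkdtemp(), "ab", "abc.bin")
write_blob_atomic(blob, b"output")
write_blob_atomic(blob, b"output")  # "racing" second writer
print(pathlib.Path(blob).read_bytes())  # b'output'
```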
Q: Can I use this in CI/CD?
A: Yes! Common pattern:
shq run make test || {
shq show > build-output.txt
shq show -E > build-stderr.txt
exit 1
}
Q: How do I clean up old data?
A: Use the archive and compact commands:
# Archive data older than 14 days (default)
shq archive
# Archive data older than 30 days
shq archive --days 30
# Compact parquet files (reduces file count, improves query performance)
shq compact
# Dry-run to see what would happen
shq archive --dry-run
shq compact --dry-run
Q: Does this work with multiple machines?
A: Yes! Each machine writes to its own client_id. Query across machines:
SELECT client_id, COUNT(*)
FROM invocations
GROUP BY client_id;
Roadmap
- v0.1 - Basic capture + DuckDB storage
- v0.2 - Content-addressed blobs + deduplication
- v0.3 - Core shq commands (run, save, show, history, sql, stats)
- v0.4 - Shell integration (zsh, bash hooks)
- v0.5 - Archive and compact commands
- v0.6 - duck_hunt integration for parsing and searching
- v0.7 - TUI history browser (or integration with existing tools)
- v0.8 - tmux integration
- v1.0 - Production ready
- Comprehensive test coverage
- Performance benchmarks
- Migration tools
- Documentation complete
- v2.0 - Advanced features
- Cross-machine deduplication
- Real-time collaboration
- ML-based anomaly detection
Future Directions
- COBE (Claude Output Buffer Extractor): Import Claude Code command history (stored in SQLite) into BIRD for unified querying across human and AI-driven shell sessions.
Contributing
Contributions welcome! Please see CONTRIBUTING.md.
Key Areas
- Performance optimizations
- Documentation improvements
- Bug reports and fixes
License
MIT License - see LICENSE
Credits
- DuckDB - Fast analytical database
- duck_hunt - Log parsing DuckDB extension
- BLAKE3 - Fast cryptographic hashing
- Parquet - Efficient columnar storage
Support
- GitHub Issues: https://github.com/yourorg/magic/issues
- Discussions: https://github.com/yourorg/magic/discussions
Built for developers who want to understand their build workflows