# SQL Stream
A CLI tool for executing SQL queries against CSV/JSON files using a zero-copy, streaming architecture powered by Apache DataFusion and Apache Arrow.
## Installation

### From Cargo (Recommended)

```bash
cargo install sql-stream
```
### From GitHub Releases

Download pre-built binaries for your platform from the releases page:

**Linux (x86_64)**

```bash
wget https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-linux-x86_64
chmod +x sql-stream-linux-x86_64
sudo mv sql-stream-linux-x86_64 /usr/local/bin/sql-stream
```

**macOS (Intel)**

```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-x86_64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```

**macOS (Apple Silicon)**

```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-aarch64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```

**Windows**

1. Download `sql-stream-windows-x86_64.exe` from the releases page
2. Rename it to `sql-stream.exe` and add it to your PATH
### From Source

```bash
git clone https://github.com/1cbyc/sql-stream.git
cd sql-stream
cargo build --release
```

The binary will be available at `target/release/sql-stream`.
## Usage

### Basic Query on CSV

```bash
sql-stream -f data.csv -q "SELECT * FROM data LIMIT 10"
```

### Custom Table Name

```bash
sql-stream -f employees.csv -t employees -q "SELECT name, salary FROM employees WHERE age > 30"
```

### JSON Files

```bash
sql-stream -f data.json -q "SELECT COUNT(*) as total FROM data"
```

### Aggregations and Group By

```bash
sql-stream -f sales.csv -q "SELECT region, SUM(revenue) as total_revenue FROM data GROUP BY region ORDER BY total_revenue DESC"
```

### Enable Verbose Logging

```bash
sql-stream -f data.csv -q "SELECT * FROM data" --verbose
```
## Command Line Options

```text
sql-stream -f <FILE> -q <SQL> [OPTIONS]

Options:
  -f, --file <FILE>        Path to CSV or JSON file (required)
  -q, --query <SQL>        SQL query to execute (required)
  -t, --table-name <NAME>  Table name for SQL queries (default: "data")
  -v, --verbose            Enable verbose debug logging
  -h, --help               Print help information
  -V, --version            Print version information
```
## Examples

### Data Analysis

```bash
# Find the top 5 highest salaries
sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"

# Calculate the average age by city
sql-stream -f employees.csv -q "SELECT city, AVG(age) as avg_age FROM data GROUP BY city"

# Filter and count
sql-stream -f sales.json -q "SELECT product, COUNT(*) as count FROM data WHERE revenue > 1000 GROUP BY product"
```
### Complex Queries

```bash
# Multiple aggregations
sql-stream -f data.csv -q "
SELECT
    category,
    COUNT(*) as count,
    AVG(price) as avg_price,
    MAX(price) as max_price,
    MIN(price) as min_price
FROM data
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY avg_price DESC
"
```
## Architecture

SQL Stream is built with a modular architecture:

- **Engine Module** (`src/engine.rs`): DataFusion SessionContext management and query execution
- **CLI Module** (`src/cli.rs`): Argument parsing with clap
- **Error Module** (`src/error.rs`): Type-safe error handling with thiserror
- **Main Binary** (`src/main.rs`): Async runtime and orchestration
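The error module's pattern can be sketched as follows. This is a hypothetical illustration, not the crate's actual code: the real `src/error.rs` uses thiserror's derive macro to generate the `Display` and `Error` impls that are written out by hand here, and the variant names are assumptions.

```rust
use std::fmt;

// Hypothetical sketch of a type-safe error enum; the actual crate derives
// these impls with thiserror instead of writing them manually.
#[derive(Debug)]
pub enum SqlStreamError {
    /// A file extension other than .csv / .json was supplied
    UnsupportedFormat(String),
    /// Wraps I/O failures such as a missing input file
    Io(std::io::Error),
}

impl fmt::Display for SqlStreamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SqlStreamError::UnsupportedFormat(ext) => {
                write!(f, "unsupported file format: {ext}")
            }
            SqlStreamError::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

impl std::error::Error for SqlStreamError {}

// Lets `?` convert std::io::Error into SqlStreamError automatically
impl From<std::io::Error> for SqlStreamError {
    fn from(e: std::io::Error) -> Self {
        SqlStreamError::Io(e)
    }
}

fn main() {
    let err = SqlStreamError::UnsupportedFormat("xml".to_string());
    println!("{err}");
}
```

The `From<std::io::Error>` impl is what makes `?` ergonomic across module boundaries; thiserror generates the same conversion from a `#[from]` attribute.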
## Technology Stack

- **Apache DataFusion** (45.x): SQL query engine
- **Apache Arrow** (53.x): In-memory columnar format
- **Tokio**: Async runtime
- **Clap** (v4): CLI parsing
- **Tracing**: Structured logging
## Performance Considerations

- **Zero-Copy Streaming**: DataFusion processes data in Arrow's columnar format without unnecessary copying
- **Lazy Evaluation**: Queries are optimized before execution
- **Parallel Processing**: Multi-threaded execution for CPU-intensive operations
- **Memory Efficiency**: Streaming results prevent loading entire datasets into memory

For performance profiling, see `docs/PROFILING.md`.
## Development

### Prerequisites

- Rust 1.70 or later
- Cargo

### Building

```bash
# Debug build
cargo build

# Release build
cargo build --release
```

### Testing

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run a specific test
cargo test test_csv_simple_select
```

### Code Quality

```bash
# Format code
cargo fmt

# Run the linter
cargo clippy -- -D warnings

# Generate documentation
cargo doc --open
```
## CI/CD

The project includes GitHub Actions workflows:

- **CI Workflow**: Runs on every push and pull request
  - Format checking (`cargo fmt`)
  - Linting (`cargo clippy`)
  - Cross-platform tests (Linux, macOS, Windows)
- **Release Workflow**: Automated releases on version tags
  - Builds binaries for Linux (x86_64, musl), Windows (x86_64), and macOS (Intel, Apple Silicon)
  - Uploads release artifacts to GitHub
  - Publishes to crates.io
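A minimal sketch of what the CI workflow described above might look like; the file path, job name, and step details here are illustrative assumptions, not the project's actual workflow:

```yaml
# .github/workflows/ci.yml (illustrative sketch)
name: CI
on: [push, pull_request]

jobs:
  check:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt --all -- --check
      - run: cargo clippy -- -D warnings
      - run: cargo test
```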
## Using as a Library

SQL Stream can be used as a Rust library in your projects:

```toml
[dependencies]
sql-stream = "0.1"
```

```rust
use sql_stream::{QueryEngine, SqlStreamError};

#[tokio::main]
async fn main() -> Result<(), SqlStreamError> {
    let mut engine = QueryEngine::new()?;

    // Register a CSV file as a table
    engine.register_file("data.csv", "my_table").await?;

    // Execute a SQL query
    let results = engine
        .execute_query("SELECT * FROM my_table WHERE age > 30")
        .await?;

    // Print the results
    engine.print_results(results).await?;

    Ok(())
}
```

See the API documentation for more details.
## Contributing

Contributions are welcome! Please ensure:

- All tests pass (`cargo test`)
- Code is formatted (`cargo fmt`)
- There are no clippy warnings (`cargo clippy`)
- Documentation is updated
## License

This project is dual-licensed under MIT OR Apache-2.0.
## Acknowledgments

Built with Apache DataFusion, Apache Arrow, Tokio, Clap, and Tracing.