# SQL Stream
A CLI tool for executing SQL queries against CSV/JSON files using a zero-copy, streaming architecture powered by Apache DataFusion and Apache Arrow.
## Installation

### From Cargo (Recommended)

```bash
cargo install sql-stream
```
### From GitHub Releases

Download pre-built binaries for your platform from the releases page:

#### Linux (x86_64)

```bash
wget https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-linux-x86_64
chmod +x sql-stream-linux-x86_64
sudo mv sql-stream-linux-x86_64 /usr/local/bin/sql-stream
```
#### macOS (Intel)

```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-x86_64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```
#### macOS (Apple Silicon)

```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-aarch64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```
#### Windows

1. Download `sql-stream-windows-x86_64.exe` from the releases page
2. Rename it to `sql-stream.exe` and add it to your PATH
### From Source

```bash
git clone https://github.com/1cbyc/sql-stream.git
cd sql-stream
cargo build --release
```

The binary will be available at `target/release/sql-stream`.
## Usage

### Basic Query on CSV

```bash
sql-stream -f data.csv -q "SELECT * FROM data LIMIT 10"
```
### Custom Table Name

```bash
sql-stream -f employees.csv -t employees -q "SELECT name, salary FROM employees WHERE age > 30"
```
### JSON Files

```bash
sql-stream -f data.json -q "SELECT COUNT(*) as total FROM data"
```
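DataFusion's JSON reader generally expects newline-delimited JSON (one object per line) rather than a single top-level array; this is an assumption about sql-stream's behavior, but a compatible sample file can be created like so:

```shell
# Create a sample newline-delimited JSON file (one object per line)
cat > data.json <<'EOF'
{"name": "Alice", "revenue": 1200}
{"name": "Bob", "revenue": 800}
{"name": "Chioma", "revenue": 2500}
EOF
```

The `COUNT(*)` query above should then see three rows.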
### Aggregations and Group By

```bash
sql-stream -f sales.csv -q "SELECT region, SUM(revenue) as total_revenue FROM data GROUP BY region ORDER BY total_revenue DESC"
```
### Enable Verbose Logging

```bash
sql-stream -f data.csv -q "SELECT * FROM data" --verbose
```
## Command Line Options

```text
sql-stream -f <FILE> -q <SQL> [OPTIONS]

Options:
  -f, --file <FILE>         Path to CSV or JSON file (required)
  -q, --query <SQL>         SQL query to execute (required)
  -t, --table-name <NAME>   Table name for SQL queries (default: "data")
  -v, --verbose             Enable verbose debug logging
  -h, --help                Print help information
  -V, --version             Print version information
```
## Examples

### Data Analysis

```bash
# Find top 5 highest salaries
sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"

# Calculate average age by city
sql-stream -f employees.csv -q "SELECT city, AVG(age) as avg_age FROM data GROUP BY city"

# Filter and count
sql-stream -f sales.json -q "SELECT product, COUNT(*) as count FROM data WHERE revenue > 1000 GROUP BY product"
```
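To try these examples locally, you can generate a small sample dataset first (the column names below are assumptions chosen to match the queries above):

```shell
# Create a sample employees.csv with the columns the examples query
cat > employees.csv <<'EOF'
name,age,city,salary
Alice,34,Lagos,90000
Bob,28,Abuja,70000
Chioma,41,Lagos,120000
Dayo,25,Lagos,65000
EOF

# Then, for example:
# sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"
```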
### Complex Queries

```bash
# Multiple aggregations
sql-stream -f data.csv -q "
SELECT
    category,
    COUNT(*) as count,
    AVG(price) as avg_price,
    MAX(price) as max_price,
    MIN(price) as min_price
FROM data
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY avg_price DESC
"
```
## Architecture

SQL Stream is built with a modular architecture:

- **Engine Module** (`src/engine.rs`): DataFusion `SessionContext` management and query execution
- **CLI Module** (`src/cli.rs`): Argument parsing with clap
- **Error Module** (`src/error.rs`): Type-safe error handling with thiserror
- **Main Binary** (`src/main.rs`): Async runtime and orchestration
### Technology Stack

- **Apache DataFusion** (45.x): SQL query engine
- **Apache Arrow** (53.x): In-memory columnar format
- **Tokio**: Async runtime
- **Clap** (v4): CLI parsing
- **Tracing**: Structured logging
### Performance

- **Zero-Copy Streaming**: DataFusion processes data in Arrow's columnar format without unnecessary copying
- **Lazy Evaluation**: Queries are optimized before execution
- **Parallel Processing**: Multi-threaded execution for CPU-intensive operations
- **Memory Efficiency**: Streaming results prevent loading entire datasets into memory

For performance profiling, see `docs/PROFILING.md`.
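The bounded-memory behavior is easiest to observe on a larger input. The snippet below generates a 100,000-row CSV for experimentation; the sql-stream invocation is shown commented, since it assumes the binary is already on your PATH:

```shell
# Generate a 100,000-row CSV to experiment with streaming behavior
printf 'id,value\n' > big.csv
seq 1 100000 | awk '{print $1 "," $1 * 2}' >> big.csv

# Aggregations stream batch-by-batch, so memory use stays bounded:
# sql-stream -f big.csv -q "SELECT COUNT(*) as rows, AVG(value) as avg_value FROM data"
```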
## Development

### Prerequisites

- Rust toolchain (install via [rustup](https://rustup.rs))

### Building

```bash
# Debug build
cargo build

# Release build
cargo build --release
```
### Testing

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_csv_simple_select
```
### Code Quality

```bash
# Format code
cargo fmt

# Run linter
cargo clippy -- -D warnings

# Generate documentation
cargo doc --open
```
## CI/CD

The project includes GitHub Actions workflows.
## Using as a Library

SQL Stream can be used as a Rust library in your projects:

```toml
[dependencies]
sql-stream = "0.1"
# tokio is needed for the async example below (version/features are an assumption)
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```
```rust
use sql_stream::{QueryEngine, SqlStreamError};

#[tokio::main]
async fn main() -> Result<(), SqlStreamError> {
    let mut engine = QueryEngine::new()?;

    // Register a CSV file as a table
    engine.register_file("data.csv", "my_table").await?;

    // Execute a SQL query
    let results = engine.execute_query("SELECT * FROM my_table WHERE age > 30").await?;

    // Print results
    engine.print_results(results).await?;

    Ok(())
}
```
See the API documentation for more details.
## Contributing

Contributions are welcome! Please ensure:

- All tests pass (`cargo test`)
- Code is formatted (`cargo fmt`)
- No clippy warnings (`cargo clippy`)
- Documentation is updated
## License

This project is dual-licensed under MIT OR Apache-2.0.
## Acknowledgments

Built with [Apache DataFusion](https://datafusion.apache.org/) and [Apache Arrow](https://arrow.apache.org/).