3 releases
| Version | Release date |
|---|---|
| 0.1.0-beta.1 | Jan 1, 2026 |
| 0.1.0-alpha.2 | Dec 23, 2025 |
| 0.1.0-alpha.1 | Oct 13, 2025 |
Sklears Python Bindings
Python bindings for the sklears machine learning library, providing a high-performance, scikit-learn compatible interface through PyO3.
Latest release: 0.1.0-beta.1 (January 1, 2026). See the workspace release notes for highlights and upgrade guidance.
Features
- Drop-in replacement for scikit-learn's most common algorithms
- 14-20x performance improvements (validated) over scikit-learn
- Full NumPy array compatibility with zero-copy operations where possible
- Comprehensive error handling with Python exceptions
- Memory-safe operations with automatic reference counting
- Scikit-learn compatible API for easy migration
Supported Algorithms
Linear Models
- LinearRegression: Ordinary least squares linear regression
- Ridge: Ridge regression with L2 regularization
- Lasso: Lasso regression with L1 regularization
- LogisticRegression: Logistic regression for classification
Clustering
- KMeans: K-Means clustering algorithm
- DBSCAN: Density-based spatial clustering
Preprocessing
- StandardScaler: Standardize features by removing the mean and scaling to unit variance
- MinMaxScaler: Scale features to a given range
- LabelEncoder: Encode target labels with values between 0 and n_classes-1
Model Selection
- train_test_split: Split arrays into random train and test subsets
- KFold: K-Fold cross-validator
- StratifiedKFold: Stratified K-Fold cross-validator
- cross_val_score: Evaluate metric(s) by cross-validation
- cross_val_predict: Generate cross-validated estimates
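To make the K-Fold idea concrete, here is a minimal pure-Python sketch of how unshuffled fold indices can be produced. It follows the convention scikit-learn documents (the first n_samples % n_splits folds get one extra sample). Whether the sklears KFold splits identically is an assumption, and kfold_indices is a hypothetical helper, not part of sklears_python.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # The first `extra` folds receive one additional sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(kfold_indices(10, 3))
fold_sizes = [len(test) for _, test in folds]
```

Every sample lands in exactly one test fold, which is the property that cross_val_score and cross_val_predict rely on.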
Metrics
- accuracy_score: Classification accuracy
- mean_squared_error: Mean squared error for regression
- mean_absolute_error: Mean absolute error for regression
- r2_score: R² (coefficient of determination) score
- precision_score: Precision for classification
- recall_score: Recall for classification
- f1_score: F1 score for classification
- confusion_matrix: Confusion matrix for classification
- classification_report: Text report of classification metrics
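As a reference for what two of the regression metrics compute, the sketch below spells out the textbook definitions of mean squared error and R² in plain Python. The function names mse_reference and r2_reference are hypothetical and chosen to avoid confusion with the library functions. This is not the sklears implementation, only the standard formulas it is expected to agree with.

```python
def mse_reference(y_true, y_pred):
    """Mean of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_reference(y_true, y_pred):
    """1 - SS_res / SS_tot; undefined (division by zero) if y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse = mse_reference(y_true, y_pred)   # 1.5 / 4 = 0.375
r2 = r2_reference(y_true, y_pred)     # 1 - 1.5 / 29.1875
```

A small hand-checkable case like this can be handy when comparing outputs from two libraries.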
Installation
Prerequisites
- Python 3.8 or later
- NumPy
- Rust 1.70 or later
- PyO3 and Maturin for building
Building from Source
- Clone the repository:
    git clone https://github.com/cool-japan/sklears.git
    cd sklears/crates/sklears-python
- Install Maturin:
    pip install maturin
- Build and install the package:
    maturin develop --release
- Or build a wheel:
    maturin build --release
    pip install target/wheels/sklears_python-*.whl
Quick Start
import numpy as np
import sklears_python as skl
# Generate sample data
X = np.random.randn(100, 4)
y = np.random.randn(100)
# Train a linear regression model
model = skl.LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
# Calculate R² score
score = model.score(X, y)
print(f"R² score: {score:.3f}")
Performance Comparison
Here's a typical performance comparison with scikit-learn:
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import sklears_python as skl
from sklearn.linear_model import LinearRegression as SklearnLR
# Generate data
X, y = make_regression(n_samples=10000, n_features=100, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Sklears: time fit + predict with a monotonic, high-resolution clock
start = time.perf_counter()
model = skl.LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
sklears_time = time.perf_counter() - start
# Scikit-learn: same workload on the same data
start = time.perf_counter()
sklearn_model = SklearnLR()
sklearn_model.fit(X_train, y_train)
sklearn_predictions = sklearn_model.predict(X_test)
sklearn_time = time.perf_counter() - start
print(f"Sklears time: {sklears_time:.4f}s")
print(f"Sklearn time: {sklearn_time:.4f}s")
print(f"Speedup: {sklearn_time / sklears_time:.2f}x")
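A single timed run is sensitive to warm-up effects and background load, so a reported speedup can vary noticeably between runs. One way to reduce that noise is to repeat each measurement and compare medians. The helpers below (median_time and speedup) are hypothetical names for illustration and are not part of sklears_python. The workloads are stand-ins so the sketch runs without either library installed.

```python
import statistics
import time


def median_time(fn, repeats=5):
    """Run fn() `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_fn, candidate_fn, repeats=5):
    """Ratio above 1 means candidate_fn was faster than baseline_fn."""
    return median_time(baseline_fn, repeats) / median_time(candidate_fn, repeats)


# Stand-in workloads; replace with real fit/predict calls when benchmarking.
def slow():
    return sum(i * i for i in range(200_000))


def fast():
    return sum(i * i for i in range(20_000))


ratio = speedup(slow, fast)
print(f"Speedup: {ratio:.2f}x")
```

For the comparison above, the two callables would wrap the scikit-learn and sklears fit/predict sequences, for example `lambda: SklearnLR().fit(X_train, y_train)` as the baseline.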
API Compatibility
The sklears Python bindings are designed to be API-compatible with scikit-learn. Most existing scikit-learn code should work with minimal changes:
Before (scikit-learn):
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
After (sklears):
import sklears_python as skl
# All functions and classes are available in the main module
model = skl.LinearRegression()
scaler = skl.StandardScaler()
X_train, X_test, y_train, y_test = skl.train_test_split(X, y)
mse = skl.mean_squared_error(y_true, y_pred)
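During a migration it can be useful to select a backend at import time, so the same script runs on machines where the bindings are not yet built. The sketch below shows a generic "first importable module" helper. The name first_importable is hypothetical, and the demo uses standard-library module names so that it runs anywhere. Note that scikit-learn exposes its classes in submodules rather than at the top level, so a real fallback would still need per-class imports.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {list(module_names)} could be imported")


# Demo with stdlib names; in practice the first candidate would be
# "sklears_python" (assumes the package is installed under that name,
# as in the examples above).
backend = first_importable(["module_that_does_not_exist_xyz", "json"])
print(backend.__name__)
```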
Memory Management
The bindings are designed to be memory-efficient:
- Zero-copy operations where possible using NumPy's C API
- Automatic memory management through PyO3's reference counting
- Efficient data structures using ndarray and sprs for sparse matrices
- Streaming support for large datasets that don't fit in memory
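Whether a given call avoids a copy usually depends on the layout of the array passed in. The sketch below uses only NumPy to show when a conversion to a C-contiguous float64 array copies data and when it does not. That sklears needs this particular layout for its zero-copy path is an assumption, since the README does not state the exact conditions. The helper name prepare is hypothetical.

```python
import numpy as np


def prepare(a):
    """Return a C-contiguous float64 array, copying only when required."""
    return np.ascontiguousarray(a, dtype=np.float64)


X = np.random.randn(100, 4)      # already float64 and C-contiguous
X_same = prepare(X)              # no copy expected
X_strided = X[:, ::2]            # non-contiguous view of every other column
X_fixed = prepare(X_strided)     # copy required to make it contiguous

copied_same = not np.shares_memory(X, X_same)
copied_strided = not np.shares_memory(X, X_fixed)
print(f"contiguous input copied: {copied_same}")
print(f"strided input copied: {copied_strided}")
```

Checking `a.flags.c_contiguous` and `a.dtype` before a hot loop is a cheap way to spot inputs that would trigger a conversion on every call.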
Error Handling
All Rust errors are properly converted to Python exceptions:
import sklears_python as skl
import numpy as np
try:
    # This will raise a ValueError if arrays have incompatible shapes
    model = skl.LinearRegression()
    model.fit(np.array([[1, 2], [3, 4]]), np.array([1, 2, 3]))  # Shape mismatch
except ValueError as e:
    print(f"Error: {e}")
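When inputs come from user code, it can help to validate them before calling fit so the error message is under your control. The following is a minimal caller-side sketch. check_fit_inputs is a hypothetical helper, and the message text is invented for illustration; it is not the message sklears produces.

```python
def check_fit_inputs(X, y):
    """Raise ValueError if X and y disagree on the number of samples."""
    n_rows, n_targets = len(X), len(y)
    if n_rows != n_targets:
        raise ValueError(
            f"X has {n_rows} samples but y has {n_targets}; they must match"
        )
    return n_rows


try:
    # Same mismatch as in the example above: 2 rows in X, 3 targets in y.
    check_fit_inputs([[1, 2], [3, 4]], [1, 2, 3])
    message = None
except ValueError as e:
    message = str(e)
    print(f"Error: {message}")
```

The check relies only on len(), so it works for lists and NumPy arrays alike.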
System Information
Get information about your sklears installation:
import sklears_python as skl
# Version information
print(f"Version: {skl.get_version()}")
# Build information
build_info = skl.get_build_info()
for key, value in build_info.items():
    print(f"{key}: {value}")
# Hardware capabilities
hardware_info = skl.get_hardware_info()
print("Hardware support:")
for feature, supported in hardware_info.items():
    print(f"  {feature}: {supported}")
# Performance benchmarks
benchmarks = skl.benchmark_basic_operations()
print("Performance benchmarks:")
for operation, time_ms in benchmarks.items():
    print(f"  {operation}: {time_ms:.2f} ms")
Configuration
Set global configuration options:
import sklears_python as skl
# Set number of threads for parallel operations
skl.set_config("n_jobs", "4")
# Get current configuration
config = skl.get_config()
print(config)
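The example above passes the thread count as a string. If the value should follow the machine's core count rather than being hard-coded, it can be derived with the standard library. This is a sketch only: n_jobs_setting is a hypothetical helper, and leaving one core free is a design choice, not a sklears recommendation.

```python
import os


def n_jobs_setting(reserve=1):
    """Return a thread count as a string, leaving `reserve` cores free."""
    cores = os.cpu_count() or 1          # cpu_count() may return None
    return str(max(1, cores - reserve))  # never go below one thread


value = n_jobs_setting()
print(value)
# With sklears installed, this could then be applied as:
#   skl.set_config("n_jobs", value)
```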
Examples
See the examples/ directory for comprehensive usage examples:
- python_demo.py: Complete demonstration of all features
- Performance comparison scripts
- Real-world use cases
Contributing
Contributions are welcome! Please see the main sklears repository for contribution guidelines.
License
This project is dual-licensed under MIT OR Apache-2.0.
Acknowledgments
- Built with PyO3 for Rust-Python interoperability
- Compatible with NumPy arrays
- API inspired by scikit-learn