11 unstable releases (3 breaking)

new 0.4.2	Feb 6, 2026
0.4.1	Feb 6, 2026
0.4.0	Jan 25, 2026
0.3.2	Jan 21, 2026
0.1.3	Dec 17, 2025

#480 in Hardware support

Used in ringkernel

Apache-2.0

1.5MB
33K SLoC

ringkernel-metal

Apple Metal backend for RingKernel.

Status

Implemented - Full Metal backend with event-driven execution model.

Overview

This crate provides GPU compute support for RingKernel on Apple platforms using the Metal framework. It targets macOS, iOS, and Apple Silicon devices with unified memory architecture.

Requirements

macOS 10.15+ or iOS 13+
Apple Silicon (M1/M2/M3/M4) or compatible AMD GPU
Rust with metal feature enabled

Features

Event-driven kernel execution via Metal compute shaders
MSL (Metal Shading Language) runtime compilation
Unified memory architecture optimization for Apple Silicon
K2K (Kernel-to-Kernel) messaging via inbox/routing system
Halo exchange for stencil computations (2D/3D grids)
HLC (Hybrid Logical Clock) support for causal ordering
Full RingKernelRuntime trait implementation

Limitations

Metal does not support CUDA-style cooperative groups, so persistent kernels use event-driven execution with host-side dispatch loops. This provides equivalent functionality with different performance characteristics:

Aspect	CUDA Persistent	Metal Event-Driven
Kernel lifetime	Long-running	Per-message dispatch
Grid synchronization	`grid.sync()`	Host-driven barriers
Command latency	~0.03µs (mapped memory)	~10-50µs (dispatch)
Best for	Interactive commands	Batch compute

Usage

use ringkernel_metal::{MetalRuntime, is_metal_available};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    if !is_metal_available() {
        eprintln!("Metal not available");
        return Ok(());
    }

    // Create runtime
    let runtime = MetalRuntime::new().await?;

    // Launch kernel with options
    let kernel = runtime.launch("compute", Default::default()).await?;
    kernel.activate().await?;

    // Send messages (triggers compute dispatch)
    kernel.send_envelope(envelope).await?;

    // Receive results
    let response = kernel.receive_timeout(Duration::from_secs(1)).await?;

    // Clean shutdown
    kernel.terminate().await?;
    runtime.shutdown().await?;
    Ok(())
}

Architecture

Core Components

MetalDevice - Wrapper around metal::Device with capability queries
MetalBuffer - GPU buffer with StorageModeShared for unified memory
MetalKernel - Full KernelHandleInner implementation with:
- MSL shader compilation via new_library_with_source
- Compute pipeline creation
- Event-driven dispatch on message send
- Control block for lifecycle management
MetalRuntime - RingKernelRuntime implementation with K2K broker

K2K Messaging

Kernel-to-kernel messaging uses a routing table and inbox system:

// K2K structures
MetalK2KInboxHeader  // 64 bytes - inbox state with lock/sequence
MetalK2KRouteEntry   // 32 bytes - route to neighbor
MetalK2KRoutingTable // Routing with 2D (4-neighbor) or 3D (6-neighbor) support

Halo Exchange

For stencil computations, the MetalHaloExchange manager handles:

use ringkernel_metal::{MetalHaloExchange, HaloExchangeConfig};

// Create 2D grid with halo exchange
let config = HaloExchangeConfig::new_2d(
    8, 8,   // grid: 8x8 tiles
    64, 64, // tile: 64x64 cells
    1,      // halo: 1 cell
);
let mut exchange = MetalHaloExchange::new(config);

// Initialize on device
exchange.initialize(&device)?;

// Perform exchange cycle
exchange.exchange(&tile_buffers)?;

MSL Templates

The crate provides MSL templates for common patterns:

RING_KERNEL_MSL_TEMPLATE - Base ring kernel with control block, message queues, and HLC
K2K_HALO_EXCHANGE_MSL_TEMPLATE - K2K communication for stencil computations

Platform Support

Platform	Status
macOS (Apple Silicon)	✅ Full support
macOS (Intel + AMD)	✅ Full support
iOS	✅ Supported (untested)
Linux/Windows	❌ Stub only

On non-macOS platforms, MetalRuntime::new() returns BackendUnavailable.

License

Apache-2.0

Dependencies

~11–15MB
~176K SLoC