kerno

module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 21, 2026 License: Apache-2.0

README

KERNO

Production broke? One command. Root cause. 30 seconds.

eBPF-powered incident diagnosis for Kubernetes & Linux - in plain English. No PhD required.

CI Go Report Card Release Go Version

Quick Start · How It Works · Features · Kubernetes · Docs


What is Kerno?

Imagine your Linux server is feeling sick. Something is slow, something is breaking, but you don't know what. You check CPU - looks fine. Memory - fine. Your dashboards are green. But users are complaining.

That's because your dashboards live at the application layer. The real problem is happening deep inside the kernel - and nothing is looking there.

Kerno looks there.

It watches the Linux kernel in real time using eBPF - the same technology Netflix and Meta use to diagnose their servers - and tells you exactly what's wrong in plain English. No PhD required.

sudo kerno doctor

That's it. One command. 30 seconds later, you get a diagnostic report that reads like a doctor's note:

╔═══════════════════════════════════════════════════════════╗
║                     KERNO DOCTOR                          ║
║          Kernel Diagnostic Report                         ║
╚═══════════════════════════════════════════════════════════╝

Host:     prod-db-01
Kernel:   6.8.0-generic

────────────────────────────────────────────────────────────
 FINDINGS  (2 critical · 1 warning · 0 info)
────────────────────────────────────────────────────────────

 !!  CRITICAL  TCP Retransmit Storm
     ──────────────────────────────
     Signal:   retransmit rate=12.3% (threshold: 2.0%), 847 retransmits
     Cause:    Network path degradation causing excessive retransmissions
     Impact:   Every connection risks latency spikes
     Fix:      ethtool -S eth0 | grep -i error
               ping -c 100 <gateway>

 !!  CRITICAL  Disk I/O Bottleneck Detected
     ─────────────────────────────────────
     Signal:   sync P99=280ms (threshold: 200ms), 3,241 sync ops
     Cause:    Storage device is saturated - fsync operations blocking
     Impact:   Database writes and file syncs are delayed
     Fix:      iostat -x 1 5
               Consider faster storage or write batching

 !   WARNING   CPU Scheduler Contention
     ──────────────────────────────────
     Signal:   runqueue P99=18ms (warning: 5ms)
     Cause:    Processes waiting in the CPU run queue
     Fix:      top -H
               Reduce worker threads or increase CPU count

────────────────────────────────────────────────────────────
 RECOMMENDED ACTION ORDER
────────────────────────────────────────────────────────────

  1. [NOW]     TCP Retransmit Storm
  2. [NOW]     Disk I/O Bottleneck Detected
  3. [5 MIN]   CPU Scheduler Contention

════════════════════════════════════════════════════════════

Why Kerno Exists

The kernel knows before anyone else. When disk gets slow, when the network starts dropping packets, when memory is about to run out - the kernel sees it first. Minutes before your dashboards. Hours before your users.

Every observability tool you already use watches your application. Kerno watches the kernel - the layer underneath everything.

flowchart TB
    subgraph Stack["YOUR STACK"]
        App["Your Application<br/>(Node, Python, Go, Java)"]
        OS["Operating System"]
        HW["Hardware"]
    end

    subgraph Tools["WHO WATCHES WHAT"]
        APM["Datadog, New Relic<br/>Prometheus, Grafana"]
        Kerno["<b>KERNO</b><br/><i>eBPF kernel tracing</i>"]
        Bare["(nobody)"]
    end

    App -.watched by.-> APM
    OS -.watched by.-> Kerno
    HW -.watched by.-> Bare

    style Kerno fill:#e94560,stroke:#fff,color:#fff,stroke-width:3px
    style App fill:#0f3460,stroke:#16213e,color:#fff
    style OS fill:#1a1a2e,stroke:#e94560,color:#fff
    style HW fill:#533483,stroke:#16213e,color:#fff
    style APM fill:#16213e,stroke:#0f3460,color:#ccc
    style Bare fill:#16213e,stroke:#0f3460,color:#888
How Kerno compares
Watches K8s Required SLO Mapping AI Analysis Install Time
Prometheus Application No No No Hours
Datadog APM Application No Partial Yes Hours
Inspektor Gadget Container Yes No No Minutes
Kerno Kernel No Yes Yes 30 seconds

Features

Diagnostics
  • kerno doctor - 30-second full diagnostic with ranked findings and fix suggestions
  • kerno explain - AI-powered kernel error explanation (no root needed)
  • kerno predict - Predict failures before they happen via trend analysis
Real-Time Tracing
  • kerno trace syscall - Per-process syscall latency streaming
  • kerno trace disk - Block I/O latency by device, operation, process
  • kerno trace sched - CPU scheduler run queue delays
Continuous Monitoring
  • kerno watch tcp - TCP connections, RTT, retransmits
  • kerno watch oom - OOM kill alerts with full context
  • kerno watch fd - FD leak detection via growth rate
  • kerno start - Daemon mode with Prometheus metrics
Integrations
  • Prometheus - 16 metrics at /metrics
  • Kubernetes - Helm chart + pod enrichment
  • AI Providers - Anthropic, OpenAI, Ollama (optional)
  • Systemd - Unit/slice enrichment on bare metal

Quick Start

Prerequisites
  • Linux kernel >= 5.8 with BTF support (check: ls /sys/kernel/btf/vmlinux)
  • Root access (or CAP_BPF + CAP_PERFMON)
One-liner install (any Linux)
curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash
sudo kerno doctor
Or run as a daemon (bare metal / VMs)
curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash -s -- --daemon
# Kerno is now running as a systemd service with Prometheus metrics on :9090
sudo kerno doctor          # One-shot diagnosis
journalctl -u kerno -f     # Stream logs
curl localhost:9090/metrics # Scrape with Prometheus
Or deploy on Kubernetes
helm install kerno ./deploy/helm/kerno \
  -n kerno-system --create-namespace
Or build from source
git clone https://github.com/lowplane/kerno.git
cd kerno && make build
sudo ./bin/kerno doctor
Or run it in Docker
docker run --privileged --pid=host \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  -v /sys/fs/bpf:/sys/fs/bpf \
  -v /proc:/proc:ro \
  ghcr.io/lowplane/kerno:latest doctor

How It Works

Kerno runs as a lightweight agent. When you run kerno doctor, it loads six tiny eBPF programs into the kernel, collects 30 seconds of real data, runs it through 11 diagnostic rules, and gives you a ranked report. No sampling. No guesswork. Actual kernel data.

The Architecture
flowchart TB
    subgraph Kernel["KERNEL SPACE · eBPF Programs"]
        direction LR
        P1["syscall<br/>latency"]
        P2["tcp<br/>monitor"]
        P3["oom<br/>track"]
        P4["disk<br/>io"]
        P5["sched<br/>delay"]
        P6["fd<br/>track"]
    end

    RB[("Ring Buffers<br/>256KB per program<br/>zero-copy mmap")]

    subgraph UserSpace["USER SPACE · Go"]
        direction TB
        Loader["BPF Loaders<br/>cilium/ebpf"]
        Collector["Collectors<br/>percentile aggregation"]
        Signals[("Signals Snapshot<br/>single source of truth")]
        Adapter["Environment Adapter<br/>bare metal · systemd · k8s"]
    end

    subgraph Outputs["OUTPUTS"]
        direction TB
        Doctor["Doctor Engine<br/>11 diagnostic rules"]
        AI["AI Layer <i>(optional)</i><br/>root cause analysis"]
        Prom["Prometheus<br/>/metrics :9090"]
        CLI["Terminal<br/>pretty · JSON"]
    end

    P1 & P2 & P3 & P4 & P5 & P6 --> RB
    RB --> Loader
    Loader --> Collector
    Collector --> Signals
    Adapter -.enriches.-> Signals
    Signals --> Doctor
    Signals --> Prom
    Doctor --> AI
    AI --> CLI
    Doctor --> CLI

    classDef kernel fill:#1a1a2e,stroke:#e94560,color:#fff,stroke-width:2px
    classDef user fill:#0f3460,stroke:#16213e,color:#fff,stroke-width:2px
    classDef output fill:#16213e,stroke:#533483,color:#fff,stroke-width:2px
    classDef buffer fill:#533483,stroke:#e94560,color:#fff,stroke-width:3px
    classDef ai fill:#e94560,stroke:#fff,color:#fff,stroke-width:2px

    class P1,P2,P3,P4,P5,P6 kernel
    class Loader,Collector,Signals,Adapter user
    class Doctor,Prom,CLI output
    class RB buffer
    class AI ai
The Data Flow
sequenceDiagram
    participant K as Kernel<br/>(eBPF)
    participant R as Ring Buffer
    participant C as Collectors
    participant D as Doctor Engine
    participant A as AI Layer
    participant U as You

    K->>R: syscall/tcp/oom/io events
    Note over K,R: Zero-copy, microsecond overhead
    R->>C: drain events
    C->>C: aggregate into p50/p95/p99
    C->>D: Signals snapshot
    D->>D: evaluate 11 rules
    alt AI enabled
        D->>A: findings + signals
        A->>A: correlate + explain
        A->>U: enhanced report
    else AI disabled
        D->>U: deterministic report
    end

The Diagnostic Rules

Kerno runs 11 deterministic rules against every snapshot. Every rule is explainable, configurable, and tested.

# Rule Triggers When Severity
1 Disk I/O Bottleneck fsync p99 > 50ms or write p99 > 200ms WARN / CRIT
2 OOM Kill Occurred Any OOM event in window CRIT
3 TCP Retransmit Storm Retransmit rate > 2% CRIT
4 TCP RTT Degradation RTT p99 > 10ms WARN
5 Scheduler Contention Runqueue delay p99 > 5ms WARN / CRIT
6 FD Leak FD growth > 10/sec sustained WARN (with ETA)
7 Syscall Latency High Any syscall p99 > 100ms WARN / CRIT
8 OOM Imminent Memory > 90% + positive growth WARN / CRIT (with ETA)
9 Syscall Error Rate Error rate > 1% per syscall WARN / CRIT
10 Memory Pressure RSS usage > 90% WARN
11 Network Latency Connection RTT > 100ms WARN

Usage

Diagnostics - "What's wrong right now?"
# The golden command: 30-second full diagnostic
sudo kerno doctor

# Quick 10-second check
sudo kerno doctor --duration 10s

# JSON output for CI/CD (exits non-zero on critical findings)
sudo kerno doctor --output json --exit-code

# With AI-powered analysis
export KERNO_AI_API_KEY="sk-..."
sudo kerno doctor --ai

# Explain a kernel error (no root needed)
kerno explain "BUG: kernel NULL pointer dereference"
dmesg | tail -5 | kerno explain

# Predict failures before they happen
sudo kerno predict --snapshots 5 --interval 15s
Real-Time Tracing - "Watch it happen"
# Stream every syscall event
sudo kerno trace syscall

# Only syscalls from PID 1234
sudo kerno trace syscall --pid 1234

# Only read() syscalls, as JSON
sudo kerno trace syscall --filter read --output json

# Top 10 syscalls by p99 latency (updates every second)
sudo kerno trace syscall --top 10

# Disk I/O for postgres, writes over 5ms only
sudo kerno trace disk --process postgres --op write --threshold 5ms

# Scheduler delays > 10ms
sudo kerno trace sched --threshold 10ms
Continuous Monitoring - "Let me know when..."
# Watch TCP for any connections with retransmits
sudo kerno watch tcp --retransmits

# Alert on any OOM kill
sudo kerno watch oom --alert

# Detect processes leaking FDs
sudo kerno watch fd --threshold 10

# Run as a daemon (Prometheus metrics on :9090)
sudo kerno start

Prometheus Metrics

When you run sudo kerno start, Kerno exposes 16 Prometheus metrics at :9090/metrics - scrape it with Prometheus, visualize in Grafana, alert via Alertmanager.

View all 16 metrics
Metric Type What It Measures
kerno_syscall_duration_nanoseconds Summary Syscall latency (p50, p95, p99)
kerno_syscall_total Counter Total syscall events
kerno_tcp_rtt_nanoseconds Summary TCP round-trip time
kerno_tcp_retransmits_total Counter TCP retransmissions
kerno_tcp_connections_total Counter TCP connection events
kerno_oom_kills_total Counter OOM kill events
kerno_disk_io_duration_nanoseconds Summary Disk I/O latency
kerno_disk_io_bytes_total Counter Disk I/O bytes
kerno_sched_delay_nanoseconds Summary CPU run queue delay
kerno_fd_open_total Counter FD open operations
kerno_fd_close_total Counter FD close operations
kerno_collector_events_total Counter Events per collector
kerno_collector_errors_total Counter Errors per collector
kerno_bpf_programs_loaded Gauge Loaded eBPF programs
kerno_info Gauge Build version

Health endpoints: /healthz and /readyz return JSON status.


Kubernetes Deployment

Kerno auto-detects Kubernetes and enriches every event with pod, namespace, node, and deployment labels - no client-go dependency, no heavyweight informers.

flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        direction TB
        subgraph Node1["Worker Node 1"]
            K1["Kerno Pod<br/><i>DaemonSet</i>"]
            W1["Workload Pods"]
        end
        subgraph Node2["Worker Node 2"]
            K2["Kerno Pod<br/><i>DaemonSet</i>"]
            W2["Workload Pods"]
        end
        subgraph Node3["Worker Node N"]
            K3["Kerno Pod<br/><i>DaemonSet</i>"]
            W3["Workload Pods"]
        end
    end

    K1 -->|:9090/metrics| Prom["Prometheus"]
    K2 -->|:9090/metrics| Prom
    K3 -->|:9090/metrics| Prom
    Prom --> GF["Grafana"]

    K1 -.enriches.-> W1
    K2 -.enriches.-> W2
    K3 -.enriches.-> W3

    style K1 fill:#e94560,stroke:#fff,color:#fff
    style K2 fill:#e94560,stroke:#fff,color:#fff
    style K3 fill:#e94560,stroke:#fff,color:#fff
    style Prom fill:#0f3460,stroke:#fff,color:#fff
    style GF fill:#16213e,stroke:#fff,color:#fff
    style W1 fill:#533483,stroke:#fff,color:#fff
    style W2 fill:#533483,stroke:#fff,color:#fff
    style W3 fill:#533483,stroke:#fff,color:#fff
Install with Helm
helm install kerno ./deploy/helm/kerno \
  -n kerno-system --create-namespace
Customize via values.yaml
image:
  repository: ghcr.io/lowplane/kerno
  tag: latest

resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits:   { cpu: "1",  memory: 512Mi }

prometheus:
  enabled: true
  port: 9090

# Enable Prometheus Operator ServiceMonitor
serviceMonitor:
  enabled: true
  interval: 15s

# Restrict to specific nodes
nodeSelector:
  monitoring: "true"
Verify the deployment
kubectl -n kerno-system get ds kerno
kubectl -n kerno-system logs -l app.kubernetes.io/name=kerno
kubectl -n kerno-system port-forward ds/kerno 9090:9090
curl localhost:9090/metrics
Raw manifests

If you don't use Helm, apply the manifests directly:

kubectl apply -f deploy/k8s/

Environment & AI

Environment auto-detection. On start, Kerno picks one of three adapters and enriches every event - no configuration required:

  • Kubernetes (token present) → pod, namespace, node, deployment
  • Systemd (PID 1 is systemd) → unit, slice, scope
  • Bare metal → hostname, cgroup path

AI (optional). The AI layer runs after the deterministic rule engine - it enriches findings with cross-signal correlation and root cause explanations, never replaces them. Three providers (Anthropic, OpenAI, Ollama for air-gapped), three privacy modes (full / redacted / summary), TTL cache + token-bucket rate limiting, and graceful fallback to a deterministic template on failure. No LLM SDK dependencies - pure net/http.

export KERNO_AI_API_KEY="sk-..."
export KERNO_AI_PROVIDER="anthropic"   # or openai, ollama
sudo kerno doctor --ai

Configuration

Kerno works with zero configuration. For custom setups, create /etc/kerno/config.yaml:

log_level: info
log_format: text

collectors:
  syscall_latency: true
  tcp_monitor: true
  oom_track: true
  disk_io: true
  sched_delay: true
  fd_track: true

doctor:
  duration: 30s
  thresholds:
    syscall_p99_warning_ns:  100000000   # 100ms
    syscall_p99_critical_ns: 500000000   # 500ms
    tcp_retransmit_pct:      2.0         # 2%
    oom_memory_pct:          90.0        # 90%
    disk_p99_warning_ns:     50000000    # 50ms
    disk_p99_critical_ns:    200000000   # 200ms
    sched_delay_warning_ns:  5000000     # 5ms
    sched_delay_critical_ns: 20000000    # 20ms
    fd_growth_per_sec:       10.0

prometheus:
  enabled: true
  addr: ":9090"

ai:
  enabled: false
  provider: anthropic
  privacy_mode: summary

Precedence: CLI flags > environment variables (KERNO_*) > config file > defaults.


Building from Source

# Requirements: Go 1.25+
# Optional for real eBPF: clang 14+, libbpf-dev, llvm

make build          # Build binary (uses BPF stubs - no clang needed)
make bpf            # Compile eBPF C programs
make test           # Run unit tests
make test-race      # Run with race detector
make lint           # golangci-lint
make check          # Full CI check: vet + test + lint
make docker         # Build Docker image

Contributing

Contributions welcome. See CONTRIBUTING.md for:

  • Development setup and prerequisites
  • Commit message conventions (Conventional Commits)
  • Code review process
  • DCO sign-off requirement

For security reports, see SECURITY.md.


License

Apache License 2.0 - see LICENSE.


Kerno is built by Shivam at Lowplane.

If Kerno saved your Sunday, consider leaving a star - it helps other engineers find the project.

Directories

Path Synopsis
cmd
kerno command
Kerno is an eBPF-based kernel observability engine for Linux.
Kerno is an eBPF-based kernel observability engine for Linux.
internal
adapter
Package adapter provides environment adapters that enrich kernel events with context metadata (hostname, K8s pod/namespace, systemd unit, etc.).
Package adapter provides environment adapters that enrich kernel events with context metadata (hostname, K8s pod/namespace, systemd unit, etc.).
ai
Package ai implements LLM provider abstractions and the AI analysis layer for kerno doctor.
Package ai implements LLM provider abstractions and the AI analysis layer for kerno doctor.
bpf
Package bpf provides the eBPF program loaders and Go event types.
Package bpf provides the eBPF program loaders and Go event types.
cli
Package cli defines all Cobra commands for the kerno CLI.
Package cli defines all Cobra commands for the kerno CLI.
collector
Package collector defines the interface and registry for signal collectors.
Package collector defines the interface and registry for signal collectors.
config
Package config defines the global configuration for Kerno.
Package config defines the global configuration for Kerno.
doctor
Package doctor implements the kerno doctor diagnostic engine.
Package doctor implements the kerno doctor diagnostic engine.
metrics
Package metrics defines and registers the Prometheus metrics for Kerno.
Package metrics defines and registers the Prometheus metrics for Kerno.
version
Package version exposes the build metadata injected at compile time via ldflags.
Package version exposes the build metadata injected at compile time via ldflags.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL