kerno

module

v0.1.0 Latest Latest Go to latest Published: Apr 21, 2026 License: Apache-2.0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/lowplane/kerno

Links

Open Source Insights

README ¶

KERNO

Production broke? One command. Root cause. 30 seconds.

eBPF-powered incident diagnosis for Kubernetes & Linux - in plain English. No PhD required.

Go Version

Quick Start · How It Works · Features · Kubernetes · Docs

What is Kerno?

Imagine your Linux server is feeling sick. Something is slow, something is breaking, but you don't know what. You check CPU - looks fine. Memory - fine. Your dashboards are green. But users are complaining.

That's because your dashboards live at the application layer. The real problem is happening deep inside the kernel - and nothing is looking there.

Kerno looks there.

It watches the Linux kernel in real time using eBPF - the same technology Netflix and Meta use to diagnose their servers - and tells you exactly what's wrong in plain English. No PhD required.

sudo kerno doctor

That's it. One command. 30 seconds later, you get a diagnostic report that reads like a doctor's note:

╔═══════════════════════════════════════════════════════════╗
║                     KERNO DOCTOR                          ║
║          Kernel Diagnostic Report                         ║
╚═══════════════════════════════════════════════════════════╝

Host:     prod-db-01
Kernel:   6.8.0-generic

────────────────────────────────────────────────────────────
 FINDINGS  (2 critical · 1 warning · 0 info)
────────────────────────────────────────────────────────────

 !!  CRITICAL  TCP Retransmit Storm
     ──────────────────────────────
     Signal:   retransmit rate=12.3% (threshold: 2.0%), 847 retransmits
     Cause:    Network path degradation causing excessive retransmissions
     Impact:   Every connection risks latency spikes
     Fix:      ethtool -S eth0 | grep -i error
               ping -c 100 <gateway>

 !!  CRITICAL  Disk I/O Bottleneck Detected
     ─────────────────────────────────────
     Signal:   sync P99=280ms (threshold: 200ms), 3,241 sync ops
     Cause:    Storage device is saturated - fsync operations blocking
     Impact:   Database writes and file syncs are delayed
     Fix:      iostat -x 1 5
               Consider faster storage or write batching

 !   WARNING   CPU Scheduler Contention
     ──────────────────────────────────
     Signal:   runqueue P99=18ms (warning: 5ms)
     Cause:    Processes waiting in the CPU run queue
     Fix:      top -H
               Reduce worker threads or increase CPU count

────────────────────────────────────────────────────────────
 RECOMMENDED ACTION ORDER
────────────────────────────────────────────────────────────

  1. [NOW]     TCP Retransmit Storm
  2. [NOW]     Disk I/O Bottleneck Detected
  3. [5 MIN]   CPU Scheduler Contention

════════════════════════════════════════════════════════════

Why Kerno Exists

The kernel knows before anyone else. When disk gets slow, when the network starts dropping packets, when memory is about to run out - the kernel sees it first. Minutes before your dashboards. Hours before your users.

Every observability tool you already use watches your application. Kerno watches the kernel - the layer underneath everything.

flowchart TB
    subgraph Stack["YOUR STACK"]
        App["Your Application<br/>(Node, Python, Go, Java)"]
        OS["Operating System"]
        HW["Hardware"]
    end

    subgraph Tools["WHO WATCHES WHAT"]
        APM["Datadog, New Relic<br/>Prometheus, Grafana"]
        Kerno["<b>KERNO</b><br/><i>eBPF kernel tracing</i>"]
        Bare["(nobody)"]
    end

    App -.watched by.-> APM
    OS -.watched by.-> Kerno
    HW -.watched by.-> Bare

    style Kerno fill:#e94560,stroke:#fff,color:#fff,stroke-width:3px
    style App fill:#0f3460,stroke:#16213e,color:#fff
    style OS fill:#1a1a2e,stroke:#e94560,color:#fff
    style HW fill:#533483,stroke:#16213e,color:#fff
    style APM fill:#16213e,stroke:#0f3460,color:#ccc
    style Bare fill:#16213e,stroke:#0f3460,color:#888

How Kerno compares

	Watches	K8s Required	SLO Mapping	AI Analysis	Install Time
Prometheus	Application	No	No	No	Hours
Datadog APM	Application	No	Partial	Yes	Hours
Inspektor Gadget	Container	Yes	No	No	Minutes
Kerno	Kernel	No	Yes	Yes	30 seconds

Features

Diagnostics

kerno doctor - 30-second full diagnostic with ranked findings and fix suggestions
kerno explain - AI-powered kernel error explanation (no root needed)
kerno predict - Predict failures before they happen via trend analysis

Real-Time Tracing

kerno trace syscall - Per-process syscall latency streaming
kerno trace disk - Block I/O latency by device, operation, process
kerno trace sched - CPU scheduler run queue delays

Continuous Monitoring

kerno watch tcp - TCP connections, RTT, retransmits
kerno watch oom - OOM kill alerts with full context
kerno watch fd - FD leak detection via growth rate
kerno start - Daemon mode with Prometheus metrics

Integrations

Prometheus - 16 metrics at /metrics
Kubernetes - Helm chart + pod enrichment
AI Providers - Anthropic, OpenAI, Ollama (optional)
Systemd - Unit/slice enrichment on bare metal

Quick Start

Prerequisites

Linux kernel >= 5.8 with BTF support (check: ls /sys/kernel/btf/vmlinux)
Root access (or CAP_BPF + CAP_PERFMON)

One-liner install (any Linux)

curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash
sudo kerno doctor

Or run as a daemon (bare metal / VMs)

curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash -s -- --daemon
# Kerno is now running as a systemd service with Prometheus metrics on :9090
sudo kerno doctor          # One-shot diagnosis
journalctl -u kerno -f     # Stream logs
curl localhost:9090/metrics # Scrape with Prometheus

Or deploy on Kubernetes

helm install kerno ./deploy/helm/kerno \
  -n kerno-system --create-namespace

Or build from source

git clone https://github.com/lowplane/kerno.git
cd kerno && make build
sudo ./bin/kerno doctor

Or run it in Docker

docker run --privileged --pid=host \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  -v /sys/fs/bpf:/sys/fs/bpf \
  -v /proc:/proc:ro \
  ghcr.io/lowplane/kerno:latest doctor

How It Works

Kerno runs as a lightweight agent. When you run kerno doctor, it loads six tiny eBPF programs into the kernel, collects 30 seconds of real data, runs it through 11 diagnostic rules, and gives you a ranked report. No sampling. No guesswork. Actual kernel data.

The Architecture

flowchart TB
    subgraph Kernel["KERNEL SPACE · eBPF Programs"]
        direction LR
        P1["syscall<br/>latency"]
        P2["tcp<br/>monitor"]
        P3["oom<br/>track"]
        P4["disk<br/>io"]
        P5["sched<br/>delay"]
        P6["fd<br/>track"]
    end

    RB[("Ring Buffers<br/>256KB per program<br/>zero-copy mmap")]

    subgraph UserSpace["USER SPACE · Go"]
        direction TB
        Loader["BPF Loaders<br/>cilium/ebpf"]
        Collector["Collectors<br/>percentile aggregation"]
        Signals[("Signals Snapshot<br/>single source of truth")]
        Adapter["Environment Adapter<br/>bare metal · systemd · k8s"]
    end

    subgraph Outputs["OUTPUTS"]
        direction TB
        Doctor["Doctor Engine<br/>11 diagnostic rules"]
        AI["AI Layer <i>(optional)</i><br/>root cause analysis"]
        Prom["Prometheus<br/>/metrics :9090"]
        CLI["Terminal<br/>pretty · JSON"]
    end

    P1 & P2 & P3 & P4 & P5 & P6 --> RB
    RB --> Loader
    Loader --> Collector
    Collector --> Signals
    Adapter -.enriches.-> Signals
    Signals --> Doctor
    Signals --> Prom
    Doctor --> AI
    AI --> CLI
    Doctor --> CLI

    classDef kernel fill:#1a1a2e,stroke:#e94560,color:#fff,stroke-width:2px
    classDef user fill:#0f3460,stroke:#16213e,color:#fff,stroke-width:2px
    classDef output fill:#16213e,stroke:#533483,color:#fff,stroke-width:2px
    classDef buffer fill:#533483,stroke:#e94560,color:#fff,stroke-width:3px
    classDef ai fill:#e94560,stroke:#fff,color:#fff,stroke-width:2px

    class P1,P2,P3,P4,P5,P6 kernel
    class Loader,Collector,Signals,Adapter user
    class Doctor,Prom,CLI output
    class RB buffer
    class AI ai

The Data Flow

sequenceDiagram
    participant K as Kernel<br/>(eBPF)
    participant R as Ring Buffer
    participant C as Collectors
    participant D as Doctor Engine
    participant A as AI Layer
    participant U as You

    K->>R: syscall/tcp/oom/io events
    Note over K,R: Zero-copy, microsecond overhead
    R->>C: drain events
    C->>C: aggregate into p50/p95/p99
    C->>D: Signals snapshot
    D->>D: evaluate 11 rules
    alt AI enabled
        D->>A: findings + signals
        A->>A: correlate + explain
        A->>U: enhanced report
    else AI disabled
        D->>U: deterministic report
    end

The Diagnostic Rules

Kerno runs 11 deterministic rules against every snapshot. Every rule is explainable, configurable, and tested.

#	Rule	Triggers When	Severity
1	Disk I/O Bottleneck	fsync p99 > 50ms or write p99 > 200ms	WARN / CRIT
2	OOM Kill Occurred	Any OOM event in window	CRIT
3	TCP Retransmit Storm	Retransmit rate > 2%	CRIT
4	TCP RTT Degradation	RTT p99 > 10ms	WARN
5	Scheduler Contention	Runqueue delay p99 > 5ms	WARN / CRIT
6	FD Leak	FD growth > 10/sec sustained	WARN (with ETA)
7	Syscall Latency High	Any syscall p99 > 100ms	WARN / CRIT
8	OOM Imminent	Memory > 90% + positive growth	WARN / CRIT (with ETA)
9	Syscall Error Rate	Error rate > 1% per syscall	WARN / CRIT
10	Memory Pressure	RSS usage > 90%	WARN
11	Network Latency	Connection RTT > 100ms	WARN

Usage

Diagnostics - "What's wrong right now?"

# The golden command: 30-second full diagnostic
sudo kerno doctor

# Quick 10-second check
sudo kerno doctor --duration 10s

# JSON output for CI/CD (exits non-zero on critical findings)
sudo kerno doctor --output json --exit-code

# With AI-powered analysis
export KERNO_AI_API_KEY="sk-..."
sudo kerno doctor --ai

# Explain a kernel error (no root needed)
kerno explain "BUG: kernel NULL pointer dereference"
dmesg | tail -5 | kerno explain

# Predict failures before they happen
sudo kerno predict --snapshots 5 --interval 15s

Real-Time Tracing - "Watch it happen"

# Stream every syscall event
sudo kerno trace syscall

# Only syscalls from PID 1234
sudo kerno trace syscall --pid 1234

# Only read() syscalls, as JSON
sudo kerno trace syscall --filter read --output json

# Top 10 syscalls by p99 latency (updates every second)
sudo kerno trace syscall --top 10

# Disk I/O for postgres, writes over 5ms only
sudo kerno trace disk --process postgres --op write --threshold 5ms

# Scheduler delays > 10ms
sudo kerno trace sched --threshold 10ms

Continuous Monitoring - "Let me know when..."

# Watch TCP for any connections with retransmits
sudo kerno watch tcp --retransmits

# Alert on any OOM kill
sudo kerno watch oom --alert

# Detect processes leaking FDs
sudo kerno watch fd --threshold 10

# Run as a daemon (Prometheus metrics on :9090)
sudo kerno start

Prometheus Metrics

When you run sudo kerno start, Kerno exposes 16 Prometheus metrics at :9090/metrics - scrape it with Prometheus, visualize in Grafana, alert via Alertmanager.

View all 16 metrics

Metric	Type	What It Measures
`kerno_syscall_duration_nanoseconds`	Summary	Syscall latency (p50, p95, p99)
`kerno_syscall_total`	Counter	Total syscall events
`kerno_tcp_rtt_nanoseconds`	Summary	TCP round-trip time
`kerno_tcp_retransmits_total`	Counter	TCP retransmissions
`kerno_tcp_connections_total`	Counter	TCP connection events
`kerno_oom_kills_total`	Counter	OOM kill events
`kerno_disk_io_duration_nanoseconds`	Summary	Disk I/O latency
`kerno_disk_io_bytes_total`	Counter	Disk I/O bytes
`kerno_sched_delay_nanoseconds`	Summary	CPU run queue delay
`kerno_fd_open_total`	Counter	FD open operations
`kerno_fd_close_total`	Counter	FD close operations
`kerno_collector_events_total`	Counter	Events per collector
`kerno_collector_errors_total`	Counter	Errors per collector
`kerno_bpf_programs_loaded`	Gauge	Loaded eBPF programs
`kerno_info`	Gauge	Build version

Health endpoints: /healthz and /readyz return JSON status.

Kubernetes Deployment

Kerno auto-detects Kubernetes and enriches every event with pod, namespace, node, and deployment labels - no client-go dependency, no heavyweight informers.

flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        direction TB
        subgraph Node1["Worker Node 1"]
            K1["Kerno Pod<br/><i>DaemonSet</i>"]
            W1["Workload Pods"]
        end
        subgraph Node2["Worker Node 2"]
            K2["Kerno Pod<br/><i>DaemonSet</i>"]
            W2["Workload Pods"]
        end
        subgraph Node3["Worker Node N"]
            K3["Kerno Pod<br/><i>DaemonSet</i>"]
            W3["Workload Pods"]
        end
    end

    K1 -->|:9090/metrics| Prom["Prometheus"]
    K2 -->|:9090/metrics| Prom
    K3 -->|:9090/metrics| Prom
    Prom --> GF["Grafana"]

    K1 -.enriches.-> W1
    K2 -.enriches.-> W2
    K3 -.enriches.-> W3

    style K1 fill:#e94560,stroke:#fff,color:#fff
    style K2 fill:#e94560,stroke:#fff,color:#fff
    style K3 fill:#e94560,stroke:#fff,color:#fff
    style Prom fill:#0f3460,stroke:#fff,color:#fff
    style GF fill:#16213e,stroke:#fff,color:#fff
    style W1 fill:#533483,stroke:#fff,color:#fff
    style W2 fill:#533483,stroke:#fff,color:#fff
    style W3 fill:#533483,stroke:#fff,color:#fff

Install with Helm

helm install kerno ./deploy/helm/kerno \
  -n kerno-system --create-namespace

Customize via values.yaml

image:
  repository: ghcr.io/lowplane/kerno
  tag: latest

resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits:   { cpu: "1",  memory: 512Mi }

prometheus:
  enabled: true
  port: 9090

# Enable Prometheus Operator ServiceMonitor
serviceMonitor:
  enabled: true
  interval: 15s

# Restrict to specific nodes
nodeSelector:
  monitoring: "true"

Verify the deployment

kubectl -n kerno-system get ds kerno
kubectl -n kerno-system logs -l app.kubernetes.io/name=kerno
kubectl -n kerno-system port-forward ds/kerno 9090:9090
curl localhost:9090/metrics

Raw manifests

If you don't use Helm, apply the manifests directly:

kubectl apply -f deploy/k8s/

Environment & AI

Environment auto-detection. On start, Kerno picks one of three adapters and enriches every event - no configuration required:

Kubernetes (token present) → pod, namespace, node, deployment
Systemd (PID 1 is systemd) → unit, slice, scope
Bare metal → hostname, cgroup path

AI (optional). The AI layer runs after the deterministic rule engine - it enriches findings with cross-signal correlation and root cause explanations, never replaces them. Three providers (Anthropic, OpenAI, Ollama for air-gapped), three privacy modes (full / redacted / summary), TTL cache + token-bucket rate limiting, and graceful fallback to a deterministic template on failure. No LLM SDK dependencies - pure net/http.

export KERNO_AI_API_KEY="sk-..."
export KERNO_AI_PROVIDER="anthropic"   # or openai, ollama
sudo kerno doctor --ai

Configuration

Kerno works with zero configuration. For custom setups, create /etc/kerno/config.yaml:

log_level: info
log_format: text

collectors:
  syscall_latency: true
  tcp_monitor: true
  oom_track: true
  disk_io: true
  sched_delay: true
  fd_track: true

doctor:
  duration: 30s
  thresholds:
    syscall_p99_warning_ns:  100000000   # 100ms
    syscall_p99_critical_ns: 500000000   # 500ms
    tcp_retransmit_pct:      2.0         # 2%
    oom_memory_pct:          90.0        # 90%
    disk_p99_warning_ns:     50000000    # 50ms
    disk_p99_critical_ns:    200000000   # 200ms
    sched_delay_warning_ns:  5000000     # 5ms
    sched_delay_critical_ns: 20000000    # 20ms
    fd_growth_per_sec:       10.0

prometheus:
  enabled: true
  addr: ":9090"

ai:
  enabled: false
  provider: anthropic
  privacy_mode: summary

Precedence: CLI flags > environment variables (KERNO_*) > config file > defaults.

Building from Source

# Requirements: Go 1.25+
# Optional for real eBPF: clang 14+, libbpf-dev, llvm

make build          # Build binary (uses BPF stubs - no clang needed)
make bpf            # Compile eBPF C programs
make test           # Run unit tests
make test-race      # Run with race detector
make lint           # golangci-lint
make check          # Full CI check: vet + test + lint
make docker         # Build Docker image

Contributing

Contributions welcome. See CONTRIBUTING.md for:

Development setup and prerequisites
Commit message conventions (Conventional Commits)
Code review process
DCO sign-off requirement

For security reports, see SECURITY.md.

License

Apache License 2.0 - see LICENSE.

Kerno is built by Shivam at Lowplane.

If Kerno saved your Sunday, consider leaving a star - it helps other engineers find the project.

Directories ¶

Path	Synopsis
cmd
kerno command Kerno is an eBPF-based kernel observability engine for Linux.	Kerno is an eBPF-based kernel observability engine for Linux.
internal
adapter Package adapter provides environment adapters that enrich kernel events with context metadata (hostname, K8s pod/namespace, systemd unit, etc.).	Package adapter provides environment adapters that enrich kernel events with context metadata (hostname, K8s pod/namespace, systemd unit, etc.).
ai Package ai implements LLM provider abstractions and the AI analysis layer for kerno doctor.	Package ai implements LLM provider abstractions and the AI analysis layer for kerno doctor.
bpf Package bpf provides the eBPF program loaders and Go event types.	Package bpf provides the eBPF program loaders and Go event types.
cli Package cli defines all Cobra commands for the kerno CLI.	Package cli defines all Cobra commands for the kerno CLI.
collector Package collector defines the interface and registry for signal collectors.	Package collector defines the interface and registry for signal collectors.
config Package config defines the global configuration for Kerno.	Package config defines the global configuration for Kerno.
doctor Package doctor implements the kerno doctor diagnostic engine.	Package doctor implements the kerno doctor diagnostic engine.
metrics Package metrics defines and registers the Prometheus metrics for Kerno.	Package metrics defines and registers the Prometheus metrics for Kerno.
version Package version exposes the build metadata injected at compile time via ldflags.	Package version exposes the build metadata injected at compile time via ldflags.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL