perf-agent

command module
v1.2.0 Latest
Published: May 15, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

README

perf-agent

eBPF-based Linux profiler: CPU, off-CPU, and PMU, system-wide or per-PID, pprof output.

One binary, runs locally, no backend or telemetry.

🚧 GPU profiling support is in active development as an experimental track. CPU, off-CPU, and PMU profiling are stable today.


Quickstart

# Build (one-time, see BUILDING.md for full toolchain setup)
make build

# Grant capabilities once so subsequent runs don't need sudo
sudo setcap cap_sys_admin,cap_bpf,cap_perfmon,cap_sys_ptrace,cap_checkpoint_restore+ep ./perf-agent

# Capture a 30-second CPU profile of one process (output is pprof)
./perf-agent --profile --pid <PID> --duration 30s

# Inspect
go tool pprof <output>.pb.gz

What you can do with perf-agent

🔥 On-demand production profiling

Hot-attach to a running process: no restart, no preinstalled agent. For Python 3.12+, --inject-python enables the perf trampoline only for the capture window, so there's no persistent overhead.

💤 Off-CPU stalls and blocking analysis

Find why a service is "slow but not CPU-busy." --offcpu hooks sched_switch and accumulates blocking time per call site: lock waits, syscall blocks, channel reads, mutex contention.
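
The accounting idea behind --offcpu can be sketched in plain Go. This is a minimal userspace model, not perf-agent's BPF program: the switchEvent type, its fields, accumulateOffCPU, and the demo data are all hypothetical, and integer stack IDs stand in for real call sites.

```go
package main

import "fmt"

// switchEvent is a hypothetical, simplified sched_switch record.
type switchEvent struct {
	tsNs    uint64 // event timestamp in nanoseconds
	prevTID int    // thread leaving the CPU
	nextTID int    // thread entering the CPU
	stackID int    // call site of prevTID when it was switched out
}

// accumulateOffCPU sums the nanoseconds each call site spent off-CPU.
func accumulateOffCPU(events []switchEvent) map[int]uint64 {
	type pending struct {
		sinceNs uint64
		stackID int
	}
	waiting := map[int]pending{} // TID -> when and where it left the CPU
	blocked := map[int]uint64{}  // stack ID -> total blocked ns
	for _, ev := range events {
		// The incoming thread's wait ends now; charge it to its saved stack.
		if p, ok := waiting[ev.nextTID]; ok {
			blocked[p.stackID] += ev.tsNs - p.sinceNs
			delete(waiting, ev.nextTID)
		}
		// The outgoing thread's wait starts now.
		waiting[ev.prevTID] = pending{sinceNs: ev.tsNs, stackID: ev.stackID}
	}
	return blocked
}

// demoEvents is made-up data: two threads alternating on one CPU.
func demoEvents() []switchEvent {
	return []switchEvent{
		{tsNs: 100, prevTID: 1, nextTID: 2, stackID: 7},
		{tsNs: 400, prevTID: 2, nextTID: 1, stackID: 9},
		{tsNs: 450, prevTID: 1, nextTID: 2, stackID: 7},
		{tsNs: 1000, prevTID: 2, nextTID: 1, stackID: 9},
	}
}

func main() {
	for stack, ns := range accumulateOffCPU(demoEvents()) {
		fmt.Printf("stack %d blocked for %d ns\n", stack, ns)
	}
}
```

Keying the totals by stack rather than by thread is what lets a flame graph show where time was lost, regardless of which thread lost it.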

🐍 Cross-language flame graphs

One profile, multiple runtimes. Native (DWARF + ELF) symbolizes alongside Python (-X perf perf-maps, optionally activated on demand), Node.js (--perf-basic-prof), and Go. The hybrid FP+DWARF unwinder handles release-built C++/Rust without -fno-omit-frame-pointer.

📊 Hardware-counter performance investigations

--pmu summarizes IPC, cache miss rate, runqueue latency (P50/P99), and context-switch reasons (preempted vs voluntary vs I/O wait). Combine with --per-pid in system-wide mode to see which processes dominate the node's wait time.

🐳 Kubernetes-aware profile labels

Run as a DaemonSet on the host PID namespace (recommended): perf-agent sees every node process and tags each sample with pod_uid, container_id, and cgroup_path parsed from /proc/<pid>/cgroup, with no kubelet API and no client-go.

For single-tenant pods, sidecar mode also works with shareProcessNamespace: true (which exposes every container's processes to every other container: fine when the agent and target are co-deployed by the same operator, a security regression otherwise). Downward-API env vars then add pod_name / namespace / container_name labels.

--pid <N> accepts in-pod PIDs and translates them to host PIDs automatically.
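
One building block for this kind of translation is the NSpid field in /proc/<pid>/status, which lists a process's PID in each nested PID namespace, outermost first. The Go sketch below only parses that field and checks the innermost entry; it is an assumption about how such a lookup could work, not a description of perf-agent's nspid package, and a real translator would also need to confirm that the candidate lives in the caller's PID namespace. The function names and the sample status text are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// demoStatus is a made-up excerpt of /proc/<pid>/status read on the host.
const demoStatus = "Name:\tmyapp\nPid:\t48213\nNSpid:\t48213\t7\n"

// parseNSpid returns the PIDs from the NSpid line, outermost namespace first.
func parseNSpid(status string) []int {
	for _, line := range strings.Split(status, "\n") {
		rest, ok := strings.CutPrefix(line, "NSpid:")
		if !ok {
			continue
		}
		var pids []int
		for _, field := range strings.Fields(rest) {
			n, err := strconv.Atoi(field)
			if err != nil {
				return nil
			}
			pids = append(pids, n)
		}
		return pids
	}
	return nil
}

// hostPID returns the outermost PID if the innermost PID equals inner.
func hostPID(status string, inner int) (int, bool) {
	pids := parseNSpid(status)
	if len(pids) == 0 || pids[len(pids)-1] != inner {
		return 0, false
	}
	return pids[0], true
}

func main() {
	pid, ok := hostPID(demoStatus, 7)
	fmt.Println(pid, ok)
}
```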

🔍 Stripped production binaries via off-box symbols

Production builds usually strip debug info. Point perf-agent at a debuginfod-protocol server with --debuginfod-url=URL. A per-mapping classifier routes each binary in the target:

  • Has local DWARF or resolvable .gnu_debuglink → blazesym's process-mode (system libs from distro debuginfo land here for free).
  • Stripped, build-id only (Rust/Go release builds) → file-mode against the cached .debug, fetched on demand and content-addressed by build-id.
  • Deleted-but-still-mapped binary (sidecar / mount-namespace case) → same flow, opened via /proc/<pid>/map_files.

Cache layout, dispatcher details, and the address-normalization math: see docs/debuginfod-symbolization.md.
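
To make "content-addressed by build-id" concrete, here is a small Go sketch that derives a debuginfod request URL and a cache path from a build-id. The /buildid/<id>/debuginfo path is the standard debuginfod HTTP endpoint, and the .build-id/<NN>/<rest>.debug layout follows the cache package synopsis; the function names, the example server URL, and the example build-id are hypothetical, and the linked doc remains the authority on the real layout.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isHexBuildID accepts lowercase hex strings long enough to split into NN/rest.
func isHexBuildID(id string) bool {
	if len(id) < 4 {
		return false
	}
	for _, c := range id {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// debuginfoURL builds the debuginfod request URL for a build-id.
func debuginfoURL(server, buildID string) string {
	return strings.TrimRight(server, "/") + "/buildid/" + buildID + "/debuginfo"
}

// cachePath maps a build-id to <dir>/.build-id/<NN>/<rest>.debug.
// Callers should check isHexBuildID first; short IDs would panic here.
func cachePath(cacheDir, buildID string) string {
	return filepath.Join(cacheDir, ".build-id", buildID[:2], buildID[2:]+".debug")
}

func main() {
	id := "abcdef0123" // made-up build-id
	if !isHexBuildID(id) {
		fmt.Println("invalid build-id")
		return
	}
	fmt.Println(debuginfoURL("https://debuginfod.example.org/", id))
	fmt.Println(cachePath("/tmp/perf-agent-debuginfod", id))
}
```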

🧪 PGO and flame graphs

High-fidelity pprof: every Mapping carries the absolute path, GNU build-id, and file offsets; every Location is address-stable across runs. Feeds go tool pprof -diff_base and Go's native -pgo=... flag.

For toolchains that don't speak pprof, add --perf-data-output app.perf.data to emit a kernel-format perf.data alongside the pprof output. Same capture, two formats.

See docs/perf-data-output.md for the per-tool walkthrough.


Requirements

  • Linux kernel 5.8+ (BTF + CO-RE).
  • Root, OR setcap cap_sys_admin,cap_bpf,cap_perfmon,cap_sys_ptrace,cap_checkpoint_restore+ep ./perf-agent.

Usage

# CPU profiling (DWARF/hybrid walker is the default)
./perf-agent --profile --pid <PID>

# Force frame-pointer-only walker (cheaper startup, may truncate on FP-less binaries)
./perf-agent --profile --unwind fp --pid <PID>

# Force DWARF walker (eager CFI compile + per-frame hybrid)
./perf-agent --profile --unwind dwarf --pid <PID>

# Off-CPU profiling
./perf-agent --offcpu --pid <PID>

# Combined on-CPU + off-CPU
./perf-agent --profile --offcpu --pid <PID>

# PMU only (hardware counters)
./perf-agent --pmu --pid <PID>

# System-wide
./perf-agent --profile -a --duration 30s

# All features with metadata tags
./perf-agent --profile --offcpu --pmu --pid <PID> --duration 30s \
    --tag env=production \
    --tag version=1.2.3 \
    --tag service=api

For Python workloads, see docs/python-profiling.md.


Flags

| Flag | Description | Default |
| --- | --- | --- |
| --profile | Enable CPU profiling with stack traces | false |
| --offcpu | Enable off-CPU profiling with stack traces | false |
| --pmu | Enable PMU hardware counters | false |
| --pid <PID> | Target process ID | - |
| -a, --all | System-wide (all processes) | false |
| --per-pid | Per-PID breakdown (only with -a --pmu) | false |
| --duration | Collection duration | 10s |
| --sample-rate | CPU profile sample rate (Hz) | 99 |
| --unwind | Stack unwinding strategy: fp, dwarf, or auto (auto routes to dwarf; the hybrid walker covers FP-safe code via the FP path) | auto |
| --profile-output | Output path for CPU profile | auto-named |
| --offcpu-output | Output path for off-CPU profile | auto-named |
| --pmu-output | Output path for PMU metrics (auto for auto-named) | stdout |
| --perf-data-output | Also emit a Linux kernel-format perf.data (consumable by perf script, FlameGraph, hotspot, AutoFDO create_llvm_prof, …). Requires --profile. | - |
| --inject-python | Activate Python 3.12+ perf trampoline on the target before profiling | false |
| --tag key=value | Add tag to profile (repeatable) | - |
| --debuginfod-url=URL | Add a debuginfod-protocol server (repeatable). Falls back to DEBUGINFOD_URLS env. Unset → off. | - |
| --symbol-cache-dir=DIR | Local directory for fetched artifacts. | /tmp/perf-agent-debuginfod |
| --symbol-cache-max=BYTES | LRU cap for the symbol cache. | 2147483648 (2 GiB) |
| --symbol-fetch-timeout=DUR | Per-artifact HTTP fetch timeout. | 30s |
| --symbol-fail-closed | (M2 stub) Refuse to symbolize a mapping whose fetch failed. | false |

Either --pid or -a/--all is required. At least one of --profile, --offcpu, or --pmu must be specified.
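
The flag constraints stated here and in the table can be written down as a small validation function. This Go sketch is illustrative only: the options struct and validate are hypothetical names, and treating --per-pid without -a --pmu as an error (rather than ignoring it) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical subset of the parsed command-line flags.
type options struct {
	pid            int // 0 means --pid was not given
	all            bool
	profile        bool
	offcpu         bool
	pmu            bool
	perPID         bool
	perfDataOutput string
}

// validate enforces the combinations described in the flag table.
func validate(o options) error {
	if o.pid == 0 && !o.all {
		return errors.New("either --pid or -a/--all is required")
	}
	if !o.profile && !o.offcpu && !o.pmu {
		return errors.New("at least one of --profile, --offcpu, or --pmu must be specified")
	}
	if o.perPID && !(o.all && o.pmu) {
		return errors.New("--per-pid only applies with -a --pmu")
	}
	if o.perfDataOutput != "" && !o.profile {
		return errors.New("--perf-data-output requires --profile")
	}
	return nil
}

func main() {
	fmt.Println(validate(options{pid: 1234, profile: true}))
	fmt.Println(validate(options{all: true}))
}
```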


Output

Output file naming

Output files are auto-named by process name + timestamp + profile type:

| Mode | Per-PID example | System-wide example |
| --- | --- | --- |
| --profile | myapp-202604021430-on-cpu.pb.gz | 202604021430-on-cpu.pb.gz |
| --offcpu | myapp-202604021430-off-cpu.pb.gz | 202604021430-off-cpu.pb.gz |
| --pmu-output auto | myapp-202604021430-pmu.txt | 202604021430-pmu.txt |

Process name comes from /proc/<pid>/comm. Override with --profile-output / --offcpu-output.

pprof fidelity

CPU and off-CPU profiles are full-fidelity pprof: every Mapping carries the absolute path, GNU build-id, and file offsets; every Location is keyed by file offset (not symbol name) so cross-run diffing and sample-PGO converters work. [kernel] and [jit] sentinels handle the special cases. Tags from --tag key=value land as profile-level comments; k8s identity labels (when running in a pod) attach per-sample.

go tool pprof myapp-202604021430-on-cpu.pb.gz

With --debuginfod-url configured, pprof comes back fully symbolized (function names + source :line) even when debug info isn't present locally. See docs/debuginfod-symbolization.md.

PMU output

On-CPU time, runqueue latency, context-switch reasons, hardware counters (cycles, instructions, cache misses), and derived metrics (IPC, cache miss rate).

Example:

=== PMU Metrics (PID: 84228) ===
Samples: 26358

On-CPU Time (time slice per context switch):
  Min:    0.003 ms
  P50:    0.071 ms
  P99:    9.183 ms

Runqueue Latency (time waiting for CPU):
  Min:    0.001 ms
  P50:    0.012 ms
  P99:    0.850 ms

Context Switch Reasons:
  Preempted (running):     45.2%  (11912 times)
  Voluntary (sleep/mutex): 42.1%  (11095 times)
  I/O Wait (D state):      12.7%  (3351 times)

Hardware Counters:
  IPC (Instr/Cycle):  2.342
  Cache Misses/1K:    0.022

Library usage

perf-agent is also a Go library via the perfagent package:

agent, err := perfagent.New(
    perfagent.WithPID(12345),
    perfagent.WithCPUProfile("profile.pb.gz"),
    perfagent.WithPMU(),
)
if err != nil {
    log.Fatal(err)
}
defer agent.Close()
// Profile for 10 seconds, then stop.
agent.Start(ctx)
time.Sleep(10 * time.Second)
agent.Stop(ctx)

See the perfagent package docs for in-memory output, custom label enrichers, and metrics exporters.


Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                            USER SPACE (Go)                               │
│                                                                          │
│                            ┌──────────┐                                  │
│                            │ main.go  │                                  │
│                            └────┬─────┘                                  │
│                                 ▼                                        │
│                       ┌──────────────────┐                               │
│                       │ perfagent.Agent  │  lifecycle + --unwind dispatch│
│                       └─────┬────────────┘                               │
│       ┌─────────────────────┼─────────────────────────┐                  │
│       ▼                     ▼                         ▼                  │
│ ┌───────────────┐  ┌──────────────────────┐  ┌──────────────┐            │
│ │  CPU Profiler │  │  DWARF CPU/Off-CPU   │  │ PMU Monitor  │            │
│ │   (FP path)   │  │      Profiler        │  │              │            │
│ │   profile/    │  │  unwind/dwarfagent/  │  │   cpu/       │            │
│ │   offcpu/     │  │   (hybrid walker)    │  │              │            │
│ └───────┬───────┘  └──────────┬───────────┘  └──────┬───────┘            │
│         │                     │                     │                    │
│         │     ┌───────────────┴───────────────┐     │                    │
│         │     ▼                               ▼     │                    │
│         │   ┌─────────────────┐    ┌──────────────────────┐              │
│         │   │ unwind/ehcompile│    │  unwind/ehmaps       │              │
│         │   │ .eh_frame → CFI │    │ per-PID map lifecycle│              │
│         │   └─────────────────┘    │  + MMAP2 watcher     │              │
│         │                          └──────────┬───────────┘              │
│         │                                     │                          │
│         ▼                                     ▼                          │
│   ┌──────────────────────────────────────────────────────────────┐       │
│   │              unwind/procmap (Resolver)                       │       │
│   │   /proc/<pid>/maps + .note.gnu.build-id, lazy per-PID cache  │       │
│   └────────────────────┬─────────────────────────────────────────┘       │
│                        ▼                                                 │
│   ┌──────────────────────────────────────────────────────────────┐       │
│   │            pprof/ ProfileBuilder                             │       │
│   │  address-keyed Locations + per-binary Mapping (build-id,     │       │
│   │  file offsets) + kernel/[jit] sentinels + name-based         │       │
│   │  fallback when resolver misses                               │       │
│   └──────────────────────────────────────────────────────────────┘       │
│                                                                          │
│   Symbolization: blazesym (DWARF + ELF + perf-maps for JIT runtimes)     │
└─────────────┬──────────────────┬──────────────────┬──────────────────────┘
              │                  │                  │
══════════════╪══════════════════╪══════════════════╪═══════════════════════
              │  eBPF load       │                  │
              ▼                  ▼                  ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                          KERNEL SPACE (eBPF)                             │
│                                                                          │
│  ┌──────────────┐  ┌────────────────┐  ┌────────────────┐  ┌──────────┐  │
│  │ perf.bpf.c   │  │ perf_dwarf.bpf │  │ offcpu.bpf.c   │  │ cpu.bpf.c│  │
│  │ (FP only)    │  │ (hybrid: FP    │  │ + offcpu_dwarf │  │ HW ctrs  │  │
│  │ stackmap     │  │  fast path,    │  │ sched_switch   │  │ rq lat   │  │
│  │ aggregated   │  │  DWARF for     │  │ blocking-ns    │  │ ctx swch │  │
│  │ counts       │  │  FP-less PCs)  │  │                │  │          │  │
│  └──────┬───────┘  └────────┬───────┘  └────────┬───────┘  └────┬─────┘  │
│         │                   │                   │               │        │
│         │             CFI tables, classification, pid_mappings  │        │
│         │             via HASH_OF_MAPS keyed by build-id        │        │
│         │                   │                                   │        │
│         └────────┬──────────┴──────────────┬────────────────────┘        │
│                  ▼                         ▼                             │
│           ┌──────────────┐         ┌─────────────────┐                   │
│           │ stack ringbuf│         │ aggregated maps │                   │
│           │ (DWARF path) │         │ (FP path)       │                   │
│           └──────────────┘         └─────────────────┘                   │
└──────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
                    ┌──────────────────────────────────────┐
                    │              OUTPUT                  │
                    │                                      │
                    │  *-on-cpu.pb.gz   *-off-cpu.pb.gz    │
                    │  PMU: console / file                 │
                    └──────────────────────────────────────┘

Two stack-walker paths: --unwind fp (cheap, kernel-side aggregation; truncates on FP-less code) and --unwind dwarf / auto (default: FP fast path with .eh_frame-derived CFI fallback for release C++/Rust without frame pointers).
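
To show why the FP path is cheap and why it truncates on FP-less code, here is a minimal Go model of frame-pointer walking over a captured stack snapshot on x86-64. It is a userspace sketch, not the BPF walker: walkFP, demoStack, and the addresses are hypothetical. Each frame is assumed to store the caller's RBP at [fp] and the return address at [fp+8]; code compiled without frame pointers breaks that chain, which is where the CFI fallback takes over.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// walkFP follows the frame-pointer chain through a stack snapshot.
// stack holds the bytes that start at address sp; fp is the sampled RBP.
// It returns the return addresses found (the sampled PC itself is not included).
func walkFP(stack []byte, sp, fp uint64, maxFrames int) []uint64 {
	var pcs []uint64
	for len(pcs) < maxFrames {
		// Stop when the 16-byte frame record falls outside the captured bytes.
		if len(stack) < 16 || fp < sp || fp-sp > uint64(len(stack)-16) {
			break
		}
		off := fp - sp
		next := binary.LittleEndian.Uint64(stack[off:])
		ret := binary.LittleEndian.Uint64(stack[off+8:])
		if ret == 0 {
			break
		}
		pcs = append(pcs, ret)
		// The stack grows down, so each caller frame sits at a higher address.
		if next <= fp {
			break
		}
		fp = next
	}
	return pcs
}

// demoStack builds a made-up 64-byte snapshot at address 0x1000 with two frames.
func demoStack() (stack []byte, sp, fp uint64) {
	stack = make([]byte, 64)
	binary.LittleEndian.PutUint64(stack[0x10:], 0x1030)   // frame 1: saved RBP
	binary.LittleEndian.PutUint64(stack[0x18:], 0x401000) // frame 1: return address
	binary.LittleEndian.PutUint64(stack[0x30:], 0)        // frame 2: end of chain
	binary.LittleEndian.PutUint64(stack[0x38:], 0x402000) // frame 2: return address
	return stack, 0x1000, 0x1010
}

func main() {
	stack, sp, fp := demoStack()
	for i, pc := range walkFP(stack, sp, fp, 127) {
		fmt.Printf("#%d return address %#x\n", i, pc)
	}
}
```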

Sample addresses resolve through procmap.Resolver (lazy /proc/<pid>/maps + build-id), so each pprof Mapping carries real per-binary identity and each Location is keyed by (mapping_id, file_offset), which is what go tool pprof -diff_base and sample-based PGO converters need to round-trip.
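
The address-to-file-offset step can be illustrated with a short Go sketch that parses one /proc/<pid>/maps line and normalizes a sampled address. The mapping type, function names, and sample line are hypothetical, and the path handling is simplified (runs of spaces and the " (deleted)" suffix are not treated specially); procmap.Resolver's real behavior may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// demoMapsLine is a made-up executable mapping.
const demoMapsLine = "55d4c8a00000-55d4c8a21000 r-xp 00002000 08:01 1234567 /usr/bin/myapp"

// mapping is one parsed line of /proc/<pid>/maps.
type mapping struct {
	start, limit, offset uint64
	path                 string
}

// parseMapsLine parses "start-end perms offset dev inode [path]".
func parseMapsLine(line string) (mapping, error) {
	f := strings.Fields(line)
	if len(f) < 5 {
		return mapping{}, fmt.Errorf("short maps line: %q", line)
	}
	lo, hi, ok := strings.Cut(f[0], "-")
	if !ok {
		return mapping{}, fmt.Errorf("bad address range: %q", f[0])
	}
	var m mapping
	var err error
	if m.start, err = strconv.ParseUint(lo, 16, 64); err != nil {
		return mapping{}, err
	}
	if m.limit, err = strconv.ParseUint(hi, 16, 64); err != nil {
		return mapping{}, err
	}
	if m.offset, err = strconv.ParseUint(f[2], 16, 64); err != nil {
		return mapping{}, err
	}
	if len(f) > 5 {
		m.path = strings.Join(f[5:], " ")
	}
	return m, nil
}

// fileOffset converts a runtime address inside the mapping to a file offset.
func (m mapping) fileOffset(addr uint64) (uint64, bool) {
	if addr < m.start || addr >= m.limit {
		return 0, false
	}
	return addr - m.start + m.offset, true
}

func main() {
	m, err := parseMapsLine(demoMapsLine)
	if err != nil {
		fmt.Println(err)
		return
	}
	off, ok := m.fileOffset(0x55d4c8a00123)
	fmt.Printf("%s offset %#x (in range: %v)\n", m.path, off, ok)
}
```

Because the offset does not depend on where the loader placed the binary, two runs with different ASLR bases map the same instruction to the same key.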


Building

Requires Go 1.26+, Clang/LLVM, Linux headers, and blazesym (Rust C library for symbolization).

make build

The Makefile defaults to GOTOOLCHAIN=auto, so Go fetches the pinned toolchain automatically if your system Go is older. Override with GOTOOLCHAIN=local make build to enforce the locally-installed toolchain.

See BUILDING.md for the full toolchain setup.


Testing

Unit tests run without root; integration tests require root or a setcap'd binary.

# Build + cap the binary once, then run tests as a normal user
make build
sudo setcap cap_sys_admin,cap_bpf,cap_perfmon,cap_sys_ptrace,cap_checkpoint_restore+ep ./perf-agent

# Unit tests (no root)
make test-unit

# Integration tests (auto-skip when neither root nor caps are available)
make test-integration

Test gates honor file capabilities on the perf-agent binary: a setcap'd perf-agent lets the test runner exec it without sudo. For tests that load BPF in-process (library tests), the test binary itself needs caps: setcap it after go test -c.

For detailed testing documentation see TESTING.md.


Contributing

PRs welcome. Read CONTRIBUTING.md before opening one; it covers build/test conventions, the commit-message style, and what's in-scope vs. deferred. By participating you agree to the Code of Conduct.


Security

If you find a security issue, please do not open a public issue. See SECURITY.md for the reporting channel and threat model. perf-agent runs with elevated kernel capabilities; we take privilege-escalation and kernel-DoS reports seriously.


License

Apache License 2.0 (see LICENSE).

Documentation

There is no documentation for this package.

Directories

Path Synopsis
bench
cmd/report command
Command report aggregates one or more bench/cmd/scenario JSON outputs into a markdown summary.
cmd/scenario command
Command scenario runs a perf-agent --unwind dwarf startup benchmark against a synthetic process fleet, recording per-binary CFI compile timings via dwarfagent.Hooks.
internal/fleet
Package fleet spawns and manages a set of child processes used as a fixture for the perf-agent scenario benchmark.
inject
elfsym
Package elfsym provides ELF symbol resolution and SONAME parsing primitives shared across language-specific injectors (inject/python, future inject/nodejs, etc.).
ptraceop
Package ptraceop provides low-level ptrace primitives for remote function invocation: attach, save registers, write a payload, run a sequence of remote function calls (each returning via SIGSEGV at address 0), restore registers, detach.
python
Package python implements injection of CPython 3.12+'s perf trampoline (sys.activate_stack_trampoline) into running processes via ptrace, so that perf-agent can resolve Python JIT frames to qualnames without requiring the target to be launched with `python -X perf`.
internal
bpfstack
Package bpfstack parses the raw layout produced by BPF_MAP_TYPE_STACK_TRACE: a fixed 127-slot buffer of little-endian u64 instruction pointers, terminated by a zero slot.
k8slabels
Package k8slabels derives Kubernetes identity labels from a target process's cgroup path and the agent's own downward-API environment.
nspid
Package nspid translates a PID from any Linux PID namespace into the outermost (host) kernel PID.
perfdata
Package perfdata writes Linux kernel perf.data files.
perfevent
Package perfevent opens per-CPU software perf_event_open events and attaches a BPF program to each.
Package metrics provides types and interfaces for exporting performance metrics.
Package perfagent provides a library interface for the performance monitoring agent.
Package symbolize provides perf-agent's address-to-frame resolution abstraction.
debuginfod/cache
Package cache stores debuginfod-fetched artifacts on disk under a .build-id/<NN>/<rest>{.debug,} layout that blazesym's debug_dirs walker recognizes natively.
unwind
dwarfagent
Package dwarfagent wires the perf_dwarf BPF program, the ehmaps lifecycle (TableStore / PIDTracker / MmapWatcher), and pprof output into a single Profiler with the same Collect/CollectAndWrite shape as profile.Profiler.
ehcompile
Package ehcompile parses an ELF file's .eh_frame section and produces flat tables of unwind rules suitable for loading into BPF maps.
ehmaps
Package ehmaps populates the BPF-side CFI / classification / pid-mappings maps from unwind/ehcompile output.
fpwalker
Package fpwalker unwinds a stack given captured registers and raw stack bytes from a PERF_SAMPLE_STACK_USER sample, assuming the target used frame pointers.
perfreader
Package perfreader captures PERF_RECORD_SAMPLE events via perf_event_open with REGS_USER + STACK_USER so userspace can DWARF-unwind the raw stack.
procmap
Package procmap resolves addresses into per-binary mapping identity (path, start/limit, file offset, build-id) by parsing /proc/<pid>/maps and ELF .note.gnu.build-id sections.
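
The raw stack layout that the bpfstack synopsis describes (a fixed 127-slot buffer of little-endian u64 instruction pointers, terminated by a zero slot) is simple enough to decode in a few lines. This Go sketch is an illustration of that layout only; parseStack, encodeStack, and the sample addresses are hypothetical and not the package's actual API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stackSlots is the fixed depth of one stack-trace map value.
const stackSlots = 127

// parseStack decodes instruction pointers until the first zero slot.
func parseStack(raw []byte) []uint64 {
	var pcs []uint64
	for i := 0; i < stackSlots && (i+1)*8 <= len(raw); i++ {
		pc := binary.LittleEndian.Uint64(raw[i*8:])
		if pc == 0 {
			break // zero terminates the trace
		}
		pcs = append(pcs, pc)
	}
	return pcs
}

// encodeStack builds a zero-padded buffer for demonstration and testing.
func encodeStack(pcs ...uint64) []byte {
	raw := make([]byte, stackSlots*8)
	for i, pc := range pcs {
		if i == stackSlots {
			break
		}
		binary.LittleEndian.PutUint64(raw[i*8:], pc)
	}
	return raw
}

func main() {
	raw := encodeStack(0xffffffff81000000, 0x401000) // made-up addresses
	for i, pc := range parseStack(raw) {
		fmt.Printf("#%d %#x\n", i, pc)
	}
}
```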