Performance engineering, debugging, and tuning across the full stack: the Linux kernel and CPU microarchitecture, profiling and tracing, language runtimes, storage engines, databases, distributed systems, and the data structures that decide whether any of it is fast.
It started as a Linux performance toolkit. It now spans ~50 chapters and guides, from perf and eBPF up to vector indexes, LSM compaction, and HFT C++.
The Linux chapters target kernel 6.6+ (EEVDF scheduler, modern eBPF). The data-structure, database, and architecture chapters are OS-agnostic.
Last Updated: 2026-06
- Start with the 60-second checklist (see Linux Quick Start) for quick triage
- Use the Navigation below to find a topic, grouped into six parts
- Check the cheatsheets for copy-paste commands
- Refer to curated sources for deep dives on specific areas
Investigation workflow:
Symptom → 60-Second Analysis → Identify resource bottleneck → Deep dive chapter
Before reaching for any tool, read 00b - Observability Boundaries. It encodes the single most important principle in this handbook: every layer's metrics lie by omission about the layer below it. A 99% PG buffer-hit ratio can coexist with a saturated NVMe; a JVM at 2 GB heap can be OOM-killed at 8 GB RSS; a container at 80% memory can be 50% reclaimable cache. If you skip this chapter, you will trust one layer and miss the truth in the next.
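The container-memory case is easy to check directly. On cgroup v2, `memory.current` includes page cache, so a high usage figure may be mostly file cache that the kernel can drop. The sketch below assumes cgroup v2 and approximates the working set the way cAdvisor and the kubelet do: usage minus `inactive_file` from `memory.stat`. The function name `working_set` is hypothetical. `inactive_file` is only an estimate of reclaimable cache, because dirty or mapped pages are not necessarily cheap to reclaim.

```bash
#!/usr/bin/env bash
# working_set CGROUP_DIR
# Hypothetical helper (sketch): splits cgroup v2 memory usage into inactive file
# cache and the remaining "working set", using the cAdvisor/kubelet-style
# approximation working_set = memory.current - inactive_file.
working_set() {
  local dir=${1:?usage: working_set CGROUP_DIR}
  local current inactive ws
  # memory.current: total bytes charged to the cgroup, page cache included
  current=$(cat "$dir/memory.current") || return 1
  # inactive_file: file-backed pages on the inactive LRU, the first candidates for reclaim
  inactive=$(awk '$1 == "inactive_file" { print $2 }' "$dir/memory.stat") || return 1
  inactive=${inactive:-0}
  # clamp at zero in case the two files were read at slightly different moments
  ws=$(( current > inactive ? current - inactive : 0 ))
  printf 'current=%d inactive_file=%d working_set=%d\n' "$current" "$inactive" "$ws"
}
```

For example, to inspect the shell's own cgroup, run `working_set "/sys/fs/cgroup$(sed -n 's/^0:://p' /proc/self/cgroup)"`. If `working_set` is far below `current`, the "80% memory" reading is largely cache rather than pressure. The cgroup's PSI file (`memory.pressure`) is a better signal of real trouble.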
Where to start, and how to not fool yourself.
| Chapter | Description |
|---|---|
| 00 - Troubleshooting Framework | USE/RED methods, 60-second checklist, decision trees |
| 00b - Observability Boundaries | Read first. Layer stack, "stats lie" anti-patterns, triangulation pattern, tool-by-layer cheat sheet |
| Bryan Cantrill Debugging Methodology | Questions-first, state preservation, systematic elimination |
Kernel, CPU, memory, disk, network: the original core.
| Chapter | Description |
|---|---|
| 01 - Modern CLI Replacements | Rust/Go tools replacing classic Unix utils |
| 02 - System Monitoring | CPU, memory, process monitoring |
| 03 - Network Analysis | Traffic analysis, DNS, HTTP testing |
| 04 - Disk & Storage | I/O benchmarking, filesystem tools |
| 05 - Performance Profiling | perf, flame graphs, profilers |
| 06 - eBPF & Tracing | BCC tools, bpftrace, ftrace, sched_ext |
| 07 - Containers & K8s | Docker, Kubernetes debugging |
| 08 - Kernel Tuning | sysctl, EEVDF scheduler, memory, I/O |
| 09 - Network Tuning | TCP, BBR, io_uring, NUMA networking |
| 11 - GPU & HPC | GPU profiling, MIG, HPC tracing |
| 12 - Observability & Metrics | Prometheus, Grafana, OpenTelemetry |
| 15 - Memory Subsystem | NUMA, huge pages, memory profiling |
| 16 - Scheduler & Interrupts | CPU scheduling, context switching |
| 17 - Ftrace Production | Function tracing, trace-cmd |
| 18 - VDSO & Clock Source | Time syscalls, TSC, cloud VM performance |
| 18 - Off-CPU Analysis | Wall-clock profiling, blocking detection, load-scaling bottlenecks |
| eBPF Performance Overhead | Hook overhead, map types, production deployment |
| Container Debugging Patterns | cAdvisor, GOMAXPROCS, PSI, cgroup v2 debugging |
| Scheduler Debugging Deep Dive | CFS bugs, run-queue attribution, Perfetto, noisy-neighbor detection |
| TCP Edge Cases & Load Balancers | SYN retry, LB buffering, timeout hierarchies |
JVM behavior, and how tail latency hides from you.
| Chapter | Description |
|---|---|
| 10 - Java/JVM | JVM profiling and tuning (ZGC, async-profiler, GC analysis) |
| 13 - Latency Analysis | Tail latency, coordinated omission, P99 |
| Coordinated Omission Guide | Load-testing correctness, wrk2, timestamp injection |
Query plans, storage internals, and what breaks in production.
| Chapter | Description |
|---|---|
| 14 - Database Profiling | PostgreSQL, MySQL, query optimization, cross-layer PG profiling |
| Database Production Debugging | Hot partitions, cache pollution, admission control, lock analysis |
| 19 - Storage Engine Patterns | LMDB, RocksDB, LSM, columnar, vectorized engines |
| 23 - Database Scaling | Sharded ID generation, zero-downtime reshard, connection pools, Trino |
| 35 - LSM Compaction Strategies | Leveled/STCS/Universal/FIFO/TWCS, Dostoevsky, Monkey, tombstones, RUM conjecture |
Messaging, streaming, caching, and keeping state consistent across nodes.
| Chapter | Description |
|---|---|
| 20 - Resilience Patterns | Circuit breakers, bulkheads, retry/backoff tuning, timeout budgets |
| 21 - Caching Patterns | Redis memory encoding, stampede protection, Redis→MySQL/Kafka migrations |
| 22 - Kafka & Messaging | Partitioning, replication, producer/broker config, scaling at volume |
| 24 - Real-time Analytics Architectures | Uber AresDB/AthenaX, streaming SQL, RisingWave, distributed time lineage |
| 25 - Big Data & ML Platforms | Lakehouse evolution, distributed training, inference throughput, billion-scale vector search |
| CRDTs: Lock-Free Distributed State | G-Counter, PN-Counter, OR-Set, LWW-Register with delta sync |
The structures and tricks behind fast databases, search, and trading systems.
| Chapter | Description |
|---|---|
| 26 - C++ HFT Optimization Patterns | SwissTable/F14, branchless, hugepages, FastQueue, kernel bypass, 30 LLM-actionable levers |
| 27 - Compact Integer Sets | Roaring bitmaps, Elias-Fano/PEF, Bloom/Cuckoo/XOR/Binary Fuse, ART — decision matrix + heuristics |
| 28 - Probabilistic Sketches | HLL/UltraLogLog, Count-Min, t-digest/DDSketch/KLL, Theta, MinHash/LSH — cardinality, frequency, quantile |
| 29 - Vector ANN Indexes | HNSW, IVF-PQ, DiskANN, SCANN, filtered ANN, hybrid search — recall vs latency, pgvector/FAISS/Qdrant |
| 30 - Hash Tables at Scale | SwissTable/F14/hashbrown, Robin Hood, Cuckoo, MPHF/PTHash/RecSplit, hash DoS, concurrent maps |
| 31 - Columnar Encoding Cookbook | Dict, RLE, FOR/PFOR, Gorilla/Chimp/ALP, FSST, Zstd dict — Parquet/ORC/Arrow/ClickHouse |
| 32 - Ordered/Range/Spatial Structures | Skip list, Bw-tree, Fenwick, segment tree, BKD, R-tree, H3/S2, space-filling curves |
| 33 - Compressed Strings & Tries | FST, FSST, Patricia, DAWG, front-coding, FM-index, wavelet trees, LOUDS — Lucene/Tantivy/DuckDB |
| 34 - Learned Indexes | RMI, PGM-index, ALEX, learned Bloom — honest 2026 verdict vs B-tree/ART |
| Cheatsheet | Description |
|---|---|
| One-Liners | Quick diagnostic commands by problem type |
| Sysctl Reference | Key kernel parameters |
| VDSO/Clock Troubleshooting | Quick detection and fixes for time-related performance |
From Brendan Gregg's Linux Performance Analysis in 60,000 Milliseconds:
```bash
uptime              # load averages
dmesg | tail        # kernel errors
vmstat 1            # system-wide stats
mpstat -P ALL 1     # CPU balance
pidstat 1           # process CPU
iostat -xz 1        # disk I/O
free -m             # memory usage
sar -n DEV 1        # network I/O
sar -n TCP,ETCP 1   # TCP stats
top                 # overview
```

| Classic | Modern | Why |
|---|---|---|
| ls | eza | Git status, icons, tree view |
| cat | bat | Syntax highlighting |
| find | fd | 5x faster, simpler syntax |
| grep | ripgrep | 10x faster, .gitignore aware |
| du | dust | Visual bars |
| df | duf | Clean tables |
| top | btop | Dashboard UI |
| dig | dog/doggo | DoH/DoT, colors |
| curl | xh | Human-friendly HTTP |
| sed | sd | Sane regex |
| cd | zoxide | Frecency-based jump |
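Run verbatim, the 60-second checklist never finishes on its own: the interval commands (`vmstat 1`, `mpstat`, `pidstat`, `iostat`, `sar`) sample until interrupted, and `top` is interactive. When you want a snapshot to attach to a ticket, a minimal sketch like the following bounds each command with a sample count and runs `top` in batch mode. The function name `run_checklist`, the one-file-per-command output layout, and the default of five samples are assumptions, not part of Gregg's checklist.

```bash
#!/usr/bin/env bash
# run_checklist OUTDIR [COUNT]
# Hypothetical helper (sketch): captures the 60-second checklist non-interactively.
# Each interval command takes COUNT one-second samples; top runs in batch mode.
run_checklist() {
  local outdir=${1:?usage: run_checklist OUTDIR [COUNT]}
  local count=${2:-5}
  mkdir -p "$outdir" || return 1

  # "label|command" pairs; each command's stdout+stderr goes to OUTDIR/label.txt
  local -a checks=(
    "uptime|uptime"
    "dmesg|dmesg 2>&1 | tail"
    "vmstat|vmstat 1 $count"
    "mpstat|mpstat -P ALL 1 $count"
    "pidstat|pidstat 1 $count"
    "iostat|iostat -xz 1 $count"
    "free|free -m"
    "sar-dev|sar -n DEV 1 $count"
    "sar-tcp|sar -n TCP,ETCP 1 $count"
    "top|top -b -n 1"
  )
  local entry label cmd
  for entry in "${checks[@]}"; do
    label=${entry%%|*}   # text before the first '|'
    cmd=${entry#*|}      # text after the first '|'
    # record missing tools instead of aborting the whole capture
    if ! command -v "${cmd%% *}" >/dev/null 2>&1; then
      echo "missing: ${cmd%% *}" > "$outdir/$label.txt"
      continue
    fi
    bash -c "$cmd" > "$outdir/$label.txt" 2>&1
  done
}
```

For example, `run_checklist /tmp/triage-$(date +%s) 5` writes one file per command. The first line of `vmstat` output and the first `iostat` report are averages since boot, so read the later samples.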
```
Application -> async-profiler (Java), py-spy (Python), rbspy (Ruby)
    |
Userspace   -> perf, valgrind, heaptrack
    |
Syscalls    -> strace, ltrace
    |
Kernel      -> eBPF/BCC, bpftrace, ftrace
    |
Hardware    -> perf stat, turbostat, intel_gpu_top
```
```
L7 (HTTP) -> httpie, xh, hey, wrk, k6, vegeta
    |
L4 (TCP)  -> ss, netstat, tcpdump, termshark
    |
L3 (IP)   -> mtr, traceroute, ping, gping
    |
L2 (Link) -> ethtool, ip link
```
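```bash
# Modern CLI tools
# (on Debian/Ubuntu, fd-find installs the binary as `fdfind` and bat as `batcat`)
sudo apt install ripgrep fd-find bat eza fzf btop git-delta zoxide duf gping
# Performance tools (perf comes from linux-tools-$(uname -r);
# bpfcc-tools installs BCC tools with a -bpfcc suffix, e.g. execsnoop-bpfcc)
sudo apt install linux-tools-common linux-tools-$(uname -r) bpfcc-tools bpftrace
# Network tools
sudo apt install mtr-tiny tcpdump nmap iperf3 netcat-openbsd
# Monitoring (sysstat provides mpstat, pidstat, iostat, and sar)
sudo apt install sysstat htop iotop
```

| Component | Minimum | Recommended | Notes |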
|---|---|---|---|
| Linux Kernel | 5.15 | 6.6+ | EEVDF scheduler, modern eBPF |
| bpftrace | 0.16 | 0.20+ | BTF support, newer features |
| bcc-tools | 0.25 | 0.28+ | Latest BPF features |
| perf | matches kernel | - | Install linux-tools-$(uname -r) |
| iproute2 | 6.0 | 6.7+ | netkit, newer tc features |
Key kernel features by version:
| Version | Feature |
|---|---|
| 5.15+ | io_uring maturity, BTF by default |
| 6.1+ | MGLRU, improved memory management |
| 6.4+ | Per-VMA locks, reduced mmap contention |
| 6.6+ | EEVDF scheduler (replaces CFS) |
| 6.7+ | netkit devices (BPF-programmable veth alternative) |
| 6.9+ | BPF Arena, new kfuncs |
| 6.12+ | sched_ext merged, PREEMPT_RT mainline |
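The version table can be turned into a quick check. The sketch below compares a kernel release string against the thresholds above using GNU `sort -V`; the function names `version_ge` and `kernel_features` are hypothetical. Treat the result as a hint only: distribution kernels (RHEL in particular) backport features, and Kconfig options still matter (sched_ext, for instance, needs `CONFIG_SCHED_CLASS_EXT`).

```bash
#!/usr/bin/env bash
# Sketch: map a kernel release string onto the feature table above.
# Assumes GNU sort with -V (version sort). Version numbers are only a hint.

# version_ge A B: succeed if version A >= version B
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_features [RELEASE]: print the table rows that RELEASE
# (default: the running kernel) meets or exceeds
kernel_features() {
  local ver=${1:-$(uname -r)}
  ver=${ver%%[-+]*}   # drop distro suffix: 6.8.0-45-generic -> 6.8.0
  local -a table=(
    "5.15|io_uring maturity, BTF by default"
    "6.1|MGLRU"
    "6.4|Per-VMA locks"
    "6.6|EEVDF scheduler"
    "6.7|netkit"
    "6.9|BPF Arena"
    "6.12|sched_ext, PREEMPT_RT mainline"
  )
  local row
  for row in "${table[@]}"; do
    if version_ge "$ver" "${row%%|*}"; then
      echo "${row%%|*}+: ${row#*|}"
    fi
  done
  return 0
}
```

Running `kernel_features` with no argument checks the current kernel. A complementary runtime check for BTF is whether `/sys/kernel/btf/vmlinux` exists.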
| Source | Focus | Link |
|---|---|---|
| Brendan Gregg | Performance methodology, eBPF | brendangregg.com |
| Julia Evans | Debugging, Linux internals | jvns.ca |
| Netflix Tech Blog | Production performance | netflixtechblog.com |
| Cloudflare Blog | Network performance, eBPF | blog.cloudflare.com |
| Meta Engineering | eBPF at scale, kernel | engineering.fb.com |
| Dan Luu | Systems analysis, measurement | danluu.com |
Curated extracts from these sources with actionable insights:
- Linux Performance Toolkit - Brendan Gregg's methodologies (USE/TSA/RED), perf, flame graphs, off-CPU
- Netflix Performance Playbook - ZGC, flame graphs, load shedding
- Cloudflare Network Performance - TCP tuning, XDP, DDoS
- Meta eBPF & Systems Engineering - Katran, Strobelight, sched_ext
- Dan Luu Systems Insights - Latency, measurement, caching
- Julia Evans Systems Debugging - strace, debugging methodology
- BCC Tools - eBPF-based Linux tools
- bpftrace - High-level tracing language
- perf-tools - Performance analysis tools
- FlameGraph - Stack trace visualizer
- sched_ext - BPF schedulers