Performance Engineering Handbook

Performance engineering, debugging, and tuning across the full stack: the Linux kernel and CPU microarchitecture, profiling and tracing, language runtimes, storage engines, databases, distributed systems, and the data structures that decide whether any of it is fast.

It started as a Linux performance toolkit. It now spans ~50 chapters and guides, from perf and eBPF up to vector indexes, LSM compaction, and HFT C++.

The Linux chapters target kernel 6.6+ (EEVDF scheduler, modern eBPF). The data-structure, database, and architecture chapters are OS-agnostic.

Last Updated: 2026-06

How to Use This Handbook

Start with the 60-second checklist (see Linux Quick Start) for quick triage
Use the Navigation below to find a topic, grouped into six parts
Check the cheatsheets for copy-paste commands
Refer to curated sources for deep dives on specific areas

For investigation workflow:

Symptom → 60-Second Analysis → Identify resource bottleneck → Deep dive chapter

Before reaching for any tool, read 00b - Observability Boundaries. It encodes the single most important principle in this handbook: every layer's metrics lie by omission about the layer below it. A 99% PostgreSQL buffer-hit ratio can coexist with a saturated NVMe device; a JVM with a 2 GB heap can be OOM-killed at 8 GB RSS; a container at 80% memory usage can be 50% reclaimable page cache. If you skip this chapter, you will trust one layer and miss the truth in the next.
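The container case is easy to check by hand. The sketch below assumes a cgroup v2 host and reads `memory.current` and `memory.stat` from a cgroup directory. The function name `cgroup_working_set` is hypothetical, and treating `inactive_file` as the reclaimable portion is a heuristic assumed here, not an exact measure of what the kernel can actually free.

```bash
#!/bin/sh
# cgroup_working_set DIR
# Prints "current inactive_file working_set" (bytes) for a cgroup v2 directory.
# Assumption: page cache on the inactive LRU (inactive_file) counts as reclaimable.
cgroup_working_set() {
  dir=$1
  current=$(cat "$dir/memory.current") || return 1
  # memory.stat holds "key value" pairs, one per line.
  inactive=$(awk '$1 == "inactive_file" { print $2 }' "$dir/memory.stat") || return 1
  inactive=${inactive:-0}
  working=$((current - inactive))
  if [ "$working" -lt 0 ]; then
    working=0
  fi
  echo "$current $inactive $working"
}

# Usage (the cgroup path for a container varies by runtime and is not shown here):
# cgroup_working_set /sys/fs/cgroup/<path-to-container-cgroup>
```

If the third number is far below the first, the "80% used" figure reported one layer up is mostly cache, and the container is further from an OOM kill than the headline metric suggests.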

Navigation

Part I — Triage & Methodology

Where to start, and how to not fool yourself.

Chapter	Description
00 - Troubleshooting Framework	USE/RED methods, 60-second checklist, decision trees
00b - Observability Boundaries	Read first. Layer stack, "stats lie" anti-patterns, triangulation pattern, tool-by-layer cheat sheet
Bryan Cantrill Debugging Methodology	Questions-first, state preservation, systematic elimination

Part II — Linux Systems Performance

Kernel, CPU, memory, disk, network: the original core.

Chapter	Description
01 - Modern CLI Replacements	Rust/Go tools replacing classic Unix utils
02 - System Monitoring	CPU, memory, process monitoring
03 - Network Analysis	Traffic analysis, DNS, HTTP testing
04 - Disk & Storage	I/O benchmarking, filesystem tools
05 - Performance Profiling	perf, flame graphs, profilers
06 - eBPF & Tracing	BCC tools, bpftrace, ftrace, sched_ext
07 - Containers & K8s	Docker, Kubernetes debugging
08 - Kernel Tuning	sysctl, EEVDF scheduler, memory, I/O
09 - Network Tuning	TCP, BBR, io_uring, NUMA networking
11 - GPU & HPC	GPU profiling, MIG, HPC tracing
12 - Observability & Metrics	Prometheus, Grafana, OpenTelemetry
15 - Memory Subsystem	NUMA, huge pages, memory profiling
16 - Scheduler & Interrupts	CPU scheduling, context switching
17 - Ftrace Production	Function tracing, trace-cmd
18 - VDSO & Clock Source	Time syscalls, TSC, cloud VM performance
18 - Off-CPU Analysis	Wall-clock profiling, blocking detection, load-scaling bottlenecks
eBPF Performance Overhead	Hook overhead, map types, production deployment
Container Debugging Patterns	cAdvisor, GOMAXPROCS, PSI, cgroup v2 debugging
Scheduler Debugging Deep Dive	CFS bugs, run-queue attribution, Perfetto, noisy-neighbor detection
TCP Edge Cases & Load Balancers	SYN retry, LB buffering, timeout hierarchies

Part III — Runtimes & Latency

JVM behavior, and how tail latency hides from you.

Chapter	Description
10 - Java/JVM	JVM profiling and tuning (ZGC, async-profiler, GC analysis)
13 - Latency Analysis	Tail latency, coordinated omission, P99
Coordinated Omission Guide	Load-testing correctness, wrk2, timestamp injection

Part IV — Databases & Storage Engines

Query plans, storage internals, and what breaks in production.

Chapter	Description
14 - Database Profiling	PostgreSQL, MySQL, query optimization, cross-layer PG profiling
Database Production Debugging	Hot partitions, cache pollution, admission control, lock analysis
19 - Storage Engine Patterns	LMDB, RocksDB, LSM, columnar, vectorized engines
23 - Database Scaling	Sharded ID generation, zero-downtime reshard, connection pools, Trino
35 - LSM Compaction Strategies	Leveled/STCS/Universal/FIFO/TWCS, Dostoevsky, Monkey, tombstones, RUM conjecture

Part V — Distributed Systems & Data Architectures

Messaging, streaming, caching, and keeping state consistent across nodes.

Chapter	Description
20 - Resilience Patterns	Circuit breakers, bulkheads, retry/backoff tuning, timeout budgets
21 - Caching Patterns	Redis memory encoding, stampede protection, Redis→MySQL/Kafka migrations
22 - Kafka & Messaging	Partitioning, replication, producer/broker config, scaling at volume
24 - Real-time Analytics Architectures	Uber AresDB/AthenaX, streaming SQL, RisingWave, distributed time lineage
25 - Big Data & ML Platforms	Lakehouse evolution, distributed training, inference throughput, billion-scale vector search
CRDTs: Lock-Free Distributed State	G-Counter, PN-Counter, OR-Set, LWW-Register with delta sync

Part VI — Low-Latency Engineering & Performance Data Structures

The structures and tricks behind fast databases, search, and trading systems.

Chapter	Description
26 - C++ HFT Optimization Patterns	SwissTable/F14, branchless, hugepages, FastQueue, kernel bypass, 30 LLM-actionable levers
27 - Compact Integer Sets	Roaring bitmaps, Elias-Fano/PEF, Bloom/Cuckoo/XOR/Binary Fuse, ART — decision matrix + heuristics
28 - Probabilistic Sketches	HLL/UltraLogLog, Count-Min, t-digest/DDSketch/KLL, Theta, MinHash/LSH — cardinality, frequency, quantile
29 - Vector ANN Indexes	HNSW, IVF-PQ, DiskANN, ScaNN, filtered ANN, hybrid search — recall vs latency, pgvector/FAISS/Qdrant
30 - Hash Tables at Scale	SwissTable/F14/hashbrown, Robin Hood, Cuckoo, MPHF/PTHash/RecSplit, hash DoS, concurrent maps
31 - Columnar Encoding Cookbook	Dict, RLE, FOR/PFOR, Gorilla/Chimp/ALP, FSST, Zstd dict — Parquet/ORC/Arrow/ClickHouse
32 - Ordered/Range/Spatial Structures	Skip list, Bw-tree, Fenwick, segment tree, BKD, R-tree, H3/S2, space-filling curves
33 - Compressed Strings & Tries	FST, FSST, Patricia, DAWG, front-coding, FM-index, wavelet trees, LOUDS — Lucene/Tantivy/DuckDB
34 - Learned Indexes	RMI, PGM-index, ALEX, learned Bloom — honest 2026 verdict vs B-tree/ART

Cheatsheets

Cheatsheet	Description
One-Liners	Quick diagnostic commands by problem type
Sysctl Reference	Key kernel parameters
VDSO/Clock Troubleshooting	Quick detection and fixes for time-related performance

Linux Quick Start

60-Second Analysis

From Brendan Gregg's Linux Performance Analysis in 60,000 Milliseconds:

uptime                           # load averages
dmesg | tail                     # kernel errors
vmstat 1                         # system-wide stats
mpstat -P ALL 1                  # CPU balance
pidstat 1                        # process CPU
iostat -xz 1                     # disk I/O
free -m                          # memory usage
sar -n DEV 1                     # network I/O
sar -n TCP,ETCP 1                # TCP stats
top                              # overview
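Several of the commands above (`vmstat 1`, `mpstat -P ALL 1`, and so on) print samples until interrupted. For a hands-off run, one option is to give each a sample count and skip tools that are not installed. This is a minimal sketch: the helper names `run_step` and `sixty_second_checklist` and the count of 5 samples are assumptions, not part of the original checklist.

```bash
#!/bin/sh
# run_step CMD [ARGS...]: run a command if it is installed, otherwise note the skip.
run_step() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $*"
    "$@"
  else
    echo "== skip: $1 not installed"
  fi
}

# The same ten commands, bounded to 5 one-second samples where they would loop.
sixty_second_checklist() {
  run_step uptime
  run_step sh -c 'dmesg | tail'
  run_step vmstat 1 5
  run_step mpstat -P ALL 1 5
  run_step pidstat 1 5
  run_step iostat -xz 1 5
  run_step free -m
  run_step sar -n DEV 1 5
  run_step sar -n TCP,ETCP 1 5
  run_step top -b -n 1        # batch mode, one iteration
}

# Run with: sixty_second_checklist
# dmesg may need root on systems that restrict kernel log access.
```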

Classic -> Modern Replacements

Classic	Modern	Why
`ls`	`eza`	Git status, icons, tree view
`cat`	`bat`	Syntax highlighting
`find`	`fd`	Usually faster, simpler syntax
`grep`	`ripgrep`	Usually much faster, .gitignore aware
`du`	`dust`	Visual bars
`df`	`duf`	Clean tables
`top`	`btop`	Dashboard UI
`dig`	`dog`/`doggo`	DoH/DoT, colors
`curl`	`xh`	Human-friendly HTTP
`sed`	`sd`	Sane regex
`cd`	`zoxide`	Frecency-based jump

Performance Stack

Application  ->  async-profiler (Java), py-spy (Python), rbspy (Ruby)
     |
Userspace    ->  perf, valgrind, heaptrack
     |
Syscalls     ->  strace, ltrace
     |
Kernel       ->  eBPF/BCC, bpftrace, ftrace
     |
Hardware     ->  perf stat, turbostat, intel_gpu_top

Network Stack

L7 (HTTP)    ->  httpie, xh, hey, wrk, k6, vegeta
     |
L4 (TCP)     ->  ss, netstat, tcpdump, termshark
     |
L3 (IP)      ->  mtr, traceroute, ping, gping
     |
L2 (Link)    ->  ethtool, ip link

Quick Install (Debian/Ubuntu)

# Modern CLI tools
sudo apt install ripgrep fd-find bat eza fzf btop git-delta zoxide duf gping

# Performance tools
sudo apt install linux-tools-common linux-tools-$(uname -r) bpfcc-tools bpftrace

# Network tools
sudo apt install mtr-tiny tcpdump nmap iperf3 netcat-openbsd

# Monitoring
sudo apt install sysstat htop iotop
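On many Debian/Ubuntu releases the `fd-find` package installs its binary as `fdfind` and `bat` installs as `batcat`, so the short names from the replacements table may not resolve after install. A small sketch for linking them, assuming `~/.local/bin` is on your `PATH`; the helper name `link_alias` is hypothetical.

```bash
#!/bin/sh
# link_alias WANTED ACTUAL BIN_DIR
# If WANTED is not on PATH but ACTUAL is, create BIN_DIR/WANTED as a symlink to ACTUAL.
link_alias() {
  wanted=$1
  actual=$2
  bin_dir=$3
  if command -v "$wanted" >/dev/null 2>&1; then
    return 0                                  # short name already works
  fi
  target=$(command -v "$actual") || return 1  # neither name is installed
  mkdir -p "$bin_dir"
  ln -sf "$target" "$bin_dir/$wanted"
}

# Example:
# link_alias fd fdfind "$HOME/.local/bin"
# link_alias bat batcat "$HOME/.local/bin"
```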

Version Requirements

Component	Minimum	Recommended	Notes
Linux Kernel	5.15	6.6+	EEVDF scheduler, modern eBPF
bpftrace	0.16	0.20+	BTF support, newer features
bcc-tools	0.25	0.28+	Latest BPF features
perf	matches kernel	-	Install linux-tools-$(uname -r)
iproute2	6.0	6.7+	netkit, newer tc features

Key kernel features by version:

Version	Feature
5.15+	io_uring maturity, BTF by default
6.1+	MGLRU, improved memory management
6.4+	Per-VMA locks, reduced mmap contention
6.6+	EEVDF scheduler (replaces CFS)
6.7+	Netkit stable
6.9+	BPF Arena, new kfuncs
6.12+	sched_ext merged, PREEMPT_RT mainline
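To check a host against the table above, compare the major and minor numbers from `uname -r`. The sketch below is one way to do that in plain shell; the function name `kernel_at_least` is hypothetical. Treat the result as a hint only: distribution kernels often backport features, and a feature can also be compiled out.

```bash
#!/bin/sh
# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (e.g. the output of `uname -r`) is >= MAJOR.MINOR.
kernel_at_least() {
  release=${1%%-*}            # drop distro suffix: 6.8.0-45-generic -> 6.8.0
  major=${release%%.*}
  rest=${release#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}     # drop trailing non-digits such as "+"
  minor=${minor:-0}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example:
# if kernel_at_least "$(uname -r)" 6 6; then
#   echo "EEVDF scheduler expected"
# fi
```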

Curated Sources

Essential Reading

Source	Focus	Link
Brendan Gregg	Performance methodology, eBPF	brendangregg.com
Julia Evans	Debugging, Linux internals	jvns.ca
Netflix Tech Blog	Production performance	netflixtechblog.com
Cloudflare Blog	Network performance, eBPF	blog.cloudflare.com
Meta Engineering	eBPF at scale, kernel	engineering.fb.com
Dan Luu	Systems analysis, measurement	danluu.com

In This Repository

Curated extracts from these sources with actionable insights:

Linux Performance Toolkit - Brendan Gregg's USE and TSA methods alongside RED, plus perf, flame graphs, off-CPU
Netflix Performance Playbook - ZGC, flame graphs, load shedding
Cloudflare Network Performance - TCP tuning, XDP, DDoS
Meta eBPF & Systems Engineering - Katran, Strobelight, sched_ext
Dan Luu Systems Insights - Latency, measurement, caching
Julia Evans Systems Debugging - strace, debugging methodology

Tools & References

BCC Tools - eBPF-based Linux tools
bpftrace - High-level tracing language
perf-tools - Performance analysis tools
FlameGraph - Stack trace visualizer
sched_ext - BPF schedulers
