Performance

Pipelock adds microseconds of overhead per request. The proxy is I/O bound (waiting for upstream responses), not CPU bound. For the request-side URL scanning hot path, CPU is never the bottleneck. Response scanning and MCP scanning on large payloads can use measurable CPU at high throughput (see tables below).

All numbers from Go benchmarks on AMD Ryzen 7 7800X3D (8 cores / 16 threads) / Go 1.25 / Linux. Run make bench to reproduce on your hardware. See benchmarks.md for raw ns/op data.

Scanning Latency (single request)

URL Scanning (fetch/forward proxy hot path)

11-layer pipeline: scheme, CRLF injection, path traversal, blocklist, DLP, path entropy, subdomain entropy, SSRF, rate limit, URL length, data budget.

Operation	Latency	Throughput (1 core)
Full pipeline (allowed URL)	~32 μs	~31,000/sec
Blocklist block (early exit)	~2 μs	~500,000/sec
DLP pattern match (47 patterns, pre-filtered)	~8 μs	~130,000/sec
DLP pre-filter only (clean text, zero alloc)	~400 ns	~2,500,000/sec
Entropy detection	~58 μs	~17,000/sec
Complex URL (ports, query params)	~60 μs	~17,000/sec

MCP Scanning (tool call/response inspection)

JSON-RPC parsing + text extraction + prompt injection pattern matching.

Operation	Latency	Throughput (1 core)
Clean tool response	~78 μs	~13,000/sec
Injection detected (early exit)	~36 μs	~28,000/sec
Text extraction	~2.5 μs	~400,000/sec

Response Scanning (fetched content injection detection)

Pattern matching against 25 prompt injection patterns (including 6 state/control patterns and 4 CJK-language patterns) on fetched page content.

Operation	Latency	Throughput (1 core)
Short clean text (~90B)	~76 μs	~13,000/sec
10KB clean text	~8.4 ms	~120/sec
Injection detected (early exit)	~42 μs	~24,000/sec
State/control clean	~134 μs	~7,500/sec

The keyword pre-filter (added in v1.3.0) short-circuits regex evaluation when no injection keywords are present in the normalized text. This cut clean-text latency by 29%, large-content latency by 27%, and injection-detected latency by 3.1x (early keyword match skips later normalization passes). The 10KB response scan remains the current ceiling due to 6 sequential normalization passes. Content size tiering (skipping passes 3-6 for large content) is planned.

Supporting Operations

Operation	Latency
Unicode normalization (DLP mode)	~950 ns
Unicode normalization (matching mode)	~1.3 μs
Unicode normalization (tool text mode)	~2.0 μs
Shannon entropy calculation	~2.2 μs
Domain matching (exact)	~50 ns
Domain matching (wildcard)	~53 ns

Concurrent Scaling

The scanner's core detection pipeline (scheme, blocklist, DLP, entropy, SSRF) is stateless per request with no shared mutable state. Config reads use atomic pointer swap. Rate limiting and data budget tracking use per-scanner mutexes, but these are low-contention (one lock acquisition per request). Benchmarks below are run with rate limiting and data budget disabled to isolate scanning throughput.

Parallel throughput (`b.RunParallel`)

These benchmarks run across all available goroutines simultaneously, measuring total operations per second as parallelism increases.

URL Scanning:

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	44,135	22,700/sec	1.0x
2	23,052	43,400/sec	1.9x
4	12,356	80,900/sec	3.6x
8	7,177	139,300/sec	6.1x
16	6,500	153,800/sec	6.8x

DLP Block (early exit):

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	7,625	131,100/sec	1.0x
2	4,017	248,900/sec	1.9x
4	2,204	453,700/sec	3.5x
8	1,414	707,200/sec	5.4x
16	1,184	844,600/sec	6.4x

Response Scanning (short content):

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	87,818	11,400/sec	1.0x
2	45,767	21,800/sec	1.9x
4	23,978	41,700/sec	3.7x
8	14,628	68,400/sec	6.0x
16	12,900	77,500/sec	6.8x

Response Scanning (10KB content):

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	11,780,295	85/sec	1.0x
2	6,657,276	150/sec	1.8x
4	3,093,228	323/sec	3.8x
8	1,898,905	527/sec	6.2x
16	1,928,156	519/sec	6.1x

MCP Scanning (clean response):

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	87,764	11,400/sec	1.0x
4	23,540	42,500/sec	3.7x
8	13,442	74,400/sec	6.5x
16	11,510	86,900/sec	7.6x

Blocklist (early exit):

GOMAXPROCS	ns/op	Throughput	Scaling vs 1
1	2,139	467,500/sec	1.0x
2	1,132	883,400/sec	1.9x
4	633	1,580,300/sec	3.4x
8	423	2,364,100/sec	5.1x
16	364	2,747,300/sec	5.9x

Concurrent throughput scaling (goroutine ramp)

Sustained 2-second runs at increasing goroutine counts. Measures total operations completed, not per-goroutine latency.

URL Scan:

Goroutines	Ops/sec	Scaling
1	19,466	1.0x
2	37,122	1.9x
4	67,722	3.5x
8	106,321	5.5x
16	121,337	6.2x
32	115,875	6.0x
64	123,959	6.4x

Response Scan:

Goroutines	Ops/sec	Scaling
1	8,284	1.0x
2	16,135	1.9x
4	31,417	3.8x
8	52,405	6.3x
16	62,776	7.6x
32	66,575	8.0x
64	65,470	7.9x

The pattern: near-linear scaling up to physical core count (8), small gains from hyperthreading (16), then plateau. No degradation past core count. Adding more concurrent agents doesn't slow anything down, you just stop getting additional throughput once all cores are saturated.

HTTP Proxy Overhead

Raw HTTP handler throughput measured with hey against the running proxy.

Concurrency	Requests	Req/sec	P50	P99
50	2,000	43,474	0.5 ms	18.5 ms
200	10,000	102,600	0.7 ms	23.2 ms
500	20,000	97,268	2.0 ms	51.9 ms

This measures HTTP accept/parse/route/respond overhead. Actual scanning latency adds the per-operation costs from the tables above.

CPU Cost at Scale

How much CPU does scanning consume at various request rates? These numbers cover scanning overhead only, not network I/O.

Request-side scanning (URL + MCP)

Request rate	CPU (URL scan)	CPU (MCP scan)
100/sec	0.4% of 1 core	0.9% of 1 core
1,000/sec	3.7% of 1 core	8.9% of 1 core
10,000/sec	37% of 1 core	0.9 cores
100,000/sec	3.7 cores	8.9 cores

Response-side scanning

Request rate	CPU (short ~90B)	CPU (10KB content)
100/sec	0.8% of 1 core	1.2 cores
1,000/sec	8.1% of 1 core	12.1 cores

Response scanning is the most CPU-intensive path. At high throughput with large payloads, it dominates. For request-side scanning only, 1,000 requests per second uses less than 15% of a single CPU core. Network latency (waiting for upstream HTTP responses) dominates total request time by orders of magnitude.

Deployment Sizing

Deployment	Expected load	CPU recommendation
Single developer (local proxy)	1-10 req/sec	Any (negligible overhead)
Team sidecar (per-agent)	10-100 req/sec	0.1 CPU, 64MB RAM
Shared proxy (small org)	100-1,000 req/sec	0.5 CPU, 128MB RAM
Platform deployment	10,000+ req/sec	2+ CPU, 256MB RAM

The binary is ~18MB static (release build with symbol stripping). Memory usage is dominated by the DLP regex compilation (~40MB RSS at idle with default patterns) and scales linearly with concurrent connections, not pattern count.

Design Decisions That Affect Performance

Early exit on block. Blocked URLs short-circuit at the first failing layer. Blocklist hits resolve in ~2μs. DLP matches exit before DNS resolution.

Pre-DNS checks. CRLF injection, path traversal, allowlist, blocklist, and DLP checks all execute before any network call. This prevents secret exfiltration via DNS queries and keeps the fast path fast.

Stateless detection pipeline. Each scan allocates its own working state. The core detection layers (scheme through SSRF) have no shared mutable state, enabling linear scaling with cores. Rate limiting and data budget use per-scanner mutexes but are low-contention.

Fire-and-forget event emission. Webhook events use an async buffered channel. Syslog is UDP. Neither blocks the scanning pipeline.

Atomic config reload. Hot-reload swaps the entire scanner via atomic.Pointer, so scanning never blocks on config changes.

Reproducing These Numbers

# Full benchmark suite (sequential)
make bench

# Parallel scaling (URL scanner)
go test -bench=BenchmarkParallel -benchtime=3s -cpu=1,2,4,8,16 ./internal/scanner/

# Parallel scaling (MCP scanner)
go test -bench=BenchmarkParallel -benchtime=3s -cpu=1,4,8,16 ./internal/mcp/

# Concurrent throughput scaling test (~28s)
PIPELOCK_BENCH_SCALING=1 go test -v -run=TestConcurrentThroughputScaling ./internal/scanner/

# HTTP proxy overhead (requires running pipelock instance)
hey -n 10000 -c 200 http://localhost:8888/health

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance

Scanning Latency (single request)

URL Scanning (fetch/forward proxy hot path)

MCP Scanning (tool call/response inspection)

Response Scanning (fetched content injection detection)

Supporting Operations

Concurrent Scaling

Parallel throughput (`b.RunParallel`)

Concurrent throughput scaling (goroutine ramp)

HTTP Proxy Overhead

CPU Cost at Scale

Request-side scanning (URL + MCP)

Response-side scanning

Deployment Sizing

Design Decisions That Affect Performance

Reproducing These Numbers

Uh oh!

FilesExpand file tree

performance.md

Latest commit

History

performance.md

File metadata and controls

Performance

Scanning Latency (single request)

URL Scanning (fetch/forward proxy hot path)

MCP Scanning (tool call/response inspection)

Response Scanning (fetched content injection detection)

Supporting Operations

Concurrent Scaling

Parallel throughput (b.RunParallel)

Concurrent throughput scaling (goroutine ramp)

HTTP Proxy Overhead

CPU Cost at Scale

Request-side scanning (URL + MCP)

Response-side scanning

Deployment Sizing

Design Decisions That Affect Performance

Reproducing These Numbers

Parallel throughput (`b.RunParallel`)