feat(proxy): scan url-encoded phantom tokens in form bodies by nnemirovsky · Pull Request #40 · nnemirovsky/sluice

nnemirovsky · 2026-05-11T10:40:21Z

Summary

Pass 2 scoped replacement now matches both the literal phantom form SLUICE_PHANTOM:<name> and the URL-encoded form SLUICE_PHANTOM%3A<name>, and substitutes a URL-encoded real value so form-urlencoded bodies stay well-formed.
Pass 3 unbound-phantom strip mirrors every phantom with its URL-encoded variant and the regex fallback covers both forms.
The streaming request reader applies the same dual-form swap and sizes its holdback buffer against the longer encoded form so a phantom that straddles a read boundary is preserved.
Added focused unit tests for the form-urlencoded body, the URL query string, and the unbound-phantom strip path.

Why

Anthropic and Google OAuth refresh POSTs use application/x-www-form-urlencoded. The agent emits refresh_token=SLUICE_PHANTOM:anthropic_oauth.refresh, the Go form encoder rewrites it to SLUICE_PHANTOM%3Aanthropic_oauth.refresh, and the previous scanner only checked the literal form. The upstream got the phantom verbatim, returned invalid_grant, and the agent fell back to a fresh interactive OAuth on every refresh cycle. With both forms in scope the swap completes transparently.

Test plan

go test ./internal/proxy/... (611 tests)
go test ./... (2470 tests across 13 packages)
gofumpt -l internal/ clean
golangci-lint run ./internal/... clean
Deploy to dev server, observe successful OAuth refresh round-trip with phantoms intact in vault

The pass-2 scoped phantom replacement only matched the literal form SLUICE_PHANTOM:<name>, so phantom tokens posted in application/x-www-form-urlencoded bodies (Anthropic and Google OAuth refresh, for example) passed through to the upstream as SLUICE_PHANTOM%3A<name>. The provider returned invalid_grant and the agent dropped back to a fresh interactive OAuth on every refresh. The injection pass, the unbound-phantom strip, and the streaming reader now scan both forms and substitute the URL-encoded secret so the wire body stays valid form data. The holdback buffer in the streaming reader sizes against the longer encoded form so a phantom that straddles a chunk boundary cannot slip through.

Copilot

Pull request overview

This PR extends the proxy’s phantom-token scanning/swapping logic to correctly handle URL-encoded phantom tokens (notably SLUICE_PHANTOM%3A...) so that application/x-www-form-urlencoded request bodies, URL query strings, and streaming bodies don’t leak unswapped phantoms to upstream services.

Changes:

Added explicit support for URL-encoded phantom prefixes and updated the fallback strip regex to match both literal and encoded forms.
Updated request processing (buffered + streaming) to replace URL-encoded phantoms with URL-encoded real values and to strip unbound phantoms in both forms.
Added unit tests covering form-urlencoded bodies, URL-encoded query swapping, and stripping of unbound URL-encoded phantoms.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`internal/proxy/phantom.go`	Adds URL-encoded phantom prefix and expands strip regex to match literal + encoded forms.
`internal/proxy/phantom_strip.go`	Mirrors provider-listed phantom tokens with URL-encoded variants for stripping; expands prefix detection for regex fallback.
`internal/proxy/addon.go`	Updates request/stream swapping logic to detect and replace both literal and URL-encoded phantom forms (body/headers/query/path).
`internal/proxy/addon_test.go`	Adds focused tests for form-urlencoded bodies, URL-encoded query swapping, and URL-encoded unbound-phantom stripping.

Comments suppressed due to low confidence (1)

internal/proxy/addon.go:657

The PR adds URL-path scanning/swapping for URL-encoded phantom prefixes, but tests only cover form-urlencoded bodies and RawQuery. Adding a focused test for a phantom in the URL path (including a secret with spaces) would help catch incorrect escaping (e.g., '+' vs '%20') and ensure RawPath/Path handling remains stable.

	// Pass 2+3 on URL path.
	if rawP := f.Request.URL.Path; bytesContainsAnyPhantomPrefix([]byte(rawP)) {
		f.Request.URL.Path = string(
			a.swapPhantomBytes([]byte(rawP), pairs, host, port, "URL path"),
		)
		f.Request.URL.RawPath = ""
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Hot-path response to Copilot review on #40: 1. swapPhantomBytes now chooses between url.QueryEscape and url.PathEscape based on the location label. QueryEscape encodes a space as '+', which is correct for form bodies and URL query strings but corrupts URL path substitutions where a literal '+' is its own character. Paths get PathEscape; bodies, queries, and headers keep QueryEscape. 2. phantomPair gains an encodedPhantom byte slice, populated once at pair-construction time via encodePhantomForPair. The hot scans (swapPhantomBytes, swapPhantomHeaders, phantomSwapReader, maxPhantomLen) read the precomputed bytes instead of recomputing url.QueryEscape on every request, header, and stream chunk. The encoded secret is still computed on demand, but only when the encoded phantom is actually present in the data, so the no-match case is now allocation-free. 3. Added unit tests for the path/query escape divergence and the QueryEscape symmetry case so the location-specific behavior stays pinned.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

RFC 3986 §2.1 makes percent-encoded hex digits case-insensitive. Go's url.QueryEscape always emits uppercase, but a third-party client can emit lowercase, so a scan that only matched %3A would miss SLUICE_PHANTOM%3a... and let the phantom leak. Adds a urlEncodedPhantomPrefixLower constant and an encodedPhantomLower field on phantomPair, populated once at pair-construction time. The swapPhantomBytes, swapPhantomHeaders, streaming reader, prefix detector, and unbound-strip path all check both casings. The replacement secret is escaped once on first hit and reused across both casing branches so the cost stays linear in number-of-encoded-forms, not pairs. Adds tests for the lowercase form on both the bound-swap and unbound- strip paths.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

…dary Adds TestStreamRequestModifier_UrlEncodedPhantomSpanningReads, which places the url-encoded phantom in the middle of a 32 KiB body so it straddles a read chunk. Verifies the holdback buffer in phantomSwapReader is sized for the encoded form (two bytes longer than the literal) so an encoded phantom on a boundary cannot slip through unswapped. Pins the maxPhantomLen change from e92001c (encoded-aware sizing) against regression.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Plus byte-level escape helpers so the secret never crosses into an immutable Go string. QUIC and WebSocket each carry a per-pair phantom swap that previously only matched the literal SLUICE_PHANTOM:<name> form. With the new precomputed encodedPhantom and encodedPhantomLower fields available on phantomPair, the QUIC replacePhantomInHeaders, replacePhantomInBody, and the WebSocket relayFrames text-frame swap now also match SLUICE_PHANTOM%3A<name> and SLUICE_PHANTOM%3a<name>, and trigger unbound-phantom stripping when any encoded prefix is present. Without this, an HTTP/3 OAuth refresh or a form-encoded WebSocket text payload would have leaked the encoded phantom unchanged. The new queryEscapeBytes and pathEscapeBytes helpers are byte-in, byte-out URL encoders that mirror net/url.{Query,Path}Escape's output rules. SwapPhantomBytes, swapPhantomHeaders, phantomSwapReader.Read, the QUIC swap, and the WebSocket swap all use them instead of `url.QueryEscape(string(secret.Bytes()))`, so the secret stays in byte slices that the caller can zero via SecureBytes.Release(). The unreserved character sets follow RFC 3986: query-component (alpha / digit / - _ . ~) plus space->+, and path-segment (the same set plus sub-delims $ & + , ; = : @). phantom_strip.go now reuses encodePhantomForPair and encodePhantomLowerForPair instead of duplicating the URL-encoding rule inline, and drops the net/url import.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

…ctor encodePhantomForPair: a pre-scan via phantomNeedsQueryEscape returns nil without any allocation when every byte in the phantom is already in the unreserved-for-query-component set. Skips the byte->string copy that url.QueryEscape would otherwise produce for the no-op case. encodePhantomLowerForPair: a pre-scan returns nil without allocating when no percent-escape in the encoded phantom contains an uppercase A-F hex digit, so the "nothing to lowercase" case (notably OAuth JWT phantoms with no upper-hex escapes) is allocation-free. swapPhantomBytes: the path-vs-query escape choice is now driven by an explicit pathContext bool parameter instead of comparing the human-readable location label. Callers in Request and the two test helpers pass pathContext directly, so the type system enforces the escape choice and a typo in the location string can no longer silently flip path encoding to query encoding (or vice versa).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

The previous comment implied the no-allocation fast path covered OAuth JWT phantoms and phantoms whose only escape is %3A. The %3A case still differs from %3a, so the fast path does NOT skip the allocation there. It only fires when every percent-escape uses 0-9 digits exclusively (e.g. %20%21%30). Aligned the comment with the actual behavior.

Copilot