feat(proxy): scan url-encoded phantom tokens in form bodies#40
Open
nnemirovsky wants to merge 7 commits into
Open
feat(proxy): scan url-encoded phantom tokens in form bodies#40nnemirovsky wants to merge 7 commits into
nnemirovsky wants to merge 7 commits into
Conversation
The pass-2 scoped phantom replacement only matched the literal form SLUICE_PHANTOM:<name>, so phantom tokens posted in application/x-www-form-urlencoded bodies (Anthropic and Google OAuth refresh, for example) passed through to the upstream as SLUICE_PHANTOM%3A<name>. The provider returned invalid_grant and the agent dropped back to a fresh interactive OAuth on every refresh. The injection pass, the unbound-phantom strip, and the streaming reader now scan both forms and substitute the URL-encoded secret so the wire body stays valid form data. The holdback buffer in the streaming reader sizes against the longer encoded form so a phantom that straddles a chunk boundary cannot slip through.
There was a problem hiding this comment.
Pull request overview
This PR extends the proxy’s phantom-token scanning/swapping logic to correctly handle URL-encoded phantom tokens (notably SLUICE_PHANTOM%3A...) so that application/x-www-form-urlencoded request bodies, URL query strings, and streaming bodies don’t leak unswapped phantoms to upstream services.
Changes:
- Added explicit support for URL-encoded phantom prefixes and updated the fallback strip regex to match both literal and encoded forms.
- Updated request processing (buffered + streaming) to replace URL-encoded phantoms with URL-encoded real values and to strip unbound phantoms in both forms.
- Added unit tests covering form-urlencoded bodies, URL-encoded query swapping, and stripping of unbound URL-encoded phantoms.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
internal/proxy/phantom.go |
Adds URL-encoded phantom prefix and expands strip regex to match literal + encoded forms. |
internal/proxy/phantom_strip.go |
Mirrors provider-listed phantom tokens with URL-encoded variants for stripping; expands prefix detection for regex fallback. |
internal/proxy/addon.go |
Updates request/stream swapping logic to detect and replace both literal and URL-encoded phantom forms (body/headers/query/path). |
internal/proxy/addon_test.go |
Adds focused tests for form-urlencoded bodies, URL-encoded query swapping, and URL-encoded unbound-phantom stripping. |
Comments suppressed due to low confidence (1)
internal/proxy/addon.go:657
- The PR adds URL-path scanning/swapping for URL-encoded phantom prefixes, but tests only cover form-urlencoded bodies and RawQuery. Adding a focused test for a phantom in the URL path (including a secret with spaces) would help catch incorrect escaping (e.g., '+' vs '%20') and ensure RawPath/Path handling remains stable.
// Pass 2+3 on URL path.
if rawP := f.Request.URL.Path; bytesContainsAnyPhantomPrefix([]byte(rawP)) {
f.Request.URL.Path = string(
a.swapPhantomBytes([]byte(rawP), pairs, host, port, "URL path"),
)
f.Request.URL.RawPath = ""
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Hot-path response to Copilot review on #40: 1. swapPhantomBytes now chooses between url.QueryEscape and url.PathEscape based on the location label. QueryEscape encodes a space as '+', which is correct for form bodies and URL query strings but corrupts URL path substitutions where a literal '+' is its own character. Paths get PathEscape; bodies, queries, and headers keep QueryEscape. 2. phantomPair gains an encodedPhantom byte slice, populated once at pair-construction time via encodePhantomForPair. The hot scans (swapPhantomBytes, swapPhantomHeaders, phantomSwapReader, maxPhantomLen) read the precomputed bytes instead of recomputing url.QueryEscape on every request, header, and stream chunk. The encoded secret is still computed on demand, but only when the encoded phantom is actually present in the data, so the no-match case is now allocation-free. 3. Added unit tests for the path/query escape divergence and the QueryEscape symmetry case so the location-specific behavior stays pinned.
RFC 3986 §2.1 makes percent-encoded hex digits case-insensitive. Go's url.QueryEscape always emits uppercase, but a third-party client can emit lowercase, so a scan that only matched %3A would miss SLUICE_PHANTOM%3a... and let the phantom leak. Adds a urlEncodedPhantomPrefixLower constant and an encodedPhantomLower field on phantomPair, populated once at pair-construction time. The swapPhantomBytes, swapPhantomHeaders, streaming reader, prefix detector, and unbound-strip path all check both casings. The replacement secret is escaped once on first hit and reused across both casing branches so the cost stays linear in number-of-encoded-forms, not pairs. Adds tests for the lowercase form on both the bound-swap and unbound- strip paths.
…dary Adds TestStreamRequestModifier_UrlEncodedPhantomSpanningReads, which places the url-encoded phantom in the middle of a 32 KiB body so it straddles a read chunk. Verifies the holdback buffer in phantomSwapReader is sized for the encoded form (two bytes longer than the literal) so an encoded phantom on a boundary cannot slip through unswapped. Pins the maxPhantomLen change from e92001c (encoded-aware sizing) against regression.
Plus byte-level escape helpers so the secret never crosses into an
immutable Go string.
QUIC and WebSocket each carry a per-pair phantom swap that previously
only matched the literal SLUICE_PHANTOM:<name> form. With the new
precomputed encodedPhantom and encodedPhantomLower fields available on
phantomPair, the QUIC replacePhantomInHeaders, replacePhantomInBody, and
the WebSocket relayFrames text-frame swap now also match
SLUICE_PHANTOM%3A<name> and SLUICE_PHANTOM%3a<name>, and trigger
unbound-phantom stripping when any encoded prefix is present. Without
this, an HTTP/3 OAuth refresh or a form-encoded WebSocket text payload
would have leaked the encoded phantom unchanged.
The new queryEscapeBytes and pathEscapeBytes helpers are byte-in,
byte-out URL encoders that mirror net/url.{Query,Path}Escape's output
rules. SwapPhantomBytes, swapPhantomHeaders, phantomSwapReader.Read,
the QUIC swap, and the WebSocket swap all use them instead of
`url.QueryEscape(string(secret.Bytes()))`, so the secret stays in byte
slices that the caller can zero via SecureBytes.Release(). The
unreserved character sets follow RFC 3986: query-component (alpha /
digit / - _ . ~) plus space->+, and path-segment (the same set plus
sub-delims $ & + , ; = : @).
phantom_strip.go now reuses encodePhantomForPair and
encodePhantomLowerForPair instead of duplicating the URL-encoding rule
inline, and drops the net/url import.
…ctor encodePhantomForPair: a pre-scan via phantomNeedsQueryEscape returns nil without any allocation when every byte in the phantom is already in the unreserved-for-query-component set. Skips the byte->string copy that url.QueryEscape would otherwise produce for the no-op case. encodePhantomLowerForPair: a pre-scan returns nil without allocating when no percent-escape in the encoded phantom contains an uppercase A-F hex digit, so the "nothing to lowercase" case (notably OAuth JWT phantoms with no upper-hex escapes) is allocation-free. swapPhantomBytes: the path-vs-query escape choice is now driven by an explicit pathContext bool parameter instead of comparing the human-readable location label. Callers in Request and the two test helpers pass pathContext directly, so the type system enforces the escape choice and a typo in the location string can no longer silently flip path encoding to query encoding (or vice versa).
The previous comment implied the no-allocation fast path covered OAuth JWT phantoms and phantoms whose only escape is %3A. The %3A case still differs from %3a, so the fast path does NOT skip the allocation there. It only fires when every percent-escape uses 0-9 digits exclusively (e.g. %20%21%30). Aligned the comment with the actual behavior.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SLUICE_PHANTOM:<name>and the URL-encoded formSLUICE_PHANTOM%3A<name>, and substitutes a URL-encoded real value so form-urlencoded bodies stay well-formed.Why
Anthropic and Google OAuth refresh POSTs use
application/x-www-form-urlencoded. The agent emitsrefresh_token=SLUICE_PHANTOM:anthropic_oauth.refresh, the Go form encoder rewrites it toSLUICE_PHANTOM%3Aanthropic_oauth.refresh, and the previous scanner only checked the literal form. The upstream got the phantom verbatim, returnedinvalid_grant, and the agent fell back to a fresh interactive OAuth on every refresh cycle. With both forms in scope the swap completes transparently.Test plan
go test ./internal/proxy/...(611 tests)go test ./...(2470 tests across 13 packages)gofumpt -l internal/cleangolangci-lint run ./internal/...clean