Skip to content

feat(proxy): scan url-encoded phantom tokens in form bodies#40

Open
nnemirovsky wants to merge 7 commits into
mainfrom
feat/url-encoded-phantom-scan
Open

feat(proxy): scan url-encoded phantom tokens in form bodies#40
nnemirovsky wants to merge 7 commits into
mainfrom
feat/url-encoded-phantom-scan

Conversation

@nnemirovsky
Copy link
Copy Markdown
Owner

Summary

  • Pass 2 scoped replacement now matches both the literal phantom form SLUICE_PHANTOM:<name> and the URL-encoded form SLUICE_PHANTOM%3A<name>, and substitutes a URL-encoded real value so form-urlencoded bodies stay well-formed.
  • Pass 3 unbound-phantom strip mirrors every phantom with its URL-encoded variant and the regex fallback covers both forms.
  • The streaming request reader applies the same dual-form swap and sizes its holdback buffer against the longer encoded form so a phantom that straddles a read boundary is preserved.
  • Added focused unit tests for the form-urlencoded body, the URL query string, and the unbound-phantom strip path.

Why

Anthropic and Google OAuth refresh POSTs use application/x-www-form-urlencoded. The agent emits refresh_token=SLUICE_PHANTOM:anthropic_oauth.refresh, the Go form encoder rewrites it to SLUICE_PHANTOM%3Aanthropic_oauth.refresh, and the previous scanner only checked the literal form. The upstream got the phantom verbatim, returned invalid_grant, and the agent fell back to a fresh interactive OAuth on every refresh cycle. With both forms in scope the swap completes transparently.

Test plan

  • go test ./internal/proxy/... (611 tests)
  • go test ./... (2470 tests across 13 packages)
  • gofumpt -l internal/ clean
  • golangci-lint run ./internal/... clean
  • Deploy to dev server, observe successful OAuth refresh round-trip with phantoms intact in vault

The pass-2 scoped phantom replacement only matched the literal form
SLUICE_PHANTOM:<name>, so phantom tokens posted in
application/x-www-form-urlencoded bodies (Anthropic and Google OAuth
refresh, for example) passed through to the upstream as
SLUICE_PHANTOM%3A<name>. The provider returned invalid_grant and the
agent dropped back to a fresh interactive OAuth on every refresh.

The injection pass, the unbound-phantom strip, and the streaming reader
now scan both forms and substitute the URL-encoded secret so the wire
body stays valid form data. The holdback buffer in the streaming reader
sizes against the longer encoded form so a phantom that straddles a
chunk boundary cannot slip through.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the proxy’s phantom-token scanning/swapping logic to correctly handle URL-encoded phantom tokens (notably SLUICE_PHANTOM%3A...) so that application/x-www-form-urlencoded request bodies, URL query strings, and streaming bodies don’t leak unswapped phantoms to upstream services.

Changes:

  • Added explicit support for URL-encoded phantom prefixes and updated the fallback strip regex to match both literal and encoded forms.
  • Updated request processing (buffered + streaming) to replace URL-encoded phantoms with URL-encoded real values and to strip unbound phantoms in both forms.
  • Added unit tests covering form-urlencoded bodies, URL-encoded query swapping, and stripping of unbound URL-encoded phantoms.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
internal/proxy/phantom.go Adds URL-encoded phantom prefix and expands strip regex to match literal + encoded forms.
internal/proxy/phantom_strip.go Mirrors provider-listed phantom tokens with URL-encoded variants for stripping; expands prefix detection for regex fallback.
internal/proxy/addon.go Updates request/stream swapping logic to detect and replace both literal and URL-encoded phantom forms (body/headers/query/path).
internal/proxy/addon_test.go Adds focused tests for form-urlencoded bodies, URL-encoded query swapping, and URL-encoded unbound-phantom stripping.
Comments suppressed due to low confidence (1)

internal/proxy/addon.go:657

  • The PR adds URL-path scanning/swapping for URL-encoded phantom prefixes, but tests only cover form-urlencoded bodies and RawQuery. Adding a focused test for a phantom in the URL path (including a secret with spaces) would help catch incorrect escaping (e.g., '+' vs '%20') and ensure RawPath/Path handling remains stable.
	// Pass 2+3 on URL path.
	if rawP := f.Request.URL.Path; bytesContainsAnyPhantomPrefix([]byte(rawP)) {
		f.Request.URL.Path = string(
			a.swapPhantomBytes([]byte(rawP), pairs, host, port, "URL path"),
		)
		f.Request.URL.RawPath = ""
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go
Hot-path response to Copilot review on #40:

1. swapPhantomBytes now chooses between url.QueryEscape and url.PathEscape
   based on the location label. QueryEscape encodes a space as '+', which
   is correct for form bodies and URL query strings but corrupts URL path
   substitutions where a literal '+' is its own character. Paths get
   PathEscape; bodies, queries, and headers keep QueryEscape.

2. phantomPair gains an encodedPhantom byte slice, populated once at
   pair-construction time via encodePhantomForPair. The hot scans
   (swapPhantomBytes, swapPhantomHeaders, phantomSwapReader, maxPhantomLen)
   read the precomputed bytes instead of recomputing url.QueryEscape on
   every request, header, and stream chunk. The encoded secret is still
   computed on demand, but only when the encoded phantom is actually
   present in the data, so the no-match case is now allocation-free.

3. Added unit tests for the path/query escape divergence and the
   QueryEscape symmetry case so the location-specific behavior stays
   pinned.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comment thread internal/proxy/phantom.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/phantom_strip.go Outdated
Comment thread internal/proxy/addon.go
Comment thread internal/proxy/addon.go Outdated
RFC 3986 §2.1 makes percent-encoded hex digits case-insensitive. Go's
url.QueryEscape always emits uppercase, but a third-party client can emit
lowercase, so a scan that only matched %3A would miss
SLUICE_PHANTOM%3a... and let the phantom leak.

Adds a urlEncodedPhantomPrefixLower constant and an encodedPhantomLower
field on phantomPair, populated once at pair-construction time. The
swapPhantomBytes, swapPhantomHeaders, streaming reader, prefix detector,
and unbound-strip path all check both casings. The replacement secret is
escaped once on first hit and reused across both casing branches so the
cost stays linear in number-of-encoded-forms, not pairs.

Adds tests for the lowercase form on both the bound-swap and unbound-
strip paths.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comment thread internal/proxy/phantom.go Outdated
Comment thread internal/proxy/phantom_strip.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go
Comment thread internal/proxy/addon_test.go
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

Comment thread internal/proxy/phantom_strip.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/addon.go Outdated
…dary

Adds TestStreamRequestModifier_UrlEncodedPhantomSpanningReads, which
places the url-encoded phantom in the middle of a 32 KiB body so it
straddles a read chunk. Verifies the holdback buffer in phantomSwapReader
is sized for the encoded form (two bytes longer than the literal) so an
encoded phantom on a boundary cannot slip through unswapped.

Pins the maxPhantomLen change from e92001c (encoded-aware sizing) against
regression.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread internal/proxy/quic.go
Comment thread internal/proxy/ws.go
Comment thread internal/proxy/addon.go
Comment thread internal/proxy/phantom_strip.go
Plus byte-level escape helpers so the secret never crosses into an
immutable Go string.

QUIC and WebSocket each carry a per-pair phantom swap that previously
only matched the literal SLUICE_PHANTOM:<name> form. With the new
precomputed encodedPhantom and encodedPhantomLower fields available on
phantomPair, the QUIC replacePhantomInHeaders, replacePhantomInBody, and
the WebSocket relayFrames text-frame swap now also match
SLUICE_PHANTOM%3A<name> and SLUICE_PHANTOM%3a<name>, and trigger
unbound-phantom stripping when any encoded prefix is present. Without
this, an HTTP/3 OAuth refresh or a form-encoded WebSocket text payload
would have leaked the encoded phantom unchanged.

The new queryEscapeBytes and pathEscapeBytes helpers are byte-in,
byte-out URL encoders that mirror net/url.{Query,Path}Escape's output
rules. SwapPhantomBytes, swapPhantomHeaders, phantomSwapReader.Read,
the QUIC swap, and the WebSocket swap all use them instead of
`url.QueryEscape(string(secret.Bytes()))`, so the secret stays in byte
slices that the caller can zero via SecureBytes.Release(). The
unreserved character sets follow RFC 3986: query-component (alpha /
digit / - _ . ~) plus space->+, and path-segment (the same set plus
sub-delims $ & + , ; = : @).

phantom_strip.go now reuses encodePhantomForPair and
encodePhantomLowerForPair instead of duplicating the URL-encoding rule
inline, and drops the net/url import.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread internal/proxy/phantom_pairs.go Outdated
Comment thread internal/proxy/addon.go Outdated
Comment thread internal/proxy/phantom_pairs.go
…ctor

encodePhantomForPair: a pre-scan via phantomNeedsQueryEscape returns nil
without any allocation when every byte in the phantom is already in the
unreserved-for-query-component set. Skips the byte->string copy that
url.QueryEscape would otherwise produce for the no-op case.

encodePhantomLowerForPair: a pre-scan returns nil without allocating
when no percent-escape in the encoded phantom contains an uppercase A-F
hex digit, so the "nothing to lowercase" case (notably OAuth JWT
phantoms with no upper-hex escapes) is allocation-free.

swapPhantomBytes: the path-vs-query escape choice is now driven by an
explicit pathContext bool parameter instead of comparing the
human-readable location label. Callers in Request and the two test
helpers pass pathContext directly, so the type system enforces the
escape choice and a typo in the location string can no longer silently
flip path encoding to query encoding (or vice versa).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread internal/proxy/phantom_pairs.go Outdated
The previous comment implied the no-allocation fast path covered OAuth
JWT phantoms and phantoms whose only escape is %3A. The %3A case still
differs from %3a, so the fast path does NOT skip the allocation there.
It only fires when every percent-escape uses 0-9 digits exclusively
(e.g. %20%21%30). Aligned the comment with the actual behavior.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants