Fix async redirect regression and side-output data loss/leaks #1320

Draft
cursor[bot] wants to merge 4 commits into main from cursor/critical-bug-fixes-bb05

cursor[bot] (Contributor) commented Jun 23, 2026

Summary

Critical bug-finding automation identified three high-severity correctness issues in recent commits on rc-v0.6.7. This PR applies minimal fixes with unit test coverage.

1. Async redirect handling regression (#1278)

Bug & impact: PR #1278 changed the default FOLLOW_REDIRECTS behavior from !async to true. Async connectors such as Slack Analytics (which has enable_async_processing: true but no FOLLOW_REDIRECTS override) would auto-follow 3xx redirects instead of letting the proxy intercept the Location header. This breaks async file-download flows and causes silent data loss (empty or wrong content written to async output).

Root cause: ApiDataRequestHandler used .orElse(true) instead of the prior .orElse(!processingContext.getAsync()).

Fix: Restore async-aware default; add FOLLOW_REDIRECTS=FALSE to Slack Analytics connector spec; add asyncModeDisablesRedirectFollowingByDefault unit test.
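The difference between the regressed and restored defaults can be illustrated with a minimal sketch. The class and method names below are hypothetical stand-ins, since the actual `ApiDataRequestHandler` code is not shown in this description; only the `.orElse(true)` versus `.orElse(!async)` distinction comes from the root-cause note above.

```java
import java.util.Optional;

// Hypothetical stand-in for the relevant logic in ApiDataRequestHandler.
final class RedirectDefaultSketch {

    /**
     * Resolves whether the HTTP client should auto-follow 3xx redirects.
     *
     * @param configured explicit FOLLOW_REDIRECTS value from the connector spec, if set
     * @param async      whether the request is processed in async mode
     */
    static boolean followRedirects(Optional<Boolean> configured, boolean async) {
        // The regressed form was configured.orElse(true), which ignored async mode.
        // The restored default follows redirects only when NOT in async mode.
        return configured.orElse(!async);
    }

    public static void main(String[] args) {
        System.out.println(followRedirects(Optional.empty(), true));    // false
        System.out.println(followRedirects(Optional.empty(), false));   // true
        System.out.println(followRedirects(Optional.of(true), true));   // true
        System.out.println(followRedirects(Optional.of(false), false)); // false
    }
}
```

An explicit setting always wins in this sketch, which is why adding FOLLOW_REDIRECTS=FALSE to the connector spec protects the connector even if the default changes again.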

2. REQUEST_BODY and request-capture metadata leaks

Bug & impact: Sanitized API responses echoed all ProcessedContent metadata as HTTP headers, including base64-encoded REQUEST_BODY (the outbound POST body sent to the source API). The same capture metadata was also copied into sanitized GCS/S3 side-output objects (worsened on GCP by #1299). This is a PII/compliance leak on every synchronous POST and on sanitized bucket objects.

Root cause: sanitize() copied raw side-output capture metadata wholesale; response building exposed the full metadata map to callers.

Fix: Only expose ProcessedDataMetadataFields in HTTP response headers; strip OutputObjectMetadata keys from sanitized content metadata (retained on raw side output).
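A rough sketch of the two filters described above, under stated assumptions: metadata is modeled as a plain `Map<String, String>`, the allowlist entry `Example-Allowed-Field` is invented for illustration, and the two sets merely stand in for `ProcessedDataMetadataFields` and the `OutputObjectMetadata` keys, whose real members are not listed in this description. Only `REQUEST_BODY` and `API_HOST` are named in the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual sanitize() or response-building code.
final class MetadataFilterSketch {

    // Assumed stand-in for ProcessedDataMetadataFields (the only keys callers may see).
    static final Set<String> RESPONSE_FIELDS = Set.of("Example-Allowed-Field");

    // Assumed stand-in for OutputObjectMetadata capture keys.
    static final Set<String> CAPTURE_KEYS = Set.of("REQUEST_BODY", "API_HOST");

    /** Allowlist: keep only metadata that is safe to echo as HTTP response headers. */
    static Map<String, String> responseHeaders(Map<String, String> metadata) {
        Map<String, String> headers = new LinkedHashMap<>(metadata);
        headers.keySet().retainAll(RESPONSE_FIELDS);
        return headers;
    }

    /** Denylist: drop request-capture keys before writing sanitized side output. */
    static Map<String, String> sanitizedMetadata(Map<String, String> metadata) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.keySet().removeAll(CAPTURE_KEYS);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Example-Allowed-Field", "ok");
        raw.put("REQUEST_BODY", "eyJxIjoic2VjcmV0In0=");
        raw.put("API_HOST", "api.example.com");
        System.out.println(responseHeaders(raw).keySet());   // [Example-Allowed-Field]
        System.out.println(sanitizedMetadata(raw).keySet()); // [Example-Allowed-Field]
    }
}
```

Both methods copy the input map, so the raw side-output metadata keeps its capture keys, matching the "retained on raw side output" note in the fix.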

3. Null-body side-output crash (data loss)

Bug & impact: Source API responses with no body (204, 304, HEAD) produce ProcessedContent with content == null (explicitly tested). GCSOutput, S3Output, and CompressedOutputWrapper dereferenced null bytes, threw WriteFailure, and the handler silently dropped the side-output object.

Root cause: Output writers assumed non-null content.getContent().

Fix: Treat null content as zero-length byte array in all three writers; add CompressedOutputWrapperTest.writeNullContent.
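The null-handling approach can be sketched as follows. This is illustrative only: the helper names are hypothetical, and gzip is assumed as the compression format, which this description does not confirm for `CompressedOutputWrapper`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; not the actual GCSOutput, S3Output, or CompressedOutputWrapper code.
final class NullContentSketch {

    /** Maps a missing body (e.g. from a 204 or HEAD response) to a zero-length array. */
    static byte[] bytesOrEmpty(byte[] content) {
        return content == null ? new byte[0] : content;
    }

    /** Compresses content, treating null as empty instead of dereferencing it. */
    static byte[] gzip(byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(bytesOrEmpty(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses gzip bytes; used here only to check the round trip. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A null body round-trips to an empty payload rather than throwing.
        System.out.println(gunzip(gzip(null)).length); // 0
    }
}
```

With this shape, a bodiless response still produces a side-output object (an empty one), so the record of the request is not silently dropped.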

Validation

  • mvn test -pl core -am -Dtest=ApiDataRequestHandlerTest,CompressedOutputWrapperTest — 30 tests, 0 failures
  • Existing handleShouldFollowRedirectManuallyInAsyncMode test still passes

Base branch

Targets rc-v0.6.7.

eschultink and others added 4 commits June 18, 2026 13:33
* drop lookup buckets from CallerAccess policy

* style fixes

204/HEAD responses produce ProcessedContent with null body bytes. Writing
those to side output previously threw NPE inside GCSOutput, S3Output, or
CompressedOutputWrapper, causing silent side-output data loss.

Co-authored-by: Erik Schultink <eschultink@users.noreply.github.com>

Restore async-default redirect behavior broken by #1278: when FOLLOW_REDIRECTS
is unset, disable automatic redirect following in async mode so connectors
like Slack Analytics can intercept 3xx Location responses.

Stop leaking request-capture metadata (REQUEST_BODY, API_HOST, etc.) in
sanitized HTTP response headers and sanitized side-output object metadata.
Only ProcessedDataMetadataFields are exposed to callers; object-storage
capture metadata is retained on raw side output only.

Set FOLLOW_REDIRECTS=FALSE on the Slack Analytics connector spec.

Co-authored-by: Erik Schultink <eschultink@users.noreply.github.com>