Fix async redirect regression and side-output data loss/leaks#1320
Draft
cursor[bot] wants to merge 4 commits into
Draft
Fix async redirect regression and side-output data loss/leaks#1320cursor[bot] wants to merge 4 commits into
cursor[bot] wants to merge 4 commits into
Conversation
* drop lookup buckets from CallerAccess policy * style fixes
204/HEAD responses produce ProcessedContent with null body bytes. Writing those to side output previously threw NPE inside GCSOutput, S3Output, or CompressedOutputWrapper, causing silent side-output data loss. Co-authored-by: Erik Schultink <eschultink@users.noreply.github.com>
Restore async-default redirect behavior broken by #1278: when FOLLOW_REDIRECTS is unset, disable automatic redirect following in async mode so connectors like Slack Analytics can intercept 3xx Location responses. Stop leaking request-capture metadata (REQUEST_BODY, API_HOST, etc.) in sanitized HTTP response headers and sanitized side-output object metadata. Only ProcessedDataMetadataFields are exposed to callers; object-storage capture metadata is retained on raw side output only. Set FOLLOW_REDIRECTS=FALSE on the Slack Analytics connector spec. Co-authored-by: Erik Schultink <eschultink@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Critical bug-finding automation identified three high-severity correctness issues in recent commits on
rc-v0.6.7. This PR applies minimal fixes with unit test coverage.1. Async redirect handling regression (#1278)
Bug & impact: PR #1278 changed the default
FOLLOW_REDIRECTSbehavior from!asynctotrue. Async connectors such as Slack Analytics (which hasenable_async_processing: truebut noFOLLOW_REDIRECTSoverride) would auto-follow 3xx redirects instead of letting the proxy intercept theLocationheader. This breaks async file-download flows and causes silent data loss (empty or wrong content written to async output).Root cause:
ApiDataRequestHandlerused.orElse(true)instead of the prior.orElse(!processingContext.getAsync()).Fix: Restore async-aware default; add
FOLLOW_REDIRECTS=FALSEto Slack Analytics connector spec; addasyncModeDisablesRedirectFollowingByDefaultunit test.2. REQUEST_BODY and request-capture metadata leaks
Bug & impact: Sanitized API responses echoed all
ProcessedContentmetadata as HTTP headers, including base64-encodedREQUEST_BODY(the outbound POST body sent to the source API). The same capture metadata was also copied into sanitized GCS/S3 side-output objects (worsened on GCP by #1299). This is a PII/compliance leak on every synchronous POST and on sanitized bucket objects.Root cause:
sanitize()copied raw side-output capture metadata wholesale; response building exposed the full metadata map to callers.Fix: Only expose
ProcessedDataMetadataFieldsin HTTP response headers; stripOutputObjectMetadatakeys from sanitized content metadata (retained on raw side output).3. Null-body side-output crash (data loss)
Bug & impact: Source API responses with no body (204, 304, HEAD) produce
ProcessedContentwithcontent == null(explicitly tested).GCSOutput,S3Output, andCompressedOutputWrapperdereferenced null bytes, threwWriteFailure, and the handler silently dropped the side-output object.Root cause: Output writers assumed non-null
content.getContent().Fix: Treat null content as zero-length byte array in all three writers; add
CompressedOutputWrapperTest.writeNullContent.Validation
mvn test -pl core -am -Dtest=ApiDataRequestHandlerTest,CompressedOutputWrapperTest— 30 tests, 0 failureshandleShouldFollowRedirectManuallyInAsyncModetest still passesBase branch
Targets
rc-v0.6.7.