Ext_proc filter#12792
Conversation
…eBuffered to use in-process server
…reNOTBuffered to use in-process server
…AreBuffered to use in-process server
…PlaneRequestIsDrained to use in-process server
…estsAreForwardedImmediately to use in-process server
…roc response but only sent a mode override. As per envoy ext-processing ext-proc cannot omit request_headers (even if empty) in response to a request_headers processing request.
… track close already called for handling immediate response.
…lizingExecutor to use real dataPlaneChannel
…ubHasCorrectDeadline to use real dataPlaneChannel
…xtProcStreamSendsMetadata to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel and ServerInterceptor
…entToExtProc to use real dataPlaneChannel
…henMessagesAreDiscarded to use real dataPlaneChannel
…ExtProcAndSuperHalfCloseIsDeferred to use real dataPlaneChannel
…nSuperHalfCloseIsCalled to use real dataPlaneChannel
…False to use real dataPlaneChannel
…sTrue to use real dataPlaneChannel
…OnReady to use real dataPlaneChannel
…Ready to use real dataPlaneChannel
…stsAreForwardedImmediately to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel
…entToExtProc to use real dataPlaneChannel
Often there is no cause, but connect(), channel credentials, and call credentials failures on the control plane RPC can include a useful causal exception. This was triggered by seeing an error like below, but it didn't include the cause, which would have included HTTP error information from the failure fetching the credential. ``` UNAVAILABLE: Error retrieving LDS resource xdstp://traffic-director-c2p.xds.googleapis.com/envoy.config.listener.v3.Listener/bigtable.googleapis.com: UNAUTHENTICATED: Failed computing credential metadata nodeID: C2P-798500073 ```
Metadata was accidentally being retained after the start of the call. That can be an overwhelming percentage of memory for an idle RPC; don't do that. The other changes are considerably smaller, but I happened to notice them and the changes are straight-forward without magic numbers (e.g., there's many arrays that could be tuned). The regular interop server uses 4600 bytes per full duplex stream while idle, but much of that is Census recorded events hanging around. Keeping the Census integration but removing the Census impl (so a noop is used) drops that to 3000 bytes. This change brings that down to ~2450 bytes (which is still including stuff from TestServiceImpl). But there's very little Metadata in the interop tests, so absolute real-life savings would be much higher (but relative real-life savings may be lower, because the application will often have more state). The measurements were captured using a modified timeout_on_sleeping_server client that had 100,000 concurrent full duplex calls on one connection.
This does reduce the largest supported integer from just less than 2^32 to slightly more than 2^29, which does not seem a significant loss. It would previously produce a corrupted integer, which makes debugging annoying. Note that continuations can contain just zeros and should still be detected as resulting in overflow, without waiting for any eventual 1. We could leave the encoder supporting up to 2^32-1, but it just seems wrong to encode values that the same implementation couldn't decode. Noticed by @August829
Align DelayedClientCall.DelayedListener with ClientCallImpl's existing behavior for listener exceptions. When the application listener throws from onHeaders/onMessage/onReady, catch the Throwable, cancel the call with CANCELLED (cause = the throwable), and swallow subsequent callbacks. When onClose throws, log and continue, matching ClientCallImpl.closeObserver. If onClose arrives from the transport after a prior callback threw, override its status/trailers with the captured CANCELLED so a server-supplied OK can't mask the local failure. Previously, a throw from the application listener escaped to the callExecutor's uncaught-exception handler. The real call was not cancelled and the transport kept delivering callbacks to an already broken listener, different from how the same bug behaves on a normal ClientCallImpl, and a timing-dependent inconsistency depending on whether callbacks arrived before or after setCall + drain completed. Trade-off: listener-callback throws are no longer visible to the executor's UncaughtExceptionHandler (they're attached as Status.cause instead). This matches ClientCallImpl and is the intended behavior. Exception handling for the outer drainPendingCalls loop (realCall.sendMessage/request/halfClose/cancel) remains unaddressed; that TODO is preserved. **Note:** This change only handles exceptions thrown by the application listener. I don't try and solve the problems that grpc#12737 is attempting to fix. My motivation is to fix the root cause behind bazelbuild/bazel#29316 --------- Co-authored-by: Kannan J <kannanjgithub@google.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This was already being done for servlet, so it was just copied to jakarta.
This commit introduces a library for handling header mutations as specified by the xDS protocol. This library provides the core functionality for modifying request and response headers based on a set of rules. The main components of this library are: - `HeaderMutator`: Applies header mutations to `Metadata` objects. - `HeaderMutationFilter`: Filters header mutations based on a set of configurable rules, such as disallowing mutations of system headers. - `HeaderMutations`: A value class that represents the set of mutations to be applied to request and response headers. - `HeaderMutationDisallowedException`: An exception that is thrown when a disallowed header mutation is attempted. This commit also includes comprehensive unit tests for the new library.
ExternalProcessorFilter.java
- Trailers-Only Detection: Added state tracking to ExtProcListener to identify when a gRPC "trailers-only" response occurs (i.e., when onClose is called without a preceding onHeaders).
- Protocol Compliance: Updated the state machine to send a RESPONSE_HEADERS message to the sidecar with the end_of_stream flag set for trailers-only responses. This satisfies the requirement that headers must be the first message in any response phase.
- Handshake Handling: Modified onNext to correctly apply sidecar mutations to gRPC trailers and terminate the interaction when a trailers-only handshake is completed.
- Robustness: Added null checks in header mutation logic to prevent NullPointerException during edge-case state transitions.
ExternalProcessorFilterTest.java
- Forward Compatibility: Updated the createBaseProto helper to default to SKIP mode for response headers and trailers. This ensures that the 60+ existing tests (which primarily focus on the request phase) continue to pass without being blocked by the new response handshake.
- Streaming Robustness: Refactored Category 11 (Client/Bidi Streaming) mock sidecars to handle and acknowledge the full sequence of protocol phases (including the newly added response phases).
- Category 15 (New): Added a dedicated test case givenTrailersOnly_whenResponseReceived_thenResponseHeadersSentWithEos which validates that:
1. Trailers are correctly sent to the sidecar as headers when the data plane server returns an immediate error.
2. The sidecar receives the end_of_stream signal.
3. Mutated trailers from the sidecar are correctly applied to the final RPC state.
Implemented a FIFO queue in ExternalProcessorFilter to ensure that responses from the external processor server arrive in the same order as the events sent by the filter, as required by gRFC A93. Added unit tests to verify that out-of-order responses correctly trigger a protocol error and fail the stream.
- Added rejection of CONTINUE_AND_REPLACE status in HeadersResponse for both request and response headers, treating it as a stream failure. - Fixed potential hangs by ensuring proceedWithClose() is called upon stream failure, especially in fail-open scenarios. - Added explicit sidecar notification via requestStream.onError() upon detecting protocol errors to ensure robust stream termination. - Added new unit test categories 17 in ExternalProcessorFilterTest to verify status enforcement.
Updated ExternalProcessorFilter to include the `protocol_config` field in the very first `ProcessingRequest` sent to the sidecar (RequestHeaders). The configuration includes the `request_body_mode` and `response_body_mode` derived from the filter's processing mode, as required by gRFC A93. Added a unit test in Category 4 to verify that `protocol_config` is correctly populated on the first message and omitted from all subsequent messages on the stream. Add tests for trailers HeaderSendMode default to SKIP for DEFAULT. nit: Renumbered out of order test categories.
…ment rather than a granular merge of its fields.
# Conflicts: # xds/src/test/java/io/grpc/xds/internal/headermutations/HeaderMutationsTest.java
# Conflicts: # xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Fix style and warning as errors.
|
Reviewed approximately ~500 LOC of source(not tests), which currently mostly only covers the config part of things. Sending comments incrementally to kick start progress, since it seems that I'll need probably about 3 days to review all of source. |
| MethodDescriptor<ReqT, RespT> method, | ||
| CallOptions callOptions, | ||
| Channel next) { | ||
| SerializingExecutor serializingExecutor = new SerializingExecutor(callOptions.getExecutor()); |
There was a problem hiding this comment.
We have a potential null pointer crash here. getExecutor is nullable and SerializingExecutor expects null.
Side question: Do we have some static analysis that can catch these things?
| cachedChannelManager.getChannel(filterConfig.grpcServiceConfig)) | ||
| .withExecutor(serializingExecutor); | ||
|
|
||
| if (filterConfig.grpcServiceConfig.timeout() != null |
There was a problem hiding this comment.
I couldn't find an explicit style guide around it (please add if you find one) , but I believe java prefers data members to be private and be accessed via getters.
| cachedChannelManager.getChannel(filterConfig.grpcServiceConfig)) | ||
| .withExecutor(serializingExecutor); | ||
|
|
||
| if (filterConfig.grpcServiceConfig.timeout() != null |
There was a problem hiding this comment.
IIRC GrpcServiceConfig is an autovalue which has non nullable as precondition unless specified otherwise. So, this check might not be necessary.
Additionally, nullable optionals should be avoided go/java-practices/optional#nullable
|
|
||
| if (filterConfig.grpcServiceConfig.timeout() != null | ||
| && filterConfig.grpcServiceConfig.timeout().isPresent()) { | ||
| long timeoutNanos = filterConfig.grpcServiceConfig.timeout().get().toNanos(); |
There was a problem hiding this comment.
nit: Might be worth moving to a variable
| } | ||
|
|
||
| ImmutableList<HeaderValue> initialMetadata = filterConfig.grpcServiceConfig.initialMetadata(); | ||
| extProcStub = extProcStub.withInterceptors(new ClientInterceptor() { |
There was a problem hiding this comment.
So, maybe we can use MetadataUtils.newAttachHeadersInterceptor(parsedMetadata).
This may have a few advantages:
- Reduce code bloat
- Reduce computation in datapath. We create a new Metadata object whenever we create an interceptor moving the expensive operation to control path.
| io.envoyproxy.envoy.config.core.v3.HeaderValue headerValue = | ||
| io.envoyproxy.envoy.config.core.v3.HeaderValue.newBuilder() | ||
| .setKey(key.toLowerCase(Locale.ROOT)) | ||
| .setValue(encoded) |
There was a problem hiding this comment.
I vaguely recall that we were supposed to use rawValue for binary data and value for ascii.
I recall that we still had some confusion around that . I don't know what the latest here is?
| Metadata.Key<byte[]> key; | ||
| try { | ||
| key = Metadata.Key.of(headerName, Metadata.BINARY_BYTE_MARSHALLER); | ||
| } catch (IllegalArgumentException e) { |
There was a problem hiding this comment.
Should we be catching this ?
- Most usages in codebase currently don't catch this.
- If we need it for validation to catch and ignore valid ones, we might be better off validating headers ourselves according to spec before calling this. There may still be a gap where our validation is a subset of Metadata key validation and we may crash on bad input, which may be acceptable and we need to update either our validation utilitity or the ext proc header validation spec.
| } | ||
| java.util.List<String> encoded = new ArrayList<>(); | ||
| for (byte[] v : values) { | ||
| encoded.add(BaseEncoding.base64().omitPadding().encode(v)); |
There was a problem hiding this comment.
There's a bit of inconcistency here, we use omitPadding here but don't use it in ToHeaderdmap
| for (byte[] v : values) { | ||
| encoded.add(BaseEncoding.base64().omitPadding().encode(v)); | ||
| } | ||
| return com.google.common.base.Joiner.on(",").join(encoded); |
There was a problem hiding this comment.
nit: Maybe worth adding an import for this and for the java util List
|
Reviewed about another 300 LOC, which covers static utilities and the interceptors. Speed was slower than expected, but will try to catch up. |
|
|
||
| private final ExternalProcessor externalProcessor; | ||
| private final GrpcServiceConfig grpcServiceConfig; | ||
| private final Optional<HeaderMutationRulesConfig> mutationRulesConfig; |
There was a problem hiding this comment.
We should initialise it to empty by default optional nullables are usually considered a bad practice.
Implements ext_proc filter from A93 (internal design doc)
Metrics aren't implemented yet.
Includes commits from unmerged Filter API enhancements and channel caching PRs.
Only the ExternaProcessingFilter.java, ExternaProcessingFilterTest.java and the envoy xds proto import and generated code need to be reviewed.
Rebasing commit history caused all received and merged commits to show my name as the committer, ignore all commits for which I'm not shown as the author.