Skip to content

Ext_proc filter#12792

Open
kannanjgithub wants to merge 378 commits intogrpc:masterfrom
kannanjgithub:ext-proc
Open

Ext_proc filter#12792
kannanjgithub wants to merge 378 commits intogrpc:masterfrom
kannanjgithub:ext-proc

Conversation

@kannanjgithub
Copy link
Copy Markdown
Contributor

@kannanjgithub kannanjgithub commented May 4, 2026

Implements ext_proc filter from A93 (internal design doc)
Metrics aren't implemented yet.
Includes commits from unmerged Filter API enhancements and channel caching PRs.
Only the ExternaProcessingFilter.java, ExternaProcessingFilterTest.java and the envoy xds proto import and generated code need to be reviewed.
Rebasing commit history caused all received and merged commits to show my name as the committer, ignore all commits for which I'm not shown as the author.

…PlaneRequestIsDrained to use in-process server
…estsAreForwardedImmediately to use in-process server
…roc response but only sent a mode override. As per envoy ext-processing ext-proc cannot omit request_headers (even if empty) in response to a request_headers processing request.
… track close already called for handling immediate response.
…ubHasCorrectDeadline to use real dataPlaneChannel
…xtProcStreamSendsMetadata to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel and ServerInterceptor
…henMessagesAreDiscarded to use real dataPlaneChannel
…ExtProcAndSuperHalfCloseIsDeferred to use real dataPlaneChannel
…nSuperHalfCloseIsCalled to use real dataPlaneChannel
…stsAreForwardedImmediately to use real dataPlaneChannel
…sHeadersAndCallIsBuffered to use real dataPlaneChannel
…henMutationsAreAppliedAndCallIsActivated to use real dataPlaneChannel
…sActivatedImmediately to use real dataPlaneChannel
ejona86 and others added 17 commits May 4, 2026 07:54
Often there is no cause, but connect(), channel credentials, and call
credentials failures on the control plane RPC can include a useful
causal exception.

This was triggered by seeing an error like below, but it didn't include
the cause, which would have included HTTP error information from the
failure fetching the credential.
```
UNAVAILABLE: Error retrieving LDS resource xdstp://traffic-director-c2p.xds.googleapis.com/envoy.config.listener.v3.Listener/bigtable.googleapis.com: UNAUTHENTICATED: Failed computing credential metadata nodeID: C2P-798500073
```
Metadata was accidentally being retained after the start of the call.
That can be an overwhelming percentage of memory for an idle RPC; don't
do that. The other changes are considerably smaller, but I happened to
notice them and the changes are straight-forward without magic numbers
(e.g., there's many arrays that could be tuned).

The regular interop server uses 4600 bytes per full duplex stream while
idle, but much of that is Census recorded events hanging around. Keeping
the Census integration but removing the Census impl (so a noop is used)
drops that to 3000 bytes. This change brings that down to ~2450 bytes
(which is still including stuff from TestServiceImpl). But there's very
little Metadata in the interop tests, so absolute real-life savings
would be much higher (but relative real-life savings may be lower,
because the application will often have more state).

The measurements were captured using a modified
timeout_on_sleeping_server client that had 100,000 concurrent full
duplex calls on one connection.
This does reduce the largest supported integer from just less than 2^32
to slightly more than 2^29, which does not seem a significant loss.

It would previously produce a corrupted integer, which makes debugging
annoying. Note that continuations can contain just zeros and should
still be detected as resulting in overflow, without waiting for any
eventual 1.

We could leave the encoder supporting up to 2^32-1, but it just seems
wrong to encode values that the same implementation couldn't decode.

Noticed by @August829
Align DelayedClientCall.DelayedListener with ClientCallImpl's existing
behavior for listener exceptions. When the application listener throws
from onHeaders/onMessage/onReady, catch the Throwable, cancel the call
with CANCELLED (cause = the throwable), and swallow subsequent
callbacks. When onClose throws, log and continue, matching
ClientCallImpl.closeObserver. If onClose arrives from the transport
after a prior callback threw, override its status/trailers with the
captured CANCELLED so a server-supplied OK can't mask the local failure.

Previously, a throw from the application listener escaped to the
callExecutor's uncaught-exception handler. The real call was not
cancelled and the transport kept delivering callbacks to an already
broken listener, different from how the same bug behaves on a normal
ClientCallImpl, and a timing-dependent inconsistency depending on
whether callbacks arrived before or after setCall + drain completed.

Trade-off: listener-callback throws are no longer visible to the
executor's UncaughtExceptionHandler (they're attached as Status.cause
instead). This matches ClientCallImpl and is the intended behavior.

Exception handling for the outer drainPendingCalls loop
(realCall.sendMessage/request/halfClose/cancel) remains unaddressed;
that TODO is preserved.

**Note:**
This change only handles exceptions thrown by the application listener.
I don't try and solve the problems that grpc#12737 is attempting to fix. My
motivation is to fix the root cause behind
bazelbuild/bazel#29316

---------

Co-authored-by: Kannan J <kannanjgithub@google.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This was already being done for servlet, so it was just copied to
jakarta.
This commit introduces a library for handling header mutations as
specified by the xDS protocol. This library provides the core
functionality for modifying request and response headers based on a set
of rules.

The main components of this library are:
- `HeaderMutator`: Applies header mutations to `Metadata` objects.
- `HeaderMutationFilter`: Filters header mutations based on a set of
configurable rules, such as disallowing mutations of system headers.
- `HeaderMutations`: A value class that represents the set of mutations
to be applied to request and response headers.
- `HeaderMutationDisallowedException`: An exception that is thrown when
a disallowed header mutation is attempted.

This commit also includes comprehensive unit tests for the new library.
ExternalProcessorFilter.java
   - Trailers-Only Detection: Added state tracking to ExtProcListener to identify when a gRPC "trailers-only" response occurs (i.e., when onClose is called without a preceding onHeaders).
   - Protocol Compliance: Updated the state machine to send a RESPONSE_HEADERS message to the sidecar with the end_of_stream flag set for trailers-only responses. This satisfies the requirement that headers must be the first message in any response phase.
   - Handshake Handling: Modified onNext to correctly apply sidecar mutations to gRPC trailers and terminate the interaction when a trailers-only handshake is completed.
   - Robustness: Added null checks in header mutation logic to prevent NullPointerException during edge-case state transitions.

  ExternalProcessorFilterTest.java
   - Forward Compatibility: Updated the createBaseProto helper to default to SKIP mode for response headers and trailers. This ensures that the 60+ existing tests (which primarily focus on the request phase) continue to pass without being blocked by the new response handshake.
   - Streaming Robustness: Refactored Category 11 (Client/Bidi Streaming) mock sidecars to handle and acknowledge the full sequence of protocol phases (including the newly added response phases).
   - Category 15 (New): Added a dedicated test case givenTrailersOnly_whenResponseReceived_thenResponseHeadersSentWithEos which validates that:
       1. Trailers are correctly sent to the sidecar as headers when the data plane server returns an immediate error.
       2. The sidecar receives the end_of_stream signal.
       3. Mutated trailers from the sidecar are correctly applied to the final RPC state.
Implemented a FIFO queue in ExternalProcessorFilter to ensure that responses
from the external processor server arrive in the same order as the events
sent by the filter, as required by gRFC A93. Added unit tests to verify that
out-of-order responses correctly trigger a protocol error and fail the stream.
- Added rejection of CONTINUE_AND_REPLACE status in HeadersResponse for
  both request and response headers, treating it as a stream failure.
- Fixed potential hangs by ensuring proceedWithClose() is called upon
  stream failure, especially in fail-open scenarios.
- Added explicit sidecar notification via requestStream.onError() upon
  detecting protocol errors to ensure robust stream termination.
- Added new unit test categories 17 in ExternalProcessorFilterTest
  to verify status enforcement.
Updated ExternalProcessorFilter to include the `protocol_config` field in
 the very first `ProcessingRequest` sent to the sidecar (RequestHeaders).
The configuration includes the `request_body_mode` and `response_body_mode`
derived from the filter's processing mode, as required by gRFC A93.
Added a unit test in Category 4 to verify that `protocol_config` is correctly
populated on the first message and omitted from all subsequent messages on
the stream.

Add tests for trailers HeaderSendMode default to SKIP for DEFAULT.

nit: Renumbered out of order test categories.
…ment rather than a granular merge of its fields.
# Conflicts:
#	xds/src/test/java/io/grpc/xds/internal/headermutations/HeaderMutationsTest.java
@kannanjgithub kannanjgithub marked this pull request as draft May 4, 2026 11:39
# Conflicts:
#	xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
@kannanjgithub kannanjgithub marked this pull request as ready for review May 4, 2026 12:39
@kannanjgithub kannanjgithub requested a review from sauravzg May 4, 2026 12:39
Fix style and warning as errors.
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java Outdated
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
@sauravzg
Copy link
Copy Markdown
Collaborator

sauravzg commented May 6, 2026

Reviewed approximately ~500 LOC of source(not tests), which currently mostly only covers the config part of things. Sending comments incrementally to kick start progress, since it seems that I'll need probably about 3 days to review all of source.

MethodDescriptor<ReqT, RespT> method,
CallOptions callOptions,
Channel next) {
SerializingExecutor serializingExecutor = new SerializingExecutor(callOptions.getExecutor());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a potential null pointer crash here. getExecutor is nullable and SerializingExecutor expects null.

Side question: Do we have some static analysis that can catch these things?

cachedChannelManager.getChannel(filterConfig.grpcServiceConfig))
.withExecutor(serializingExecutor);

if (filterConfig.grpcServiceConfig.timeout() != null
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find an explicit style guide around it (please add if you find one) , but I believe java prefers data members to be private and be accessed via getters.

cachedChannelManager.getChannel(filterConfig.grpcServiceConfig))
.withExecutor(serializingExecutor);

if (filterConfig.grpcServiceConfig.timeout() != null
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC GrpcServiceConfig is an autovalue which has non nullable as precondition unless specified otherwise. So, this check might not be necessary.
Additionally, nullable optionals should be avoided go/java-practices/optional#nullable


if (filterConfig.grpcServiceConfig.timeout() != null
&& filterConfig.grpcServiceConfig.timeout().isPresent()) {
long timeoutNanos = filterConfig.grpcServiceConfig.timeout().get().toNanos();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be worth moving to a variable

}

ImmutableList<HeaderValue> initialMetadata = filterConfig.grpcServiceConfig.initialMetadata();
extProcStub = extProcStub.withInterceptors(new ClientInterceptor() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, maybe we can use MetadataUtils.newAttachHeadersInterceptor(parsedMetadata).

This may have a few advantages:

  • Reduce code bloat
  • Reduce computation in datapath. We create a new Metadata object whenever we create an interceptor moving the expensive operation to control path.

io.envoyproxy.envoy.config.core.v3.HeaderValue headerValue =
io.envoyproxy.envoy.config.core.v3.HeaderValue.newBuilder()
.setKey(key.toLowerCase(Locale.ROOT))
.setValue(encoded)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely recall that we were supposed to use rawValue for binary data and value for ascii.
I recall that we still had some confusion around that . I don't know what the latest here is?

Metadata.Key<byte[]> key;
try {
key = Metadata.Key.of(headerName, Metadata.BINARY_BYTE_MARSHALLER);
} catch (IllegalArgumentException e) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be catching this ?

  • Most usages in codebase currently don't catch this.
  • If we need it for validation to catch and ignore valid ones, we might be better off validating headers ourselves according to spec before calling this. There may still be a gap where our validation is a subset of Metadata key validation and we may crash on bad input, which may be acceptable and we need to update either our validation utilitity or the ext proc header validation spec.

}
java.util.List<String> encoded = new ArrayList<>();
for (byte[] v : values) {
encoded.add(BaseEncoding.base64().omitPadding().encode(v));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit of inconcistency here, we use omitPadding here but don't use it in ToHeaderdmap

for (byte[] v : values) {
encoded.add(BaseEncoding.base64().omitPadding().encode(v));
}
return com.google.common.base.Joiner.on(",").join(encoded);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe worth adding an import for this and for the java util List

@sauravzg
Copy link
Copy Markdown
Collaborator

sauravzg commented May 7, 2026

Reviewed about another 300 LOC, which covers static utilities and the interceptors. Speed was slower than expected, but will try to catch up.


private final ExternalProcessor externalProcessor;
private final GrpcServiceConfig grpcServiceConfig;
private final Optional<HeaderMutationRulesConfig> mutationRulesConfig;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should initialise it to empty by default optional nullables are usually considered a bad practice.

Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Comment thread xds/src/main/java/io/grpc/xds/ExternalProcessorFilter.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.