Skip to content

Handle remote peer exceptions#2520

Closed
ERogalla wants to merge 5 commits intodevelopfrom
erogalla/remote-peer
Closed

Handle remote peer exceptions#2520
ERogalla wants to merge 5 commits intodevelopfrom
erogalla/remote-peer

Conversation

@ERogalla
Copy link

Before this PR

We have experienced many IO exceptions originating from this undertow message.

After this PR

Don't rethrow on this error.

Comment on lines +84 to +85
&& (!exchange.getConnection().isOpen()
|| throwable.equals(UndertowMessages.MESSAGES.couldNotReadContentLengthData()))) {
Copy link
Member

@pkoenig10 pkoenig10 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should probably be pulled out to a helper method so we can give it a descriptive name and avoid cluttering this existing instanceof casing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works, I'd suggest taking another pass at the test to see what's happening in practice!

Creating a new IOException + gathering its stack trace every time we catch any sort of IOException isn't ideal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. @carterkozak you're right. The original test I added did not trigger the added codepath.

.equals(UndertowMessages.MESSAGES
.couldNotReadContentLengthData()
.getMessage())) {
log.info(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to not send an error here? Or should it be a 400 error.
For reference, this error message is pulled directly from UndertowMessages

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could not read the full request from the client and client has closed its side of the connection so we can't send a response, but we need to close and clean up our exchange.

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest #2523 on top of this PR

Comment on lines +83 to +84
} else if (throwable instanceof IOException) {
ioException(exchange, (IOException) throwable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use instanceof pattern variable to avoid cast:

Suggested change
} else if (throwable instanceof IOException) {
ioException(exchange, (IOException) throwable);
} else if (throwable instanceof IOException ioException) {
ioException(exchange, ioException);

log.info(
"Remote peer closed connection before all data could be read. The request may have been aborted by"
+ " the client",
ioException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ioException);
ioException);
IoUtils.safeClose(exchange.getConnection());

Comment on lines +209 to +213
if (ioException
.getMessage()
.equals(UndertowMessages.MESSAGES
.couldNotReadContentLengthData()
.getMessage())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ioException.getMessage() could return null leading to a NullPointerException on this error handling path, so we need to handle that.

Additionally, UndertowMessages.MESSAGES.couldNotReadContentLengthData() creates a new IOException on each invocation and adds the expensive overhead of Throwable#fillInStackTrace(). We should resolve the message once to a static final field.

I've pushed #2523 as suggested changes on top of this branch

Suggested change
if (ioException
.getMessage()
.equals(UndertowMessages.MESSAGES
.couldNotReadContentLengthData()
.getMessage())) {
if (Objects.equals(ioException.getMessage(), COULD_NOT_READ_CONTENT_LENGTH_MESSAGE)) {

.equals(UndertowMessages.MESSAGES
.couldNotReadContentLengthData()
.getMessage())) {
log.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could not read the full request from the client and client has closed its side of the connection so we can't send a response, but we need to close and clean up our exchange.

public void handlesPeerRemoteExceptionsWithoutRethrowing() throws IOException {
exception = UndertowMessages.MESSAGES.couldNotReadContentLengthData();
HttpURLConnection connection = execute();
assertThat(connection.getResponseCode()).isEqualTo(200);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there shouldn't be a response code if the request wasn't fully read and we threw before processing & writing response

Suggested change
assertThat(connection.getResponseCode()).isEqualTo(200);
assertThatThrownBy(() -> connection.getResponseCode())
.isInstanceOf(SocketException.class)
.hasMessageContaining("Unexpected end of file from server");

exception, exchange, UndertowDeadlineReasonResponseEncodingAdapter.INSTANCE);
}

private static void ioException(HttpServerExchange exchange, IOException ioException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move this below error to match the ordering of the cases above.

@carterkozak
Copy link
Contributor

Can you help me understand the core problem that we’re solving? The existing exception itself isn’t unreasonable imo, but it could certainly be handled better. Is that signal causing monitors to fire?

I’m a bit worried that we’re matching a specific failure where we should probably look for a class of failures, as there are other ways that requests can bail out depending on the transport encoding.

@ERogalla
Copy link
Author

ERogalla commented May 5, 2025

This was not causing any monitors to fire, but was just spamming error service logs. I was hoping to clear out any noisy/incorrect error logs that aren't actual errors. We can instruct our team to ignore these errors, but it is occurring on seemingly random service endpoints.

FWIW I'm totally open to close this PR out if this isn't the root cause. I just assumed that this was just some unhandled (but expected) exception.

@carterkozak
Copy link
Contributor

In that case, I'd attempt to cast a more precise net: Instead of updating our exception handling to check messages, we may prefer to wrap the request body conduit/channel such that we can intercept I/O exceptions while reading data and handle them more precisely without impacting the flow of exceptions thrown within service implementations.

* Close exchange on closed client

* Apply suggestions from code review

Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>

---------

Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>
@stale
Copy link

stale bot commented Jun 27, 2025

This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days.

@stale stale bot added the stale label Jun 27, 2025
@stale stale bot closed this Oct 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants