Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: JS clients must not release tickets for pending operations #6591

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

niloc132
Copy link
Member

When a necessary operation is still running, clients must not release upstream tickets. This has mostly worked by the server running fast enough or by the websocket transport serializing operations in a way that we can't always rely on.

Testing this is difficult to reliably do - the ticket has some manual steps that generally make it possible to see the issue, and we're exploring other options to make bugs like this more obvious. At this time, there is no good way to put an integration test in that verifies correct behavior efficiently.

Fixes #6056
Fixes DH-18486

When a necessary operation is still running, clients must not release
upstream tickets. This has mostly worked by the server running fast
enough or by the websocket transport serializing operations in a way
that we can't always rely on.

Testing this is difficult to reliably do - the ticket has some manual
steps that generally make it possible to see the issue, and we're
exploring other options to make bugs like this more obvious. At this
time, there is no good way to put an integration test in that verifies
correct behavior efficiently.

Fixes deephaven#6056
Fixes DH-18486
@devinrsmith
Copy link
Member

Potentially related to #5003, #4781

@niloc132
Copy link
Member Author

Potentially related to #5003, #4781

I don't think those are (directly) related here - though some future java-client changes could possibly find inspiration here. The JS API internally noticed that nothing was retaining those tickets so released them ASAP, this change ensures they live long enough to do their job - but releasing isn't used for that purpose in the JS API. Any client still is responsible for holding tickets where downstream operations are raced with the release - the client can not guarantee that the server has actually received the other operation yet, so must not release. This fix is purely a "the client was optimistic, and wrong".

@niloc132
Copy link
Member Author

niloc132 commented Jan 24, 2025

Here's a simple patch I added when testing this (and to test a few other operations to be sure they didn't have this issue):

diff --git a/server/jetty/src/main/java/io/deephaven/server/jetty/GrpcFilter.java b/server/jetty/src/main/java/io/deephaven/server/jetty/GrpcFilter.java
index 04e499b117..8f4bfdd1c0 100644
--- a/server/jetty/src/main/java/io/deephaven/server/jetty/GrpcFilter.java
+++ b/server/jetty/src/main/java/io/deephaven/server/jetty/GrpcFilter.java
@@ -42,6 +42,13 @@ public class GrpcFilter extends HttpFilter {
             // we now know that this request is meant to be grpc, ensure that the underlying http version is not
             // 1.1 so that grpc will behave
             if (!REQUIRE_HTTP2 || request.getProtocol().equals("HTTP/2.0")) {
+                if (!request.getPathInfo().contains("/Release")) {
+                    try {
+                        Thread.sleep(1000);
+                    } catch (InterruptedException e) {
+                        throw new RuntimeException(e);
+                    }
+                }
                 grpcAdapter.doPost(request, response);
             } else {
                 // A "clean" implementation of this would use @Internal-annotated types from grpc, which is discouraged.

Extremely heavy handed, but it ensures beyond a doubt that the server handles raced releases long before it handles other operations.

@niloc132 niloc132 modified the milestones: 0.37.5, 0.38.0 Jan 24, 2025
@mofojed
Copy link
Member

mofojed commented Jan 27, 2025

This doesn't seem to address the errors from the ticket: #6056
Notably still getting the null error for ticket: 16:46:36.654 dh-core.js:29526 Uncaught (in promise) TypeError: Cannot read properties of null (reading 'ticket_0')

Copy link
Member

@mofojed mofojed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seeing the null errors in the logs: #6056 (comment)

@niloc132 niloc132 requested a review from mofojed January 27, 2025 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rollup prints DEPENDENCY_RELEASED error to console when groupings are added
4 participants