fix: cherry-pick PR #334 ws lifecycle bridge#1
Conversation
`SubscriptionManager::start_reconnection_handler` (both `clob::ws::subscription` and `rtds::subscription`) spawns a detached `tokio` task that holds a strong `Arc<SubscriptionManager>` clone. The task also owns a cloned `ConnectionManager`, whose `watch::Sender` is what the task awaits on for state changes. Because the `Sender` only closes when every clone of it drops, and the strong `Arc` clone inside the spawned task prevents the owning manager (and therefore that `ConnectionManager` clone) from ever dropping, the task can never exit on its own — a reference cycle that leaked the entire channel (task, WebSocket, subscription manager) for the lifetime of the process every time a user dropped a `WsClient`. `start_reconnection_handler` now returns its `JoinHandle<()>`. A new `ws::task::AbortOnDrop` wrapper (thin `JoinHandle` + `Drop` that calls `abort()`) is stored next to the `Arc<SubscriptionManager>` inside `clob::ws::ChannelResources` and `rtds::ClientInner`. When the owning client drops, the wrapper drops, the task is aborted, its stack locals (including the strong `Arc` clone) are released, and the whole graph can drop normally. Public API is unchanged. A regression test in `clob::ws::subscription` constructs a `SubscriptionManager`, drops it together with the `AbortOnDrop`, and asserts the `Weak` downgrade cannot upgrade — i.e. the strong count reached zero. Unit tests for `AbortOnDrop` cover the cancel-pending and already-finished paths. Closes Polymarket#325
Codecov flagged the reconnect-handle plumbing on the RTDS side as uncovered — `Client::new`, `Client::authenticate`, `Client::deauthenticate`, and `SubscriptionManager::start_reconnection_handler` each had zero patch coverage, so the regression fence for issue Polymarket#325 only caught the clob::ws path. Add focused regression tests without widening the public surface: * `rtds::subscription::reconnect_handler_tests` mirrors the clob::ws test: constructs an `Arc<SubscriptionManager>`, calls `start_reconnection_handler`, wraps in `AbortOnDrop`, drops, and asserts the `Weak` downgrade cannot upgrade. Proves the same Arc cycle is broken on the RTDS code path. * `rtds::client::teardown_tests` covers the three `reconnect_handle` field sites in `Client::{new, authenticate, deauthenticate}` via `tokio::test`s that poll a `Weak<SubscriptionManager>` after drop. `authenticate` and `deauthenticate` tests also assert the manager is _still_ reachable mid-round-trip, ensuring the `AbortOnDrop` is forwarded into the new `ClientInner` rather than dropped early. * A `#[cfg(test)]`-only `inner_subscriptions_for_test` accessor on `Client<S>` provides the weak-ref probe without exposing the private `ClientInner.subscriptions` field. Gated behind `#[expect(clippy::multiple_inherent_impl, ...)]` with a reason, per crate style. These tests exercise the same `AbortOnDrop::new` + `Drop` paths the existing `ws::task::tests` already cover, transitively raising `ws/task.rs` coverage as well.
…olymarket#334) cargo fmt --all produced no working-tree changes under the current stable toolchain. This commit records Phase 1.3 and triggers fork CI on a MonkeyMoney-controlled branch without modifying the cherry-picked upstream commits.
|
Reopening to trigger fork CI after Actions were enabled for this new fork. |
Move the test-only accessor before the teardown test module and remove the now-unfulfilled lint expectation. This is a CI hygiene patch only; it does not change runtime behavior or extend the V1 fork beyond the P0-A1 bridge scope.
Per revised plan §SDK Function Signatures: add Client::close(timeout) half-closed socket handling plus unsubscribe_last_trade_price and unsubscribe_best_bid_ask aliases. Uses a narrow internal close control message because the connection task owns the WebSocket stream; no Issue Polymarket#327 reconnect behavior or long-term V1 hardening is included.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a7667a2dd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| match message { | ||
| Ok(Message::Close(_)) => return Ok(()), | ||
| Ok(_) => {} | ||
| Err(e) => return Err(Error::from(e)), |
There was a problem hiding this comment.
Treat normal close-handshake completion as success
close_stream converts every read-side tungstenite error into a hard failure, but a graceful WebSocket shutdown can surface as ConnectionClosed once both close frames are exchanged. In that common case, Client::close can return an error even though the peer completed the close handshake correctly, which makes shutdown reporting unreliable for healthy connections that close promptly.
Useful? React with 👍 / 👎.
| let channels: Vec<( | ||
| ChannelType, | ||
| ConnectionManager<WsMessage, Arc<InterestTracker>>, | ||
| )> = self | ||
| .channels | ||
| .iter() | ||
| .map(|entry| (*entry.key(), entry.value().connection.clone())) | ||
| .collect(); |
There was a problem hiding this comment.
Prevent channels from being created during close
ClientInner::close takes a one-time snapshot of self.channels and only closes/removes those entries. Because subscription APIs can concurrently call get_or_create_channel, a new channel inserted after this snapshot is never visited, so close() may return success while leaving a live WebSocket channel running. This appears in multi-task usage where one task shuts down while another subscribes.
Useful? React with 👍 / 👎.
Summary
77264a4eab775fd0b094deb1c72575c712ab74fb.Client::close(timeout)for protocol-level half-closed socket cleanup during recycle windows.unsubscribe_last_trade_priceandunsubscribe_best_bid_askaliases through the existing market refcount path.MonkeyMoney bridge context
This branch is a bounded P0-A1 leak-stop bridge. It is not the long-term Polymarket Rust client destination. Do not extend this V1 fork beyond the Gate A scope.
Gate A candidate:
9a7667a2ddce660245fd56bbf879f9b749f6aa55mm-p0-a1-gate-a-2026-05-08Verification
Local:
cargo +nightly-2025-11-24 fmt --all -- --checkcargo +1.88 clippy --all-targets --all-features -- -D warningscargo +1.88 clippy --all-targets -- -D warningscargo +1.88 test --features "clob ws" close_tests -- --nocapturecargo +1.88 test --all-featurescargo +1.88 testcargo +1.88 build --all-targets --all-featuresgit diff --checkFork CI:
cargo-sort: passfmt-lint: passbuild-test (ubuntu) with coverage: passbuild-test (macos): passbuild-test (windows): passGate A review notes
src/ws/connection.rsfor the narrow internal close control message. It exists because the connection task owns the split WebSocket stream; no public generic shutdown API or Issue WS reconnect loop burns 100% CPU: backoff resets on connect-then-immediate-RST, TLS certs re-parsed every attempt Polymarket/rs-clob-client#327 behavior was added.src/clob/ws/client.rsforClient::close(timeout), helper aliases, and the two new close/leak tests.