Skip to content

Handle stalled threaded queued-run launches without resetting workers#33678

Draft
zoltanmaric wants to merge 1 commit intodagster-io:masterfrom
zoltanmaric:zoltanmaric/queued-run-daemon-stall-recovery
Draft

Handle stalled threaded queued-run launches without resetting workers#33678
zoltanmaric wants to merge 1 commit intodagster-io:masterfrom
zoltanmaric:zoltanmaric/queued-run-daemon-stall-recovery

Conversation

@zoltanmaric
Copy link
Copy Markdown

Summary

  • keep threaded dequeue launches in flight across iterations instead of restarting the worker pool when one launch stalls
  • stop waiting after a bounded no-progress window so the daemon can keep heartbeating and later iterations can use any free dequeue capacity
  • add a regression test covering a blocked launch followed by a later queued run

Test plan

  • make ruff
  • pytest python_modules/dagster/dagster_tests/daemon_tests/test_queued_run_coordinator_daemon.py -q
  • pyright python_modules/dagster/dagster/_daemon/run_coordinator/queued_run_coordinator_daemon.py python_modules/dagster/dagster_tests/daemon_tests/test_queued_run_coordinator_daemon.py

Made with Cursor

Track in-flight launch futures across iterations and stop waiting after a bounded no-progress window so the queued run daemon can keep heartbeating and continue dequeuing later runs without tearing down its worker pool.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant