Closed
Description
Another bug associated with triggering tasks whilst the workflow is paused :(
This bug leads to some really bizarre behaviours:
- Multiple job submission commands get logged into a single job.activity.log file.
- job files are missing from some submissions, causing them to fail erroneously.
Reproducible Example
(credit Dave)
To reproduce this, we must simulate job submission failure. The easiest way to do this is to jigger the .bashrc file of a remote platform:
    exit 1
Or for a remote platform on a shared filesystem:
    [[ $HOSTNAME == myremote ]] && exit 1
Then run this workflow:
    [scheduling]
        [[graph]]
            R1 = """
                remote
                operator
            """
    [runtime]
        [[remote]]
            platform = myremote
        [[operator]]
            script = """
                cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
                cylc pause "${CYLC_WORKFLOW_ID}"
                cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"
                sleep 1
                cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
                cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"
                sleep 1
                cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
                cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"
            """
Submission 01 goes as expected, but things start going terribly wrong afterwards.
Most critically, this exception appears in the scheduler log:
    ERROR - [Errno 2] No such file or directory: '.../job/1/remote/03/job'
    Traceback (most recent call last):
      File ".../cylc/flow/scheduler_cli.py", line 719, in cylc_play
        asyncio.get_running_loop()
    RuntimeError: no running event loop

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File ".../cylc/flow/subprocpool.py", line 495, in _run_command_init
        stdin_file = open(  # noqa: SIM115
                     ^^^^^^^^^^^^^^^^^^^^^
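A note on reading the two-part traceback above: the "During handling of the above exception" banner is what Python prints when an exception is raised inside an `except` block. The sketch below reproduces that shape in isolation. It assumes (this is a guess, not confirmed by the log) that the failing `open()` is reached from inside a handler for the `RuntimeError`, in which case the `RuntimeError` would be incidental context and the missing job file the real failure. The names `play`, `open_job_file` and `demo` are hypothetical stand-ins, not Cylc functions.

```python
import asyncio
import os
import tempfile


def open_job_file(path):
    """Stand-in for the open() call that fails in the traceback."""
    return open(path)


def play(path):
    """Hypothetical, heavily simplified stand-in for the code path in the log."""
    try:
        # Raises RuntimeError when called outside a running event loop.
        asyncio.get_running_loop()
    except RuntimeError:
        # Anything raised inside this block is implicitly chained to the
        # RuntimeError, which produces the "During handling of the above
        # exception, another exception occurred" banner.
        open_job_file(path)


def demo():
    """Trigger the chained failure and return the resulting exception."""
    # A job file path that is guaranteed not to exist.
    missing = os.path.join(tempfile.mkdtemp(), "03", "job")
    try:
        play(missing)
    except FileNotFoundError as exc:
        return exc
    return None
```

Inspecting `demo().__context__` shows the `RuntimeError`, while the exception itself is the `FileNotFoundError`, i.e. the missing job file.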
Curiously, if remote is changed to remote:submit-failed? in the graph, this exception goes away, so an interaction involving TaskProxy state is likely.
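For anyone re-running the reproducer on a shared filesystem, the host-specific .bashrc guard can be sanity-checked locally before it goes anywhere near a real platform. This is a minimal sketch, assuming `bash` is on `PATH`; `GUARD` and `run_guard` are hypothetical names, and the host name is passed in as `$1` so the result does not depend on the machine the check runs on.

```python
import shutil
import subprocess

# The guard from the .bashrc example. HOSTNAME is overridden from $1 so the
# sketch behaves the same on any machine.
GUARD = 'HOSTNAME="$1"; [[ $HOSTNAME == myremote ]] && exit 1; echo "shell ready"'


def run_guard(hostname):
    """Return the exit status of a bash shell running the guard.

    Returns None if bash is not installed.
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    result = subprocess.run(
        [bash, "-c", GUARD, "guard", hostname],
        capture_output=True,
        text=True,
    )
    return result.returncode
```

With the guard in place, `run_guard("myremote")` exits non-zero, while any other host name falls through to the rest of the shell start-up, so only the platform named in the guard is affected.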