
job submission: No such file or directory .../job #7016

@oliver-sanders

Description

Another bug associated with triggering tasks whilst the workflow is paused :(

This bug leads to some really bizarre behaviours:

  • Multiple job submission commands get logged into a single job-activity.log file.
  • Job files are missing from some submissions, causing them to fail erroneously.

Reproducible Example

(credit Dave)

To reproduce this, we must simulate job submission failure. The easiest way to do so is to jigger the .bashrc file of a remote platform so that it exits with an error:

exit 1

Or for a remote platform on a shared filesystem:

[[ $HOSTNAME == myremote ]] && exit 1

Then run this workflow:

[scheduling]
    [[graph]]
        R1 = """
            remote
            operator
        """

[runtime]
    [[remote]]
        platform = myremote

    [[operator]]
        script = """
            cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
            cylc pause "${CYLC_WORKFLOW_ID}"
            cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"

            sleep 1
            cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
            cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"

            sleep 1
            cylc workflow-state "${CYLC_WORKFLOW_ID}//1/remote/02:submit-failed"
            cylc trigger "${CYLC_WORKFLOW_ID}//1/remote"
        """

Submission 01 goes as expected, but things start going terribly wrong afterwards.

Most critically, this exception appears in the scheduler log:

ERROR - [Errno 2] No such file or directory: '.../job/1/remote/03/job'
    Traceback (most recent call last):
      File ".../cylc/flow/scheduler_cli.py", line 719, in cylc_play
        asyncio.get_running_loop()
    RuntimeError: no running event loop
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File ".../cylc/flow/subprocpool.py", line 495, in _run_command_init
        stdin_file = open(  # noqa: SIM115
                     ^^^^^^^^^^^^^^^^^^^^^

Curiously, changing remote to remote:submit-failed? in the graph makes this exception go away, so an interaction involving TaskProxy state is likely.
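
To see which submissions ended up without a job file after running the example, something like the following might help. It is only a rough diagnostic sketch, not part of Cylc: the function name submissions_missing_job_file is made up for illustration, and it assumes the task's job log directory (the .../job/1/remote part of the path in the error above) contains one numbered sub-directory per submission (01, 02, 03, ...) that should each hold a file called job.

```python
from pathlib import Path


def submissions_missing_job_file(task_log_dir):
    """Return the numbered submission directories that contain no 'job' file.

    Hypothetical helper for illustration only. ``task_log_dir`` is assumed
    to be the task's job log directory, i.e. the ``.../job/1/remote`` part
    of the path reported in the scheduler log.
    """
    missing = []
    for sub_dir in sorted(Path(task_log_dir).iterdir()):
        # Only consider real, numbered submission directories (01, 02, ...);
        # skip symlinks and anything with a non-numeric name.
        if sub_dir.is_symlink() or not sub_dir.is_dir():
            continue
        if not sub_dir.name.isdigit():
            continue
        # A submission is counted as affected if its job file is absent.
        if not (sub_dir / "job").is_file():
            missing.append(sub_dir.name)
    return missing


# Example call (the path is deliberately left elided, as in the traceback):
# print(submissions_missing_job_file(".../job/1/remote"))
```

If the assumption about the directory layout holds, any submission number returned here corresponds to a submission whose job file was never written (or was removed), which is what the "No such file or directory" error above is complaining about.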

Metadata

Labels: bug
