Skip to content

clarify implementation of --seconds-between-status-checks and --max-status-checks-per-seconΒ #319

@dlaehnemann

Description

@dlaehnemann

Both @cmeesters and I currently aren't 100% sure how this works and we should thus investigate in some more detail, to make sure this works as intended and is thoroughly documented. Here's previous discussion of this on a docs pull request:

@dlaehnemann I removed the TODO items from the status check section (see: 4cfac88): the plugin uses the build-in rate limiter (see the main code, in this pr line 400 async with self.status_rate_limiter:), this obeys the --seconds-between-status-checks and --max-status-checks-per-second options, does it not?

That status_rate_limiter just uses the --max-status-checks-per-second option, see here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L90

And this will only limit how many sacct commands are issued per second within a round of status check attempts. So the default for this should probably rather be some lower fraction like 0.2, to make it wait 5 seconds before re-attempting the sacct.

But from what I can see in the snakemake code, the default is actually 100.0:
https://github.com/snakemake/snakemake/blob/91cc3ef0f24b1b0e96cb751cdc806cad84491425/src/snakemake/settings/types.py#L420

Or 10, if we trust the rendered CLI docs:
https://snakemake.readthedocs.io/en/stable/executing/cli.html#snakemake.cli-get_argument_parser-behavior

Both of these would be rapid firing sacct at the slurm system as many times as you attempt a status check. So maybe instead put something like max_status_checks_per_second=0.2 here:

As for the --seconds-between-status-checks, I think this is initialized here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L280

And we update it here, resetting it to 40 seconds whenever any jobs are finished:

So I think we should at least reset it to the command-line value. And we should double-check that simply setting this value will have the intended effect of waiting. Because I don't understand the code calling this in the interface. There's first the sleep function, that uses this:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L269

And this sleep function is in turn only used here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L191

And eventually it is used here, within the threading.Thread() call:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L83

So I guess this somehow uses the threading core python module to wait for jobs and update the active_jobs list, but I can't seem to wrap my head around how it does this. But it probably works as intended and I simply don't understand it... πŸ˜…

Originally posted by @dlaehnemann in #237 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions