Conversation

@cademirch (Contributor) commented Jun 6, 2025

Based on some of the work in https://github.com/harvardinformatics/snakemake-executor-plugin-cannon, I've put together this PR to add dynamic partition selection based on job resource requirements. Relevant discussion in #106

This works by having the user provide a YAML file, via the option --slurm-partition-config, that specifies their cluster's partitions and the resource limits of each. The expected structure of the file is simple:

partitions:
  some_partition:
    max_runtime: 100
  another_partition:
    ...

I realize adding another config file isn't ideal given that workflow specific configs, workflow profiles, and global profiles already exist. But this approach avoids the complexity of having to determine cluster configurations through SLURM commands, which can vary wildly (as discussed in #106). It also sidesteps the need for complex expressions in set-resources to determine partitions, which can get unwieldy for big workflows with lots of rules. Ideally, users can craft a partitions.yml once and set it in their global profile.

I'm not super familiar with SLURM or partition resource limits, so I came up with a list based on the Snakemake standard resources and SLURM resources.

The following limits can be defined for each partition:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| max_runtime | int | Maximum walltime in minutes | unlimited |
| max_mem_mb | int | Maximum total memory in MB | unlimited |
| max_mem_mb_per_cpu | int | Maximum memory per CPU in MB | unlimited |
| max_cpus_per_task | int | Maximum CPUs per task | unlimited |
| max_nodes | int | Maximum number of nodes | unlimited |
| max_tasks | int | Maximum number of tasks | unlimited |
| max_tasks_per_node | int | Maximum tasks per node | unlimited |
| max_gpu | int | Maximum number of GPUs | 0 |
| available_gpu_models | list[str] | List of available GPU models | none |
| max_cpus_per_gpu | int | Maximum CPUs per GPU | unlimited |
| supports_mpi | bool | Whether MPI jobs are supported | true |
| max_mpi_tasks | int | Maximum MPI tasks | unlimited |
| available_constraints | list[str] | List of available node constraints | none |

Arbitrary resources could also be supported by pattern matching on "max_{resource}", though I've not implemented that in this PR.
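For illustration, a minimal sketch of how such pattern matching could work (hypothetical, not part of this PR; the resource and limit names are made up):

def satisfies_arbitrary_limits(job_resources: dict, limits: dict) -> bool:
    """Check every 'max_<resource>' limit against the matching job resource."""
    for key, limit in limits.items():
        if key.startswith("max_"):
            resource = key[len("max_"):]
            if job_resources.get(resource, 0) > limit:
                return False
    return True

# e.g. a partition with max_threads: 8 would reject a job requesting threads=16
assert not satisfies_arbitrary_limits({"threads": 16}, {"max_threads": 8})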

To pick the "best" partition for a job, I went with a naive scoring approach that calculates a score by summing the ratios of requested resources to partition limits. Higher scores indicate better resource utilization; for example, a job requesting 8 CPUs would prefer a 16-CPU partition (score 0.5) over a 64-CPU partition (score 0.125). Partitions that cannot satisfy a job's requirements are not considered.
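In code terms, the idea boils down to something like the following simplified sketch (the actual score_job_fit in partitions.py additionally handles GPUs, MPI, and constraints):

import math
from typing import Optional

def score_fit(requested: dict, limits: dict) -> Optional[float]:
    """Sum requested/limit ratios; None means the partition can't fit the job."""
    score = 0.0
    for resource, amount in requested.items():
        limit = limits.get(f"max_{resource}", math.inf)
        if amount > limit:
            return None  # partition cannot satisfy this requirement
        if not math.isinf(limit):
            score += amount / limit  # tighter fit -> higher score
    return score

# The example from above: a job requesting 8 CPUs
assert score_fit({"cpus": 8}, {"max_cpus": 16}) == 0.5
assert score_fit({"cpus": 8}, {"max_cpus": 64}) == 0.125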

This feature is opt-in and respects the slurm_partition job resource, as well as the existing fallback partition logic.

@cmeesters, @johanneskoester let me know what you think of this approach! I'm not particularly experienced with SLURM, so I've made decisions here (limits, partition specs, etc.) based on my limited experience and the available docs, and feedback is much appreciated.

Summary by CodeRabbit

  • New Features

    • Automatic SLURM partition selection that picks the best-fit partition per job, honors explicit per-job partition overrides, and falls back to the default partition when no match is found.
    • New configuration option to provide a YAML file of partition limits for selection.
  • Documentation

    • Added detailed docs with YAML schema, example configs (standard/highmem/gpu), scoring behavior, and fallback rules.
  • Tests

    • Expanded test coverage for partition parsing and selection across CPU/GPU/MPI and edge cases.

@cademirch cademirch requested a review from johanneskoester June 6, 2025 22:23
@coderabbitai bot (Contributor) commented Jun 6, 2025


📥 Commits

Reviewing files that changed from the base of the PR and between 3febf2c and e652e69.

📒 Files selected for processing (1)
  • tests/tests.py (3 hunks)

Walkthrough

Adds YAML-driven automatic SLURM partition selection: new partition parsing/scoring logic, executor wiring to load/select partitions at init/submission, documentation, and extensive tests for partition parsing and selection.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Documentation: docs/further.md | Adds "Automatic Partition Selection" docs: YAML schema, examples (standard/highmem/gpu), scoring rules, selection/fallback behavior. |
| Partition management module: snakemake_executor_plugin_slurm/partitions.py | New module defining PartitionLimits and Partition dataclasses, YAML reader (read_partition_file), GPU/CPU parsers (parse_gpu_requirements, get_job_cpu_requirement), partition scoring (score_job_fit), and selector (get_best_partition). |
| Executor integration: snakemake_executor_plugin_slurm/__init__.py | Adds a partition_config: Optional[Path] setting, initializes _partitions in __post_init__ via read_partition_file, registers atexit cleanup, and updates get_partition_arg to use get_best_partition when no explicit slurm_partition is set. |
| Test suite: tests/tests.py | Adds tests and fixtures for partition YAML parsing and selection across CPU/GPU/MPI/GRES/constraint scenarios; includes helpers and many assertions (file contains duplicated/merge-conflict-like blocks to resolve). |
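Based on the signatures quoted later in this thread (read_partition_file and get_best_partition), the wiring amounts to roughly the following sketch; the select_partition helper is illustrative, and job and logger stand in for Snakemake's executor-interface objects:

from pathlib import Path

from snakemake_executor_plugin_slurm.partitions import (
    get_best_partition,
    read_partition_file,
)

def select_partition(config_path: Path, job, logger) -> str:
    # In the plugin this parse happens once at executor init, not per job.
    partitions = read_partition_file(config_path)
    # Returns the best-fitting partition name, or None to trigger fallback.
    best = get_best_partition(partitions, job, logger)
    return f" -p {best}" if best else ""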

Sequence Diagram(s)

sequenceDiagram
    participant Job as Job Submission
    participant Exec as Executor
    participant Parts as Partitions Module
    participant SLURM as SLURM

    Job->>Exec: run_job(job)
    Exec->>Exec: build job params (resources)
    alt partition_config provided
        Exec->>Parts: get_best_partition(candidate_partitions, job)
        Parts->>Parts: For each partition: score_job_fit(job)
        Parts-->>Exec: best_partition_name / None
        alt match found
            Exec->>Exec: set partition arg to best_partition_name
        else no match
            Exec->>Exec: fallback to default/explicit partition logic
        end
    else no config
        Exec->>Exec: use default/explicit partition logic
    end
    Exec->>SLURM: submit job with selected partition

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay extra attention to: partition scoring math and threshold logic; GPU parsing and mutually exclusive gpu vs gres handling; cpus_per_gpu and derived CPU calculations; integration in __init__.py (initialization order, atexit hook); and the new tests (resolve duplicate/merge-conflict blocks).

Suggested reviewers

  • johanneskoester
  • dlaehnemann

Poem

🐰 I nibble YAML leaves at dawn,

I count the GPUs, memory drawn,
I score each partition, pick the best,
Hop to SLURM and tuck jobs to rest,
Happy hops — the pipeline's on! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat!: proposal for dynamic partition selection" directly and specifically refers to the main change in the changeset, which is the introduction of dynamic SLURM partition selection based on job resource requirements via a YAML configuration file. The core changes across all modified files—documentation in docs/further.md, implementation in __init__.py and the new partitions.py module, and comprehensive tests—all center on this feature. The title is concise, clear, and specific enough that a teammate scanning the git history would understand the primary change without ambiguity. The use of "proposal" appropriately reflects the tentative nature discussed in the PR objectives regarding potential future enhancements, but does not render the title vague or generic. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%. |


@cademirch cademirch requested a review from cmeesters June 6, 2025 22:23
@cademirch cademirch changed the title feat proposal for automatic partition selection feat: proposal for automatic partition selection Jun 6, 2025
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/partitions.py (3)

44-53: Use appropriate logging levels.

The function uses logger.warning for informational messages about partition selection. These are normal operational messages, not warnings.

-        logger.warning(
+        logger.info(
             f"Auto-selected partition '{partition}' for job {job.name} "
             f"with score {best_score:.3f}"
         )
         return partition
     else:
         logger.warning(
             f"No suitable partition found for job {job.name} based on "
             f"resource requirements. Falling back to default behavior."
         )

248-250: Simplify nested if statements.

As suggested by static analysis, these nested if statements can be combined for better readability.

-            if gpu_model and self.limits.available_gpu_models:
-                if gpu_model not in self.limits.available_gpu_models:
-                    return None
+            if gpu_model and self.limits.available_gpu_models and gpu_model not in self.limits.available_gpu_models:
+                return None
🧰 Tools
🪛 Ruff (0.11.9)

248-249: Use a single if statement instead of nested if statements

(SIM102)


184-223: Consider documenting and refining the scoring algorithm.

The current scoring algorithm simply sums the ratios of requested resources to limits, giving equal weight to all resources. This may not reflect real-world partition selection priorities.

Consider:

  1. Adding weights to different resources based on their importance
  2. Documenting the scoring rationale in the method docstring
  3. Making the scoring configurable through the partition configuration

For example, memory and GPU availability might be more critical than other factors in partition selection.
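A sketch of what such weighting could look like; the weight values and resource names below are purely illustrative, not a concrete proposal:

# Hypothetical per-resource weights; higher weight = more influence on the fit.
WEIGHTS = {"mem_mb": 2.0, "gpu": 2.0, "cpus_per_task": 1.0, "runtime": 0.5}

def weighted_score(ratios: dict) -> float:
    """Combine requested/limit ratios, weighting critical resources more."""
    return sum(WEIGHTS.get(name, 1.0) * r for name, r in ratios.items())

# Memory dominates here: 2.0 * 0.5 + 1.0 * 0.25 = 1.25
print(weighted_score({"mem_mb": 0.5, "cpus_per_task": 0.25}))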

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 184-184: Too many return statements (8/6)

(R0911)


[refactor] 184-184: Too many branches (21/12)

(R0912)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec82a70 and f5511ff.

📒 Files selected for processing (5)
  • docs/further.md (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)
  • snakemake_executor_plugin_slurm/partitions.py (1 hunks)
  • snakemake_executor_plugin_slurm/submit_string.py (0 hunks)
  • tests/tests.py (2 hunks)
💤 Files with no reviewable changes (1)
  • snakemake_executor_plugin_slurm/submit_string.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
tests/tests.py (4)
snakemake_executor_plugin_slurm/__init__.py (1)
  • ExecutorSettings (36-119)
snakemake_executor_plugin_slurm/utils.py (1)
  • set_gres_string (51-123)
snakemake_executor_plugin_slurm/submit_string.py (1)
  • get_submit_command (5-69)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
🪛 Pylint (3.3.7)
tests/tests.py

[refactor] 485-485: Too many public methods (28/20)

(R0904)

snakemake_executor_plugin_slurm/partitions.py

[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 78-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 115-144: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 93-93: Too many branches (14/12)

(R0912)


[refactor] 150-150: Too many instance attributes (13/7)

(R0902)


[refactor] 184-184: Too many return statements (8/6)

(R0911)


[refactor] 184-184: Too many branches (21/12)

(R0912)

🪛 Ruff (0.11.9)
snakemake_executor_plugin_slurm/partitions.py

248-249: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing
🔇 Additional comments (4)
docs/further.md (1)

67-139: Well-documented feature with clear examples!

The documentation for the automatic partition selection feature is comprehensive and well-structured. The table of configurable limits, example YAML configuration, and explanation of the scoring algorithm provide users with all the necessary information to effectively use this feature.

tests/tests.py (1)

485-974: Excellent test coverage for partition selection feature!

The test suite comprehensively covers all aspects of the partition selection feature:

  • Various partition configuration scenarios (minimal, comprehensive, invalid)
  • Edge cases (empty partitions, missing keys, invalid YAML)
  • Different job resource requirements (CPU, GPU, MPI, constraints)
  • Proper error handling and logging verification

The use of pytest fixtures and parametrization makes the tests maintainable and easy to extend.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 485-485: Too many public methods (28/20)

(R0904)

snakemake_executor_plugin_slurm/__init__.py (1)

639-652: Well-implemented partition selection priority logic!

The enhanced get_partition_arg method correctly implements the partition selection priority:

  1. Explicit partition specification (via slurm_partition resource)
  2. Automatic selection based on loaded partition config
  3. Fallback to default partition behavior

This maintains backward compatibility while adding the new automatic selection feature.
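Schematically, the resulting priority chain reads like this (a sketch, not the plugin's verbatim code; fallback_partition is an illustrative placeholder):

from snakemake_executor_plugin_slurm.partitions import get_best_partition

def pick_partition(job, partitions, logger, fallback_partition=None):
    # 1. An explicit slurm_partition resource always wins.
    explicit = job.resources.get("slurm_partition")
    if explicit:
        return explicit
    # 2. Otherwise try automatic selection from the loaded config.
    if partitions:
        best = get_best_partition(partitions, job, logger)
        if best:
            return best
    # 3. Fall back to the default partition behavior.
    return fallback_partition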

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 649-652: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

snakemake_executor_plugin_slurm/partitions.py (1)

149-175: Well-designed dataclass for partition limits.

The PartitionLimits dataclass appropriately models SLURM partition resource constraints with sensible defaults (infinity for unlimited, 0 for GPUs).

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 150-150: Too many instance attributes (13/7)

(R0902)

Comment on lines +125 to +126
        if cpus_per_task < 0:
            return (0, "none")

🛠️ Refactor suggestion

Consider raising an error for negative CPU values.

When cpus_per_task is negative, the function returns (0, "none"). This silently ignores an invalid configuration that likely indicates a user error.

         if cpus_per_task < 0:
-            return (0, "none")
+            raise WorkflowError(f"Invalid negative value for cpus_per_task: {cpus_per_task}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        if cpus_per_task < 0:
            return (0, "none")

        if cpus_per_task < 0:
            raise WorkflowError(f"Invalid negative value for cpus_per_task: {cpus_per_task}")
🤖 Prompt for AI Agents
In snakemake_executor_plugin_slurm/partitions.py around lines 125 to 126, the
function currently returns (0, "none") when cpus_per_task is negative, which
silently ignores invalid input. Instead, raise a ValueError with a clear message
indicating that cpus_per_task cannot be negative to properly signal the
configuration error.

Comment on lines +81 to +89
        # Parse gres string format: gpu:<number> or gpu:<model>:<number>
        gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
        if gpu_parts:
            gpu_spec = gpu_parts[0].strip().split(":")
            if len(gpu_spec) == 2:  # gpu:<number>
                return int(gpu_spec[1]), None
            elif len(gpu_spec) == 3:  # gpu:<model>:<number>
                return int(gpu_spec[2]), gpu_spec[1]


⚠️ Potential issue

Add error handling for gres GPU parsing.

The gres parsing logic has potential issues:

  1. Integer conversion could fail if the number part isn't numeric
  2. Only the first GPU specification is processed from comma-separated values
  3. No validation for unexpected formats
         # Parse gres string format: gpu:<number> or gpu:<model>:<number>
         gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
         if gpu_parts:
             gpu_spec = gpu_parts[0].strip().split(":")
             if len(gpu_spec) == 2:  # gpu:<number>
-                return int(gpu_spec[1]), None
+                try:
+                    return int(gpu_spec[1]), None
+                except ValueError:
+                    raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[1]}")
             elif len(gpu_spec) == 3:  # gpu:<model>:<number>
-                return int(gpu_spec[2]), gpu_spec[1]
+                try:
+                    return int(gpu_spec[2]), gpu_spec[1]
+                except ValueError:
+                    raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[2]}")
+            else:
+                raise WorkflowError(f"Invalid gres GPU format: {gpu_parts[0]}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        # Parse gres string format: gpu:<number> or gpu:<model>:<number>
        gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
        if gpu_parts:
            gpu_spec = gpu_parts[0].strip().split(":")
            if len(gpu_spec) == 2:  # gpu:<number>
                return int(gpu_spec[1]), None
            elif len(gpu_spec) == 3:  # gpu:<model>:<number>
                return int(gpu_spec[2]), gpu_spec[1]

        # Parse gres string format: gpu:<number> or gpu:<model>:<number>
        gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
        if gpu_parts:
            gpu_spec = gpu_parts[0].strip().split(":")
            if len(gpu_spec) == 2:  # gpu:<number>
                try:
                    return int(gpu_spec[1]), None
                except ValueError:
                    raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[1]}")
            elif len(gpu_spec) == 3:  # gpu:<model>:<number>
                try:
                    return int(gpu_spec[2]), gpu_spec[1]
                except ValueError:
                    raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[2]}")
            else:
                raise WorkflowError(f"Invalid gres GPU format: {gpu_parts[0]}")
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🤖 Prompt for AI Agents
In snakemake_executor_plugin_slurm/partitions.py around lines 81 to 89, the
current GPU gres parsing lacks error handling for non-numeric values during
integer conversion, processes only the first GPU spec ignoring others, and does
not validate unexpected formats. To fix this, add try-except blocks around
integer conversions to catch and handle ValueError, iterate over all GPU specs
in the comma-separated list instead of just the first, and include validation
logic to handle or report unexpected formats gracefully.

Comment on lines +13 to +27
def read_partition_file(partition_file: Path) -> List["Partition"]:
    with open(partition_file, "r") as f:
        out = []
        partitions_dict = yaml.safe_load(f)["partitions"]
        for partition_name, partition_config in partitions_dict.items():
            if not partition_name or not partition_name.strip():
                raise KeyError("Partition name cannot be empty")

            out.append(
                Partition(
                    name=partition_name,
                    limits=PartitionLimits(**partition_config),
                )
            )
        return out

⚠️ Potential issue

Add error handling for file operations and YAML parsing.

The function lacks error handling for several potential failures:

  • File not found or permission errors
  • Invalid YAML syntax
  • Missing "partitions" key
  • Invalid partition configuration structure

Consider adding comprehensive error handling:

 def read_partition_file(partition_file: Path) -> List["Partition"]:
-    with open(partition_file, "r") as f:
-        out = []
-        partitions_dict = yaml.safe_load(f)["partitions"]
+    try:
+        with open(partition_file, "r") as f:
+            config = yaml.safe_load(f)
+    except FileNotFoundError:
+        raise WorkflowError(f"Partition configuration file not found: {partition_file}")
+    except yaml.YAMLError as e:
+        raise WorkflowError(f"Invalid YAML in partition configuration: {e}")
+    except Exception as e:
+        raise WorkflowError(f"Error reading partition configuration: {e}")
+    
+    if not isinstance(config, dict) or "partitions" not in config:
+        raise WorkflowError("Partition configuration must contain a 'partitions' key")
+    
+    partitions_dict = config["partitions"]
+    if not isinstance(partitions_dict, dict):
+        raise WorkflowError("'partitions' must be a dictionary")
+    
+    out = []
         for partition_name, partition_config in partitions_dict.items():
             if not partition_name or not partition_name.strip():
                 raise KeyError("Partition name cannot be empty")
-
+            
+            if not isinstance(partition_config, dict):
+                raise WorkflowError(f"Configuration for partition '{partition_name}' must be a dictionary")
+            
             out.append(
                 Partition(
                     name=partition_name,
                     limits=PartitionLimits(**partition_config),
                 )
             )
         return out
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def read_partition_file(partition_file: Path) -> List["Partition"]:
    with open(partition_file, "r") as f:
        out = []
        partitions_dict = yaml.safe_load(f)["partitions"]
        for partition_name, partition_config in partitions_dict.items():
            if not partition_name or not partition_name.strip():
                raise KeyError("Partition name cannot be empty")
            out.append(
                Partition(
                    name=partition_name,
                    limits=PartitionLimits(**partition_config),
                )
            )
        return out

def read_partition_file(partition_file: Path) -> List["Partition"]:
    try:
        with open(partition_file, "r") as f:
            config = yaml.safe_load(f)
    except FileNotFoundError:
        raise WorkflowError(f"Partition configuration file not found: {partition_file}")
    except yaml.YAMLError as e:
        raise WorkflowError(f"Invalid YAML in partition configuration: {e}")
    except Exception as e:
        raise WorkflowError(f"Error reading partition configuration: {e}")

    if not isinstance(config, dict) or "partitions" not in config:
        raise WorkflowError("Partition configuration must contain a 'partitions' key")

    partitions_dict = config["partitions"]
    if not isinstance(partitions_dict, dict):
        raise WorkflowError("'partitions' must be a dictionary")

    out = []
    for partition_name, partition_config in partitions_dict.items():
        if not partition_name or not partition_name.strip():
            raise KeyError("Partition name cannot be empty")

        if not isinstance(partition_config, dict):
            raise WorkflowError(
                f"Configuration for partition '{partition_name}' must be a dictionary"
            )

        out.append(
            Partition(
                name=partition_name,
                limits=PartitionLimits(**partition_config),
            )
        )
    return out
🤖 Prompt for AI Agents
In snakemake_executor_plugin_slurm/partitions.py around lines 13 to 27, the
read_partition_file function lacks error handling for file I/O and YAML parsing
issues. Add try-except blocks to catch and handle exceptions such as
FileNotFoundError, PermissionError when opening the file, yaml.YAMLError for
invalid YAML syntax, KeyError for missing "partitions" key, and TypeError or
ValueError for invalid partition configuration structure. Provide clear error
messages or re-raise exceptions with context to improve robustness and
debuggability.

@cademirch cademirch changed the title feat: proposal for automatic partition selection feat: proposal for dynamic partition selection Jun 7, 2025
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
snakemake_executor_plugin_slurm/partitions.py (3)

13-27: Add error handling for file operations and YAML parsing.

The function lacks error handling for several potential failures as previously noted.


81-89: Add error handling for gres GPU parsing.

The gres parsing logic needs error handling for integer conversion failures and format validation as previously noted.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


125-126: Consider raising an error for negative CPU values.

When cpus_per_task is negative, raising an error would better signal the configuration issue.

🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/partitions.py (2)

248-250: Simplify nested if statements.

Combine the nested conditions for better readability.

-            if gpu_model and self.limits.available_gpu_models:
-                if gpu_model not in self.limits.available_gpu_models:
-                    return None
+            if gpu_model and self.limits.available_gpu_models and gpu_model not in self.limits.available_gpu_models:
+                return None
🧰 Tools
🪛 Ruff (0.11.9)

248-249: Use a single if statement instead of nested if statements

(SIM102)


190-223: The resource ratio scoring approach promotes efficient partition utilization.

The scoring algorithm effectively:

  • Favors partitions that closely match job requirements
  • Prevents small jobs from monopolizing large partitions
  • Handles unlimited resources gracefully with isinf checks

This should lead to better overall cluster utilization.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5511ff and 01a94c6.

📒 Files selected for processing (3)
  • snakemake_executor_plugin_slurm/__init__.py (7 hunks)
  • snakemake_executor_plugin_slurm/partitions.py (1 hunks)
  • tests/tests.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/__init__.py
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/tests.py

[refactor] 486-486: Too many public methods (28/20)

(R0904)

snakemake_executor_plugin_slurm/partitions.py

[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 78-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 115-144: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 93-93: Too many branches (14/12)

(R0912)


[refactor] 150-150: Too many instance attributes (13/7)

(R0902)


[refactor] 184-184: Too many return statements (8/6)

(R0911)


[refactor] 184-184: Too many branches (21/12)

(R0912)

🪛 Ruff (0.11.9)
snakemake_executor_plugin_slurm/partitions.py

248-249: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing
🔇 Additional comments (4)
tests/tests.py (2)

6-9: LGTM! Appropriate imports for partition testing functionality.

The new imports support the comprehensive test suite for the partition selection feature.

Also applies to: 13-16


486-976: Excellent test coverage for the partition selection feature!

The test suite comprehensively covers:

  • Various partition configurations (basic, minimal, comprehensive)
  • Different job types (CPU-only, GPU, MPI)
  • Edge cases and error conditions
  • Partition scoring and selection logic

The tests ensure robust validation of the new automatic partition selection functionality.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 486-486: Too many public methods (28/20)

(R0904)

snakemake_executor_plugin_slurm/partitions.py (2)

30-55: Well-implemented partition selection logic!

The function correctly:

  • Filters out incompatible partitions
  • Selects the best-scoring partition
  • Provides informative logging for both success and failure cases
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


149-175: Well-structured partition limits dataclass!

The dataclass appropriately:

  • Uses inf for unlimited numerical resources
  • Defaults max_gpu to 0 for non-GPU partitions
  • Organizes resources by type (standard, SLURM-specific, GPU, MPI, constraints)
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 150-150: Too many instance attributes (13/7)

(R0902)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/tests.py (1)

547-550: Improve test case for missing required fields.

The current test uses an empty partition name "" but doesn't truly test missing required fields. Consider testing with a partition that completely lacks the name field.

 @pytest.fixture
 def missing_name_config(self):
     """Configuration with missing name field."""
-    return {"partitions": {"": {}}}  # Empty partition name
+    return {"partitions": {None: {}}}  # None as key to test missing name handling
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01a94c6 and bb8c471.

📒 Files selected for processing (1)
  • tests/tests.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/tests.py

[refactor] 486-486: Too many public methods (28/20)

(R0904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing
🔇 Additional comments (6)
tests/tests.py (6)

6-8: LGTM! Necessary imports for partition testing.

The added imports for tempfile, yaml, and Path are appropriate for creating temporary configuration files and testing the YAML parsing functionality.


13-16: LGTM! Proper imports for partition functionality.

The imports for read_partition_file and get_best_partition are correctly importing the new partition selection functions being tested.


486-486: Comprehensive test coverage is appropriate despite static analysis warning.

The pylint warning about too many public methods (28/20) is a false positive for test classes. Comprehensive test suites naturally require many test methods to cover all scenarios, and this level of coverage is beneficial for the new partition selection feature.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 486-486: Too many public methods (28/20)

(R0904)


557-567: LGTM! Well-designed helper fixture for temporary files.

The temp_yaml_file fixture provides a clean way to create temporary YAML files for testing. The implementation properly handles file creation and returns the path for cleanup.


569-709: Excellent coverage of file reading scenarios.

The tests comprehensively cover various file reading scenarios including:

  • Valid partition configurations with different complexity levels
  • Invalid YAML syntax
  • Missing files and required keys
  • Proper cleanup of temporary files

The error handling validation ensures the partition parsing is robust.


734-970: Comprehensive partition selection test coverage.

The partition selection tests effectively validate the core functionality including:

  • CPU and GPU job matching
  • MPI support validation
  • Constraint compatibility checking
  • Resource limit enforcement
  • Edge cases and error scenarios

The tests properly verify both successful partition selection and appropriate handling when no suitable partition exists.

@cmeesters (Member) commented

@cademirch Hoi Cade,

this is most impressive and looks like a major step forward! Alas, I will be on a conference for the next week and this PR really needs to be thoroughly tested. So, please allow me a bit more time than usual.

On the plus side: I will add a '!' to the title, as this might be worth a major release.

@cmeesters cmeesters changed the title feat: proposal for dynamic partition selection feat!: proposal for dynamic partition selection Jun 8, 2025
@cademirch (Contributor, Author) commented Jun 8, 2025

@cademirch Hoi Cade,

this is most impressive and looks like a major step forward! Alas, I will be on a conference for the next week and this PR really needs to be thoroughly tested. So, please allow me a bit more time than usual.

On the plus side: I will add a '!' to the title, as this might be worth a major release.

Hi @cmeesters!

Of course, I implemented various tests (albeit with LLM assistance, though I looked them over carefully and they all mostly make sense). I expect there are likely use-cases and SLURM configurations I've overlooked - on which your feedback will be very helpful.

Enjoy your conference!

@cademirch (Contributor, Author) commented

@cmeesters Gentle bump here, would be great to discuss this.

@cmeesters (Member) commented Jun 29, 2025

@cademirch I'm sorry - I forgot to notify you: I didn't have a cluster for the last week (cooling issues). Hopefully, by next week, we open again. Yet, I do not know how much time I will find: holidays are approaching. I really want to test this. It would be a wonderful addition!

update: still no working cluster for me. As I will be on holiday until mid-July, there will be little progress, I'm afraid. I want to attempt a first test once the cluster is up again, but whether this leaves me time to do something substantial, I do not know.

@johanneskoester (Contributor) commented

Sounds like a good idea in principle. I believe the same could be achieved by providing a python expression for slurm_partition as a default resource. But I understand that complex clusters might turn such an expression into a very complex thing. A thought: what about discussing the logic here with the dev(s) (@gwct, https://github.com/harvardinformatics/snakemake-executor-plugin-cannon) of the cannon plugin, which is a fork of the slurm plugin specific for the Harvard cannon cluster. As far as I know, it does auto-selection of partitions. Maybe the outcome could be that what is developed here could replace the cannon plugin or make it at least easier to maintain.

@cmeesters (Member) commented

regarding cannon: interesting. And sad, that they forked without trying to provide something upstream.

I was imagining parsing the partition definitions of SLURM. That is a nightmare, but feasible - no quick implementation on the horizon. The hardest part would be to get an idea of preferences. That a "parallel" partition is intended to receive jobs for whole or multiple nodes might be understandable for us, albeit not deducible from the name, as "parallel" might have quite different translations. The same applies to "smp" (what is the limit? This can be part of the partition definitions or the qos) or "a100dl" (which kind of GPU application is to run there?).

Perhaps it is worth investing time to compare outputs of certain cluster definitions (scontrol show partition ...) and sacctmgr commands for qos. I think one person alone will always develop in ignorance of other clusters. That could be part of our regular meetings. I am willing to invest some time in this. What do you think (@cademirch, @johanneskoester)?

@cademirch (Contributor, Author) commented Aug 22, 2025

A thought: what about discussing the logic here with the dev(s) (@gwct, https://github.com/harvardinformatics/snakemake-executor-plugin-cannon) of the cannon plugin

@gwct's cannon plugin is what inspired me to write this PR. He and I have collaborated on other work, and I'm sure he'll be happy to chat about this.

Perhaps, it is worth to invest time, compare outputs of certain cluster definitions (scontrol show partition ...) and sacctmgr commands for qos. I think one person alone will always develop in ignorance of other clusters. That could be part of our regular meetings

If I understand correctly, you are wanting to explore parsing partition limits from scontrol/sacctmgr? I'm open to this, but would defer to you on this idea since I have very little experience with SLURM clusters in general. In any case, I think some sort of manual way to specify cluster limits will be needed in case a user's cluster setup does not work with parsing limits from scontrol or the like.

@johanneskoester (Contributor) commented

I agree, both automatic and guided, as proposed here, make sense.

@cmeesters (Member) commented

ok, let's focus here on the guided approach. I will be travelling, but might(!) find some time in between. Or else, next week.

@gwct commented Aug 25, 2025

Hey there all,
Sorry for the silence, I've been traveling the past week.

The Cannon plugin was a quick hackathon project. While I definitely think it could be expanded upon, I didn't want to clutter up your PRs with something that has a lot of stuff hard coded for our cluster specifically.

Basically it's all just a little decision tree in the get_partition_arg() function that assigns partitions based first on memory requested, then CPUs and time. Of course, if GPUs are required, that takes precedence on our cluster since there is only one GPU partition. The rest of the additions are just helper functions in cannon.py.

As I think we all realize, a universal parser for something like scontrol would be ideal, but extremely difficult. Even in the cannon plugin, we ran into some inconsistencies, notably the only working way to specify GPUs was with slurm_extra, even though both gres: and gpu: seem to be possible options. I'm not sure what about our cluster makes this so, but this is probably a good example of one of the quirks we might run into when developing a general partition selector.

That being said, I think @cademirch's guided approach above, with a YAML file, is probably a good place to start, though still requires a lot of user involvement in generating the YAML file. Perhaps after that's in place, the next step would be to write a separate parsing function that generates the YAML file automatically from scontrol, that way we can just build on top of it and the user can still provide their own YAML if needed.
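A very rough sketch of such a generator, assuming scontrol show partition --oneliner output of KEY=VALUE pairs; the mapping onto this PR's limit names is illustrative, and real output would need per-site parsing and manual review:

import shutil
import subprocess
import yaml

def draft_partition_yaml(path: str = "partitions.yml") -> None:
    """Draft a partitions.yml from scontrol output, for manual review."""
    assert shutil.which("scontrol"), "scontrol not found on PATH"
    out = subprocess.run(
        ["scontrol", "show", "partition", "--oneliner"],
        capture_output=True, text=True, check=True,
    ).stdout
    partitions = {}
    for line in out.strip().splitlines():
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        limits = {}
        # Illustrative mapping: numeric values only; MaxTime (e.g. "1-00:00:00")
        # would still need converting to minutes, and many limits live in QOS.
        if fields.get("MaxMemPerCPU", "UNLIMITED").isdigit():
            limits["max_mem_mb_per_cpu"] = int(fields["MaxMemPerCPU"])
        if fields.get("MaxNodes", "UNLIMITED").isdigit():
            limits["max_nodes"] = int(fields["MaxNodes"])
        partitions[fields["PartitionName"]] = limits
    with open(path, "w") as f:
        yaml.safe_dump({"partitions": partitions}, f)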

Happy to discuss more and open to help testing as well! I'll also try to take a better look at Cade's code in this PR, since it seems more robust than our simple decision tree.

@johanneskoester (Contributor) commented

That sounds like a reasonable plan @gwct. In particular, I like that the user/admin can then review the yaml that has been autogenerated from scontrol in order to refine or extend stuff.

@dlaehnemann (Member) commented

After reading through this discussion quickly, I agree that a config yaml setup as a first step makes sense. And automatic parsing would be great, but sounds hard.

One thing that came to mind, because I also use the lsf executor plugin, would be to make this more generic (maybe implement this in the snakemake-interface-executor-plugins)? Because there is a similar concept in LSF clusters, although it is not called a partition, but a queue. For example, see these docs:
https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=job-view-available-queues

And if this is to be a more generic functionality, one important thing to come up with would be generic terminology for this. Not sure, maybe something like node-set-configurations:, as these define node sets and their resource restrictions?

@cmeesters (Member) commented

hm, my plan was to work on this at the end of this week / early next, because it is already taking so long ...

However, if we choose to migrate this to another code base, this would not make sense, would it, @cademirch / @dlaehnemann? Also, the LSF plugin is meanwhile less feature-rich, and some concepts do not translate 1:1. So I wonder how hard this would be.

@dlaehnemann (Member) commented

I think @cademirch might also pick this up soon?

And I think this concept of a partition, qos, queue, or ... that describes a set of nodes with particular resources / resource constraints will be something that (m)any cluster system would have. So if we can get this configured as a new section in the main profile config.yaml file, this configuration can be used in any executor plugin. The docs for each plugin then just have to note which system-specific variable (partition for slurm) the generic term (for example node-set-name) maps to.

And if a cluster system has an easy way™ to determine all of the available node sets, this could be supported by implementing some function from the executor-plugin interface that auto-fills this profile section, which could still be overridden by manually putting this section into the profile.

And another thought: having it in the profile would also allow for cluster-specific profiles generated by the respective admins for everybody working on that cluster...

@cmeesters (Member) commented Oct 29, 2025

Let me start at the bottom:

And another thought: having it in the profile would also allow for cluster-specific profiles generated by the respective admins for everybody working on that cluster...

I would not hope for that. Whilst you might be working on clusters dedicated to life science, not every HPC team will be so forthcoming. Any solution needs to focus on user capabilities. (SCNR)

As for the different batch systems: whilst some, if not most, concepts are translatable, not all are. Last I worked with LSF, there was no built-in support for GPUs. This has changed; however, the concept is different from SLURM's. The naming of features is different, too. Honestly, I do not see that anyone can code a generic solution alone — we need to sit (online) together. There are just too many details to consider.

As for @cademirch's PR: apart from solving some merge conflicts and some testing, which I promised a while ago, this PR is ready to be merged. It is a tremendous step forward, and I would like to avoid further delays -- unless we really want to have a catch-all solution.

@cademirch (Contributor, Author) commented

Thanks for the discussion @dlaehnemann and @cmeesters. While I like the idea of abstracting the partition selection to the executor interface - I worry that differences between batch systems makes this difficult as @cmeesters mentions, particularly the naming of partitions (queues in LSF). In any case we could revisit this idea even after this PR is merged.

Also, I have made progress towards parsing partition restrictions from the SLURM commands, and will push those changes shortly.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

6-18: Import atexit before using it.

__post_init__ now calls atexit.register, but atexit isn’t imported, so executor initialization will crash with a NameError. Add the missing import alongside the other stdlib imports:

+import atexit
 import csv
 from io import StringIO
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb8c471 and daa202c.

📒 Files selected for processing (3)
  • docs/further.md (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)
  • tests/tests.py (2 hunks)
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • docs/further.md
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • docs/further.md
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-09T15:55:55.669Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-09-09T15:55:55.669Z
Learning: In the Snakemake SLURM executor plugin, Snakemake submits itself as a job using the --wrap parameter in the sbatch command, rather than creating or submitting a separate job.sh script file. The actual implementation uses `call += f' --wrap="{exec_job}"'` where exec_job is the formatted snakemake execution command.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-02-18T14:53:09.504Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:29-31
Timestamp: 2025-02-18T14:53:09.504Z
Learning: The snakemake-executor-plugin-slurm tests require system dependencies (gcc, python3-dev) to build the datrie package, which is a dependency of snakemake.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-02-18T14:49:42.624Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.

Applied to files:

  • docs/further.md
📚 Learning: 2025-02-18T14:40:27.064Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).

Applied to files:

  • docs/further.md
📚 Learning: 2025-03-10T13:34:59.270Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (both simple numeric values and model:number specifications) should be translated to the `--gpus` flag (plural form) to match SLURM's command line interface expectations.

Applied to files:

  • docs/further.md
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • docs/further.md
📚 Learning: 2025-03-10T13:34:59.270Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (even simple numeric ones) should be translated to the `--gpus` flag (plural) to match SLURM's expectations, not `--gpu` (singular).

Applied to files:

  • docs/further.md
🧬 Code graph analysis (2)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
tests/tests.py (2)
snakemake_executor_plugin_slurm/__init__.py (1)
  • ExecutorSettings (41-150)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
🪛 Ruff (0.14.2)
snakemake_executor_plugin_slurm/__init__.py

210-210: Expected ':', found name

(invalid-syntax)


210-210: Expected ',', found '='

(invalid-syntax)


215-215: Expected ':', found name

(invalid-syntax)

tests/tests.py

12-12: Redefinition of unused Path from line 4

Remove definition: Path

(F811)

Comment on lines 115 to 122
    partition_config: Optional[Path] = field(
        default=None,
        metadata={
            "help": "Path to YAML file defining partition limits for dynamic "
            "partition selection. When provided, jobs will be dynamically "
            "assigned to the best-fitting partition based on "
            "See documentation for complete list of available limits.",
    efficiency_report: bool = field(
@coderabbitai bot commented Oct 30, 2025

⚠️ Potential issue | 🔴 Critical

Close the partition_config field definition and finish the help text.

As written, the metadata={... block never closes, so the module won’t even parse. At the same time the help string still stops mid‑sentence (previously flagged). Please close the dict/field and complete the description:

-        metadata={
-            "help": "Path to YAML file defining partition limits for dynamic "
-            "partition selection. When provided, jobs will be dynamically "
-            "assigned to the best-fitting partition based on "
-            "See documentation for complete list of available limits.",
+        metadata={
+            "help": "Path to YAML file defining partition limits for dynamic "
+            "partition selection. When provided, jobs will be dynamically "
+            "assigned to the best-fitting partition based on their resource "
+            "requirements. See documentation for complete list of available limits.",
         },
     )
🧰 Tools
🪛 Ruff (0.14.2)

122-122: Expected ',', found '='

(invalid-syntax)

🤖 Prompt for AI Agents
In snakemake_executor_plugin_slurm/__init__.py around lines 115 to 122, the
partition_config field's metadata dict and field call are left unclosed and the
help string is incomplete; close the metadata dict and the field(...) call,
finish the help text to a full sentence (e.g. "based on resource requirements
and limits specified in the YAML file; see documentation for available
limits."), and ensure proper comma/paren balancing so the module parses.

@cmeesters (Member) commented:

@coderabbitai - that issue should be fixed. Isn't it?

@coderabbitai bot (Contributor) replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Extract the partition_config field definition to verify its current state
sed -n '114,122p' snakemake_executor_plugin_slurm/__init__.py

echo "---"
echo "Checking for Python syntax errors in the file:"
python -m py_compile snakemake_executor_plugin_slurm/__init__.py 2>&1 || echo "Syntax check complete"

Length of output: 370


🏁 Script executed:

#!/bin/bash
# Extract more lines to see the complete partition_config field definition
sed -n '114,125p' snakemake_executor_plugin_slurm/__init__.py

Length of output: 465


@cmeesters, good news and almost there! The structural issue is fixed ✓ – the field definition is now properly closed and the file compiles without syntax errors.

However, the help text is still incomplete. It currently reads:

"...assigned to the best-fitting partition based on "
"See documentation for complete list of available limits."

The phrase "based on " ends mid-thought. You need to complete it before moving to the next sentence. Here's a suggested fix:

metadata={
    "help": "Path to YAML file defining partition limits for dynamic "
    "partition selection. When provided, jobs will be dynamically "
    "assigned to the best-fitting partition based on their resource "
    "requirements. See documentation for complete list of available limits.",
},

Just add "their resource requirements. " (or similar phrasing) to bridge the gap between "based on" and "See documentation".


🧬 Code Graph Analysis Results
  • File: snakemake_executor_plugin_slurm/efficiency_report.py, lines 231-260
def create_efficiency_report(e_threshold, run_uuid, e_report_path, logger):
    """
    Fetch sacct job data for a Snakemake workflow
    and compute efficiency metrics.
    """
    lines = get_sacct_data(run_uuid, logger)

    if lines is None or not lines:
        return None

    df = parse_sacct_data(lines, e_threshold, run_uuid, logger)

    # we construct a path object to allow for a custom
    # logdir, if specified
    p = Path()

    # Save the report to a CSV file
    logfile = f"efficiency_report_{run_uuid}.csv"
    if e_report_path:
        logfile = Path(e_report_path) / logfile
    else:
        logfile = p.cwd() / logfile
    # ensure the directory exists
    logfile.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(logfile)

    # write out the efficiency report at normal verbosity in any case
    logger.info(f"Efficiency report for workflow {run_uuid} saved to {logfile}.")
    # state directory contents for debugging purposes
    logger.debug(f"Current directory contents in '{p.cwd()}': {os.listdir(p.cwd())}")
  • File: snakemake_executor_plugin_slurm/validation.py, lines 48-75
def validate_slurm_extra(job):
    """
    Validate that slurm_extra doesn't contain executor-managed options.

    Args:
        job: Snakemake job object with resources attribute

    Raises:
        WorkflowError: If forbidden SLURM options are found in slurm_extra
    """
    # skip testing if no slurm_extra is set
    slurm_extra = getattr(job.resources, "slurm_extra", None)
    if not slurm_extra:
        return

    forbidden_options = get_forbidden_slurm_options()

    for pattern, option_name in forbidden_options.items():
        if re.search(pattern, slurm_extra):
            raise WorkflowError(
                f"The --{option_name.replace(' ', '-')} option is not "
                f"allowed in the 'slurm_extra' parameter. "
                f"The {option_name} is set by the snakemake executor plugin "
                f"and must not be overwritten. "
                f"Please use the appropriate snakemake resource "
                f"specification instead. "
                f"Consult the documentation for proper resource configuration."
            )
  • File: snakemake_executor_plugin_slurm/partitions.py, lines 13-27
def read_partition_file(partition_file: Path) -> List["Partition"]:
    with open(partition_file, "r") as f:
        out = []
        partitions_dict = yaml.safe_load(f)["partitions"]
        for partition_name, partition_config in partitions_dict.items():
            if not partition_name or not partition_name.strip():
                raise KeyError("Partition name cannot be empty")

            out.append(
                Partition(
                    name=partition_name,
                    limits=PartitionLimits(**partition_config),
                )
            )
        return out
  • File: snakemake_executor_plugin_slurm/partitions.py, lines 30-54
def get_best_partition(
    candidate_partitions: List["Partition"],
    job: JobExecutorInterface,
    logger: LoggerExecutorInterface,
) -> Optional[str]:
    scored_partitions = [
        (p, score)
        for p in candidate_partitions
        if (score := p.score_job_fit(job)) is not None
    ]

    if scored_partitions:
        best_partition, best_score = max(scored_partitions, key=lambda x: x[1])
        partition = best_partition.name
        logger.warning(
            f"Auto-selected partition '{partition}' for job {job.name} "
            f"with score {best_score:.3f}"
        )
        return partition
    else:
        logger.warning(
            f"No suitable partition found for job {job.name} based on "
            f"resource requirements. Falling back to default behavior."
        )
        return None
  • File: snakemake_executor_plugin_slurm/utils.py, lines 13-23
def delete_slurm_environment():
    """
    Function to delete all environment variables
    starting with 'SLURM_'. The parent shell will
    still have this environment. This is needed to
    submit within a SLURM job context to avoid
    conflicting environments.
    """
    for var in list(os.environ):  # copy keys; deleting while iterating raises RuntimeError
        if var.startswith("SLURM_"):
            del os.environ[var]
  • File: snakemake_executor_plugin_slurm/utils.py, lines 26-48
def delete_empty_dirs(path: Path) -> None:
    """
    Function to delete all empty directories in a given path.
    This is needed to clean up the working directory after
    a job has successfully finished. This function is needed because
    the shutil.rmtree() function does not delete empty
    directories.
    """
    if not path.is_dir():
        return

    # Process subdirectories first (bottom-up)
    for child in path.iterdir():
        if child.is_dir():
            delete_empty_dirs(child)

    try:
        # Check if directory is now empty after processing children
        if not any(path.iterdir()):
            path.rmdir()
    except (OSError, FileNotFoundError) as e:
        # Provide more context in the error message
        raise OSError(f"Failed to remove empty directory {path}: {e}") from e
  • File: snakemake_executor_plugin_slurm/utils.py, lines 51-123
def set_gres_string(job: JobExecutorInterface) -> str:
    """
    Function to set the gres string for the SLURM job
    based on the resources requested in the job.
    """
    # generic resources (GRES) arguments can be of type
    # "string:int" or "string:string:int"
    gres_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+)?:\d+$")
    # gpu model arguments can be of type "string"
    gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$")
    # any arguments should not start and end with ticks or
    # quotation marks:
    string_check = re.compile(r"^[^'\"].*[^'\"]$")
    # The Snakemake resources can only be of type "int",
    # hence no further regex is needed.

    gpu_string = None
    if job.resources.get("gpu"):
        gpu_string = str(job.resources.get("gpu"))

    gpu_model = None
    if job.resources.get("gpu_model"):
        gpu_model = job.resources.get("gpu_model")

    # ensure that gres is not set if gpu and gpu_model are set
    if job.resources.get("gres") and gpu_string:
        raise WorkflowError(
            "GRES and GPU are set. Please only set one of them.", rule=job.rule
        )
    elif not job.resources.get("gres") and not gpu_model and not gpu_string:
        return ""

    if job.resources.get("gres"):
        # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2")
        gres = job.resources.gres
        if not gres_re.match(gres):
            if not string_check.match(gres):
                raise WorkflowError(
                    "GRES format should not be a nested string (start "
                    "and end with ticks or quotation marks). "
                    "Expected format: "
                    "'<name>:<number>' or '<name>:<type>:<number>' "
                    "(e.g., 'gpu:1' or 'gpu:tesla:2')"
                )
            else:
                raise WorkflowError(
                    f"Invalid GRES format: {gres}. Expected format: "
                    "'<name>:<number>' or '<name>:<type>:<number>' "
                    "(e.g., 'gpu:1' or 'gpu:tesla:2')"
                )
        return f" --gres={job.resources.gres}"

    if gpu_model and gpu_string:
        # validate GPU model format
        if not gpu_model_re.match(gpu_model):
            if not string_check.match(gpu_model):
                raise WorkflowError(
                    "GPU model format should not be a nested string (start "
                    "and end with ticks or quotation marks). "
                    "Expected format: '<name>' (e.g., 'tesla')"
                )
            else:
                raise WorkflowError(
                    f"Invalid GPU model format: {gpu_model}."
                    " Expected format: '<name>' (e.g., 'tesla')"
                )
        return f" --gpus={gpu_model}:{gpu_string}"
    elif gpu_model and not gpu_string:
        raise WorkflowError("GPU model is set, but no GPU number is given")
    elif gpu_string:
        # we assume here, that the validator ensures that the 'gpu_string'
        # is an integer
        return f" --gpus={gpu_string}"
  • File: snakemake_executor_plugin_slurm/submit_string.py, lines 19-130
def get_submit_command(job, params):
    """
    Return the submit command for the job.
    """
    # Convert params dict to a SimpleNamespace for attribute-style access
    params = SimpleNamespace(**params)

    call = (
        "sbatch "
        "--parsable "
        f"--job-name {safe_quote(params.run_uuid)} "
        f"--output {safe_quote(params.slurm_logfile)} "
        "--export=ALL "
        f"--comment {safe_quote(params.comment_str)}"
    )

    # No account or partition checking is required here.
    # Checking is done in the submit function.

    # here, only the string is used, as it already contains
    # "-A '{account_name}'"
    call += f" {params.account}"
    # here, only the string is used, as it already contains
    # "- p '{partition_name}'"
    call += f" {params.partition}"

    if job.resources.get("clusters"):
        call += f" --clusters {safe_quote(job.resources.clusters)}"

    if job.resources.get("runtime"):
        call += f" -t {safe_quote(job.resources.runtime)}"

    # Both constraint and qos are optional.
    # If not set, they will not be added to the sbatch call.
    # If explicitly set to an empty string,
    # `--constraint ''` or `--qos ''` will be added.
    constraint = job.resources.get("constraint")
    if constraint is not None:
        call += f" -C {safe_quote(constraint)}"

    qos = job.resources.get("qos")
    if qos is not None:
        call += f" --qos={safe_quote(qos)}"

    if job.resources.get("mem_mb_per_cpu"):
        call += f" --mem-per-cpu {job.resources.mem_mb_per_cpu}"
    elif job.resources.get("mem_mb"):
        call += f" --mem {job.resources.mem_mb}"

    if job.resources.get("nodes", False):
        call += f" --nodes={job.resources.get('nodes', 1)}"

    gpu_job = job.resources.get("gpu") or "gpu" in job.resources.get("gres", "")
    if gpu_job:
        # fixes #316 - allow unsetting of tasks per gpu
        # apparently, Python's internal process management interferes with SLURM
        # e.g. for pytorch
        ntasks_per_gpu = job.resources.get("tasks_per_gpu")
        if ntasks_per_gpu is None:
            ntasks_per_gpu = job.resources.get("tasks")
        if ntasks_per_gpu is None:
            ntasks_per_gpu = 1

        if ntasks_per_gpu >= 1:
            call += f" --ntasks-per-gpu={ntasks_per_gpu}"
    else:
        # fixes #40 - set ntasks regardless of mpi, because
        # SLURM v22.05 will require it for all jobs
        # if the job is a MPI job, ntasks will be set later
        if not job.resources.get("mpi", False):
            call += f" --ntasks={job.resources.get('tasks') or 1}"

    # if the job is an MPI job, we need to have some task setting:
    if job.resources.get("mpi", False):
        if not job.resources.get("tasks_per_node") and not job.resources.get("tasks"):
            raise WorkflowError(
                "For MPI jobs, please specify either "
                "'tasks_per_node' or 'tasks' (at least one is required)."
            )
        # raise an error if both task settings are used
        if job.resources.get("tasks_per_node") and job.resources.get("tasks"):
            raise WorkflowError(
                "For MPI jobs, please specify either 'tasks_per_node' or 'tasks', "
                "but not both."
            )
        if job.resources.get("tasks_per_node"):
            if job.resources.get("tasks_per_node") <= 1:
                raise WorkflowError(
                    "For MPI jobs, 'tasks_per_node' must be greater than 1."
                )
            call += f" --ntasks-per-node={job.resources.tasks_per_node}"
        elif job.resources.get("tasks"):
            if job.resources.get("tasks") == 1:
                raise WorkflowError("For MPI jobs, 'tasks' must be greater than 1.")
            call += f" --ntasks={job.resources.tasks}"
        # nodes CAN be set independently of tasks or tasks_per_node
        # this is at a user's discretion. The nodes flag might already
        # be set above, if the user specified it.

    # we need to set cpus-per-task OR cpus-per-gpu, the function
    # will return a string with the corresponding value
    call += f" {get_cpu_setting(job, gpu_job)}"
    if job.resources.get("slurm_extra"):
        call += f" {job.resources.slurm_extra}"

    # ensure that workdir is set correctly
    # use short argument as this is the same in all slurm versions
    # (see https://github.com/snakemake/snakemake/issues/2014)
    if params.workdir:
        call += f" -D {safe_quote(params.workdir)}"

    return call

🧠 Learnings used
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/tests.py (1)

21-1362: Resolve remaining merge-conflict markers before merging.

Line 21 and multiple hunks below still contain <<<<<<<, =======, >>>>>>> markers, leaving half-applied changes in place. The test module cannot even import in this state, so please finish the merge by selecting the intended code and removing all conflict markers before proceeding.

♻️ Duplicate comments (2)
tests/tests.py (1)

12-12: Remove the duplicate Path import to satisfy Ruff.

Line 12 re-imports Path, which Ruff flags as F811 (“redefinition of unused name”). Drop this second import so the tests module passes lint.

-import yaml
-from pathlib import Path
+import yaml
snakemake_executor_plugin_slurm/__init__.py (1)

117-123: Close the partition_config field definition and finish the help text.

Line 119 opens the metadata={... block but never closes it, leaving the field(...) call unterminated and breaking the module with a SyntaxError. While fixing that, complete the help sentence so users understand the behavior.

     partition_config: Optional[Path] = field(
         default=None,
         metadata={
             "help": "Path to YAML file defining partition limits for dynamic "
             "partition selection. When provided, jobs will be dynamically "
-            "assigned to the best-fitting partition based on "
-            "See documentation for complete list of available limits.",
+            "assigned to the best-fitting partition based on their resource "
+            "requirements. See documentation for the complete list of available limits.",
+        },
     )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between daa202c and 0d95537.

📒 Files selected for processing (3)
  • docs/further.md (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (5 hunks)
  • tests/tests.py (5 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
  • docs/further.md
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
  • docs/further.md
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
  • docs/further.md
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-02-18T14:53:09.504Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:29-31
Timestamp: 2025-02-18T14:53:09.504Z
Learning: The snakemake-executor-plugin-slurm tests require system dependencies (gcc, python3-dev) to build the datrie package, which is a dependency of snakemake.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-09-15T12:00:31.977Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#355
File: snakemake_executor_plugin_slurm/submit_string.py:23-30
Timestamp: 2025-09-15T12:00:31.977Z
Learning: In the snakemake-executor-plugin-slurm project, the parameters params.run_uuid, params.slurm_logfile, and params.comment_str are always provided and never None, so validation for None values is not necessary for these specific parameters in the get_submit_command function.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-02-18T14:49:42.624Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.

Applied to files:

  • tests/tests.py
  • docs/further.md
📚 Learning: 2025-02-18T14:40:27.064Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).

Applied to files:

  • docs/further.md
📚 Learning: 2025-03-10T13:34:59.270Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (both simple numeric values and model:number specifications) should be translated to the `--gpus` flag (plural form) to match SLURM's command line interface expectations.

Applied to files:

  • docs/further.md
📚 Learning: 2025-03-10T13:34:59.270Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (even simple numeric ones) should be translated to the `--gpus` flag (plural) to match SLURM's expectations, not `--gpu` (singular).

Applied to files:

  • docs/further.md
🧬 Code graph analysis (2)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
tests/tests.py (2)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
snakemake_executor_plugin_slurm/validation.py (1)
  • validate_slurm_extra (48-75)
🪛 Ruff (0.14.2)
snakemake_executor_plugin_slurm/__init__.py

212-212: Expected ':', found name

(invalid-syntax)


212-212: Expected ',', found '='

(invalid-syntax)


217-217: Expected ':', found name

(invalid-syntax)

tests/tests.py

21-21: Expected a statement

(invalid-syntax)


21-21: Expected a statement

(invalid-syntax)


21-21: Expected a statement

(invalid-syntax)


21-21: Expected a statement

(invalid-syntax)


26-26: Expected a statement

(invalid-syntax)


26-26: Expected a statement

(invalid-syntax)


26-26: Expected a statement

(invalid-syntax)


26-26: Expected a statement

(invalid-syntax)


26-27: Expected a statement

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


801-801: Expected a statement

(invalid-syntax)


801-801: Expected a statement

(invalid-syntax)


801-801: Expected a statement

(invalid-syntax)


801-801: Expected a statement

(invalid-syntax)


1041-1041: Expected a statement

(invalid-syntax)


1041-1041: Expected a statement

(invalid-syntax)


1041-1041: Expected a statement

(invalid-syntax)


1041-1041: Expected a statement

(invalid-syntax)


1065-1065: Expected a statement

(invalid-syntax)


1065-1065: Expected a statement

(invalid-syntax)


1065-1065: Expected a statement

(invalid-syntax)


1065-1065: Expected a statement

(invalid-syntax)


1066-1066: Unexpected indentation

(invalid-syntax)


1068-1068: unindent does not match any outer indentation level

(invalid-syntax)


1070-1070: Expected a statement

(invalid-syntax)


1070-1070: Expected a statement

(invalid-syntax)


1070-1070: Expected a statement

(invalid-syntax)


1070-1070: Expected a statement

(invalid-syntax)


1071-1071: Unexpected indentation

(invalid-syntax)


1313-1313: Expected a statement

(invalid-syntax)


1313-1313: Expected a statement

(invalid-syntax)


1313-1313: Expected a statement

(invalid-syntax)


1313-1313: Expected a statement

(invalid-syntax)


1362-1362: Expected a statement

(invalid-syntax)


1362-1362: Expected a statement

(invalid-syntax)


1362-1362: Expected a statement

(invalid-syntax)


1362-1362: Expected a statement

(invalid-syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

@cmeesters
Member

@cademirch I made an attempt to remove the merge conflicts. Alas, preparing it to be tested took me way longer than anticipated. And now I hope I did not screw anything up by accidentally removing code.

Please also note that I added two missing words to my last post. I hope I can continue testing this afternoon.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

121-124: Finish the partition_config help string.

The new help text still stops mid-sentence (“based on ”) and never tells users what criteria the auto-selection uses. This was pointed out in the prior review and is still unresolved, so please complete the sentence (for example, “…based on their resource requirements.”).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d95537 and 3febf2c.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)
  • tests/tests.py (3 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
📚 Learning: 2025-01-17T17:27:32.446Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • tests/tests.py
📚 Learning: 2025-02-18T14:53:09.504Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:29-31
Timestamp: 2025-02-18T14:53:09.504Z
Learning: The snakemake-executor-plugin-slurm tests require system dependencies (gcc, python3-dev) to build the datrie package, which is a dependency of snakemake.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-09-15T12:00:31.977Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#355
File: snakemake_executor_plugin_slurm/submit_string.py:23-30
Timestamp: 2025-09-15T12:00:31.977Z
Learning: In the snakemake-executor-plugin-slurm project, the parameters params.run_uuid, params.slurm_logfile, and params.comment_str are always provided and never None, so validation for None values is not necessary for these specific parameters in the get_submit_command function.

Applied to files:

  • tests/tests.py
🧬 Code graph analysis (2)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
tests/tests.py (1)
snakemake_executor_plugin_slurm/partitions.py (2)
  • read_partition_file (13-27)
  • get_best_partition (30-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

@cmeesters
Member

@cademirch do you want to add something to the PR? Otherwise, I will test once more and merge.

Thinking about it, I want an env variable similar to $SNAKEMAKE_PROFILE as an alternative to the command line flag. If you agree, I will implement this too. As this can be a fully qualified path to a file, we could recommend placing it next to the config.yaml in the profile path. We could even let the config.yaml there point to the partition profile. Opinions?

If in favour, how about naming it $SNAKEMAKE_CLUSTER_PROFILE? @johanneskoester
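
If we go that route, the lookup itself would be tiny. A minimal sketch, assuming the (not yet agreed) variable name and a hypothetical helper resolve_partition_config - this is illustrative, not existing plugin code:

import os
from pathlib import Path
from typing import Optional

def resolve_partition_config(cli_value: Optional[Path]) -> Optional[Path]:
    # prefer the explicit --slurm-partition-config flag
    if cli_value is not None:
        return cli_value
    # fall back to the proposed environment variable
    env_value = os.environ.get("SNAKEMAKE_CLUSTER_PROFILE")
    return Path(env_value) if env_value else None

A profile's config.yaml could then point slurm-partition-config at the same file for users who prefer the flag.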

@cademirch
Contributor Author

@cademirch do you want to add something to the PR? Otherwise, I will test once more and merge.

@cmeesters Sorry for the delay - let's just merge as is. It is going to take me longer than I thought to integrate the auto detection of the partition limits, not to mention testing it. Since this works as is now, we should merge it, and I can revisit the auto detection later (see the rough sketch at the end of this comment).

Thinking about it, I want an env variable similar to $SNAKEMAKE_PROFILE as an alternative to the command line flag. If you agree, I will implement this too. As this can be a fully qualified path to a file, we could recommend placing it next to the config.yaml in the profile path. We could even let the config.yaml there point to the partition profile. Opinions?

If in favour, how about naming it $SNAKEMAKE_CLUSTER_PROFILE? @johanneskoester

I worry $SNAKEMAKE_CLUSTER_PROFILE is too similar to the old cluster profile setup and may be confusing. Perhaps something like SNAKEMAKE_SLURM_PARTITIONS?
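
For the record, the direction I have in mind for the auto detection is roughly the following - a sketch only, assuming scontrol is on PATH and deliberately skipping the messy parts (UNLIMITED values, per-node vs. per-CPU memory, TRES strings):

import subprocess

def detect_partition_limits() -> dict:
    """Parse `scontrol show partition --oneliner` into a mapping of
    partition name -> raw Key=Value fields."""
    out = subprocess.run(
        ["scontrol", "show", "partition", "--oneliner"],
        capture_output=True, text=True, check=True,
    ).stdout
    partitions = {}
    for line in out.splitlines():
        # each line is a series of space-separated Key=Value pairs
        fields = dict(
            item.split("=", 1) for item in line.split() if "=" in item
        )
        name = fields.get("PartitionName")
        if name:
            partitions[name] = fields
    return partitions

Mapping those raw fields onto PartitionLimits robustly across clusters is the part that needs real testing, which is why I'd rather do it in a follow-up PR.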

