srun: fatal: SLURM_TRES_PER_TASK is mutually exclusive with --ntasks-per-gpu and SLURM_NTASKS_PER_GPU #373

Description

@hudja

Hello, unfortunately the problem persists with Snakemake 9.13.6 and plugin version 1.9.2. Test code:

rule tst:
    output:
        'done'
    resources:
        mem_mb = 1*1024,
        slurm_partition='gpu',
        gres="gpu:tesla:1"
    shell:
        r"""
        export OMP_NUM_THREADS=1

        module load singularity
        module load cudnn/8.9.7.29-12 cuda/12.1.0
        
        singularity exec --nv deepvariant_1.9.0-gpu.sif \
        call_variants --helpshort

        touch {output}
        """

error:
srun: fatal: SLURM_TRES_PER_TASK is mutually exclusive with --ntasks-per-gpu and SLURM_NTASKS_PER_GPU

I was able to fix it with the following changes:

    if gpu_job:
        # fixes #316 - allow unsetting of tasks per gpu
        # apparently, python's internal process management interferes with SLURM
        # e.g. for pytorch
        # ntasks_per_gpu = job.resources.get("tasks_per_gpu")
        # if ntasks_per_gpu is None:
        #     ntasks_per_gpu = job.resources.get("tasks")
        # if ntasks_per_gpu is None:
        #     ntasks_per_gpu = 1

        # if ntasks_per_gpu >= 1:
        #     call += f" --ntasks-per-gpu={ntasks_per_gpu}"
        call += f" --gres={job.resources.get('gres', '')}"
...
    # we need to set cpus-per-task OR cpus-per-gpu, the function
    # will return a string with the corresponding value
    # call += f" {get_cpu_setting(job, gpu_job)}"
    # if job.resources.get("slurm_extra"):
    #     call += f" {job.resources.slurm_extra}"

However, with this change CPU-only jobs fail, since the cpus-per-task setting is also commented out.
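A minimal sketch of the branching the patch seems to need: emit the GPU flags only for GPU jobs, and keep the CPU settings for everything else. The function and resource names here are illustrative, not the plugin's actual API.

```python
# Sketch only: build_srun_flags and the plain resources dict are hypothetical
# stand-ins for the plugin's internal call-building logic.

def build_srun_flags(resources: dict, gpu_job: bool) -> str:
    """Return srun resource flags for one job."""
    call = ""
    if gpu_job:
        # Request the generic resource directly; do NOT also pass
        # --ntasks-per-gpu, which srun rejects when SLURM_TRES_PER_TASK
        # is already set in the environment.
        gres = resources.get("gres", "")
        if gres:
            call += f" --gres={gres}"
    else:
        # CPU-only jobs still need cpus-per-task; dropping it for all
        # jobs is what made the CPU case fail.
        call += f" --cpus-per-task={resources.get('cpus_per_task', 1)}"
    return call

print(build_srun_flags({"gres": "gpu:tesla:1"}, gpu_job=True))
print(build_srun_flags({"cpus_per_task": 4}, gpu_job=False))
```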

Originally posted by @hudja in #342
