[UPDATE] Adaptation of trigger.sh for SLURM and SGE #302

rolivella opened this issue Feb 11, 2025 · 15 comments

rolivella commented Feb 11, 2025

🚀 [UPDATE] Adaptation of trigger.sh for SLURM and SGE

📌 Summary

This issue documents the adaptation of trigger.sh to be compatible with both SLURM and SGE, maintaining a modular structure and using submit_slurm.sh for job submission in SLURM.

🔄 Main Changes

  • Automatic detection of SLURM or SGE in trigger.sh.
  • Modification of launch_nf_run so that:
    • In SLURM, it submits sbatch submit_slurm.sh.
    • In SGE, it directly executes nextflow run.
  • No redundant configurations inside trigger.sh; everything is managed through the .config files.

📂 Modified Files

  • trigger.sh
  • submit_slurm.sh (renamed from submit_nf.sh for clarity)

🛠️ Steps to Apply the Changes

1️⃣ Modify trigger.sh

🔹 Add automatic SLURM/SGE detection

Add this block at the beginning of trigger.sh:

## 🔍 DETECT WHETHER WE ARE IN SLURM OR SGE
if command -v sinfo &> /dev/null; then
    SYSTEM="SLURM"
elif command -v qstat &> /dev/null; then
    SYSTEM="SGE"
else
    echo "❌ [ERROR] Neither SLURM nor SGE detected. Exiting..."
    exit 1
fi

echo "✅ [INFO] Detected system: $SYSTEM"

🔹 Replace the launch_nf_run function

Replace the existing launch_nf_run function with this improved version:

## 🚀 EXECUTE NEXTFLOW IN SLURM OR SGE
launch_nf_run () {
    local workflow_script=$1
    local log_file=$2
    local params="${@:3}" # Remaining parameters

    if [[ $SYSTEM == "SLURM" ]]; then
        echo "🚀 [INFO] Launching Nextflow with SLURM via submit_slurm.sh..."
        sbatch submit_slurm.sh "$workflow_script" "$CONFIG_FILE" "$LAB" "$params" "$log_file"

    elif [[ $SYSTEM == "SGE" ]]; then
        echo "🚀 [INFO] Launching Nextflow directly in SGE..."
        nextflow run "$workflow_script" -c "$CONFIG_FILE" -profile "$LAB" $params > "$log_file" 2>&1 &
    fi
}
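
For illustration, a hypothetical call to launch_nf_run (the log path and the --raws_folder/--dda parameters are made-up placeholders; CONFIG_FILE and LAB are assumed to be set earlier in trigger.sh):

launch_nf_run main.nf /path/to/logs/qsample_run.log --raws_folder /path/to/raws --dda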

2️⃣ Update submit_slurm.sh (previously submit_nf.sh)

Ensure submit_slurm.sh is properly structured to receive arguments and execute Nextflow correctly:

#!/bin/bash
# Arguments received from sbatch
WORKFLOW_SCRIPT=$1
CONFIG_FILE=$2
PROFILE=$3
PARAMS=$4
LOG_FILE=$5

# Load modules
module load Nextflow/23.10.1

# Run Nextflow
nextflow run "$WORKFLOW_SCRIPT" -c "$CONFIG_FILE" -profile "$PROFILE" $PARAMS > "$LOG_FILE" 2>&1

✅ How to Test the Changes

  1. Run trigger.sh in a SLURM environment and verify that sbatch is used correctly:
    ./trigger.sh LAB_NAME prod /path/to/assets data_file
  2. Run it in an SGE environment and check that nextflow run is executed directly:
    ./trigger.sh LAB_NAME prod /path/to/assets data_file
  3. Check logs to ensure Nextflow is running correctly.

📌 Conclusion

This change simplifies pipeline management by using a single trigger.sh for both SLURM and SGE, maintaining modularity and leveraging submit_slurm.sh for job submission in SLURM. 🚀🔥

@rolivella

Hi @braffes, feel free to share your SLURM config. Thank you!

braffes commented Feb 14, 2025

Hi, this is just a first attempt, and I haven't completely tested or optimized it for every type of job. However, it should still work.

I think "clusterOptions" is not mandatory since the default QOS should be normal.

Please note that I'm currently updating atlas/qsample, so the defaultlab.config file might not be the latest version. Feel free to merge in anything that is missing.

I will try to come back with a more accurate/optimized configuration when I can.

defaultlab.config.txt

@rolivella

Thanks @braffes, I'll test it next week and let you know.

rolivella changed the title from "To slurm queue" to "[UPDATE] Adaptation of trigger.sh for SLURM and SGE" on Mar 3, 2025
rolivella commented Mar 3, 2025

🚀 [UPDATE] Unified Nextflow Configuration for SLURM and SGE

✅ Subtasks

  • Integrate SLURM and SGE configurations into a single nextflow.config
  • Define profiles for slurm and sge
  • Ensure shared parameters remain in the common section
  • Implement automatic selection based on NXF_EXECUTOR environment variable
  • Test execution using -profile slurm and -profile sge
  • Verify job submission on both SLURM and SGE clusters
  • Validate log outputs for correctness

📌 Summary

This issue describes the integration of both SLURM and SGE execution environments into a single nextflow.config file. The goal is to maintain flexibility, allowing users to switch between execution modes using Nextflow profiles or an environment variable.

🔄 Main Changes

  • Added profiles section to separate slurm and sge configurations.
  • Moved shared parameters (folders, API settings, default values) to the global section.
  • Implemented NXF_EXECUTOR environment variable for automatic execution mode detection.
  • Ensured compatibility with both SLURM and SGE clusters.

📂 Modified Files

  • nextflow.config (Unified SLURM & SGE configuration)

🛠️ Steps to Apply the Changes

1️⃣ Modify nextflow.config to support SLURM and SGE

params {
    executor = System.getenv('NXF_EXECUTOR') ?: 'slurm' // Auto-select based on env variable

    // Folders:
    home_dir                    = "/home/proteomics"
    databases_folder            = "${params.home_dir}/mygit/atlas-databases"
    contaminants_file           = "${params.home_dir}/mygit/atlas-config/atlas-test/assets/contaminants.fasta"
    contaminants_prefix         = "CON_"
    scripts_folder              = "$baseDir/bin"
    tools_folder                = "$baseDir/tools"

    // API:
    url_api_signin              = "10.102.1.54/api/auth/signin"
    url_api_insert_file         = "10.102.1.54/api/file/insertFromPipelineRequest"
    url_api_insert_wetlab_data  = "10.102.1.54/api/data/pipeline"
    url_api_insert_data         = "10.102.1.54/api/data/pipelineRequest"
    url_api_insert_quant        = "10.102.1.54/api/quantification/pipeline"
    url_api_fileinfo            = "10.102.1.54/api/fileInfo/pipeline"
    url_api_insert_modif        = "10.102.1.54/api/modification/pipeline"
    url_api_insert_wetlab_file  = "10.102.1.54/api/file/insertFromPipeline"

    // API Key:
    api_key_qc_params           = "453a2301-6698-43dd-baae-7eb4c6a5eaa5"

    // Default search engine:
    search_engine               = "comet"
}

profiles {
    slurm {
        process {
            executor = "slurm"
            queue = "genoa64"
            cpus = '1'
            cache = 'lenient'

            clusterOptions = { task.time <= 3.h ? '--qos=shorter' :
                               (task.time <= 6.h ? '--qos=short' :
                               (task.time <= 12.h ? '--qos=normal' :
                               (task.time <= 24.h ? '--qos=long' :
                               (task.time <= 48.h ? '--qos=vlong' : '--qos=marathon')))) }

            withLabel:big_cpus {
                cpus = 8
                time = '6h'
                memory = '20G'
            }

            withLabel:big_mem {
                time = '12h'
                memory = '60G'
            }
        }
    }

    sge {
        process {
            executor = "sge"
            queue = "all.q"
            cpus = '1'
            cache = 'lenient'

            clusterOptions = '-l h_rt=48:00:00 -pe smp 4 -l h_vmem=16G'

            withLabel:big_cpus {
                cpus = 8
                time = '6h'
                memory = '20G'
                clusterOptions = '-l h_rt=6:00:00 -pe smp 8 -l h_vmem=20G'
            }

            withLabel:big_mem {
                time = '12h'
                memory = '60G'
                clusterOptions = '-l h_rt=12:00:00 -l h_vmem=60G'
            }
        }
    }
}

singularity {
    enabled = true
    cacheDir = "${params.home_dir}/mygit/atlas-imgs"
    runOptions = "-B ${params.home_dir}:${params.home_dir}"
}

✅ How to Test the Changes

  1. Run Nextflow with SLURM profile:
    nextflow run main.nf -profile slurm
  2. Run Nextflow with SGE profile:
    nextflow run main.nf -profile sge
  3. Auto-detect executor using environment variable:
    export NXF_EXECUTOR=slurm
    nextflow run main.nf
  4. Verify jobs are submitted correctly to SLURM or SGE:
    squeue -u $USER  # SLURM
    qstat -u $USER    # SGE
  5. Check logs to ensure proper execution.

📌 Conclusion

This update provides a unified Nextflow configuration supporting both SLURM and SGE using profiles. It allows seamless switching between execution environments and enables automatic selection via environment variables. 🚀🔥

@rolivella

All changes are done; tomorrow I'll test them within the atlas-test branch:

  • First test the SGE profile. Maybe I'll need to modify the -profile section by adding sge.
  • Then test SLURM profile.

rolivella commented Mar 5, 2025

After testing, I think the config files are becoming too involved and should be rethought for the sake of clarity and scalability. This could be the new structure:

nextflow_configs/
│── shared.config               # Shared parameters (images, tools, API, etc.)
│── slurm/
│   ├── slurm.config            # SLURM-specific configuration
│   ├── profiles/
│   │   ├── tiny.config         # SLURM tiny profile
│   │   ├── small.config        # SLURM small profile
│   │   ├── medium.config       # SLURM medium profile
│   │   ├── big.config          # SLURM big profile
│── sge/
│   ├── sge.config              # SGE-specific configuration
│   ├── profiles/
│   │   ├── tiny.config         # SGE tiny profile
│   │   ├── small.config        # SGE small profile
│   │   ├── medium.config       # SGE medium profile
│   │   ├── big.config          # SGE big profile
│── aws/
│   ├── aws.config              # AWS Batch-specific configuration
│   ├── profiles/
│   │   ├── tiny.config         # AWS tiny profile
│   │   ├── small.config        # AWS small profile
│   │   ├── medium.config       # AWS medium profile
│   │   ├── big.config          # AWS big profile
│── pbs/
│   ├── pbs.config              # PBS/Torque-specific configuration
│   ├── profiles/
│   │   ├── tiny.config         # PBS tiny profile
│   │   ├── small.config        # PBS small profile
│   │   ├── medium.config       # PBS medium profile
│   │   ├── big.config          # PBS big profile
│── nextflow.config             # Main entrypoint that auto-selects executor

The nextflow.config should be modified accordingly:

// Auto-detect executor from environment
def executor = System.getenv('NXF_EXECUTOR') ?: 'slurm'

// Include shared settings
includeConfig 'nextflow_configs/shared.config'

// Load the correct executor configuration
if (executor == 'slurm') {
    includeConfig 'nextflow_configs/slurm/slurm.config'
} else if (executor == 'sge') {
    includeConfig 'nextflow_configs/sge/sge.config'
} else if (executor == 'aws') {
    includeConfig 'nextflow_configs/aws/aws.config'
} else if (executor == 'pbs') {
    includeConfig 'nextflow_configs/pbs/pbs.config'
} else {
    throw new RuntimeException("Unknown executor: " + executor)
}

And rethink its old content:

params.custom_config_base = "/home/proteomics/mygit/atlas-config/atlas-test/."
includeConfig("atlas_custom.config")

Also rethink this -profile $LAB,"${13}" in trigger.sh (see the sketch below).
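
A minimal sketch of how that -profile argument could be assembled instead (the PROFILES variable is hypothetical; "${13}" is assumed to carry the executor profile name, e.g. slurm or sge, as in the current call; CONFIG_FILE, params and log_file are the variables used earlier in trigger.sh):

# Combine the lab profile with an optional executor profile
PROFILES="$LAB"
if [[ -n "${13:-}" ]]; then
    PROFILES="$PROFILES,${13}"
fi
nextflow run main.nf -c "$CONFIG_FILE" -profile "$PROFILES" $params > "$log_file" 2>&1 &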

rolivella commented Mar 11, 2025

Done until today:

  • Now atlas-test branch works with SGE cluster with brand new config structure. ✅
  • Working DDA workflows with SLURM. ✅

Pending:

  • Extend the trigger_slurm.sh call to cover all method parameters. ✅
  • Unit tests:
    • DDA SLURM. ✅
    • DIA SLURM. ✅
    • Comet SLURM. ✅
    • DDA SGE. ✅
    • DIA SGE. ✅
    • Comet SGE. ✅
  • Slack notification system also for SGE. ✅
  • Integration test. ✅
  • Insert test SLURM and SGE. ✅
  • Issues to solve before moving:
    • Memory definition in the sbatch script vs. the config file. If I remove the sbatch memory setting from the script, I get: slurmstepd: error: Detected 1 oom_kill event in StepId=8601174.batch. Some of the step tasks have been OOM Killed. Solved: one thing is the memory requested for the generic Nextflow head process, and the other is the resources requested for each particular process. So I leave #SBATCH --mem=1G in trigger_slurm.sh (see the sketch after this list). ✅
  • Feedback users about trigger_slurm.sh.
  • Move to prod CRG and check.
  • Update wiki.
  • Publication of new atlas release.
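
For reference, a minimal sketch of trigger_slurm.sh with the 1G head-job allocation mentioned above (the job name, output pattern and argument layout are assumptions, not the exact production script; the module version and nextflow call mirror the earlier submit_slurm.sh):

#!/bin/bash
#SBATCH --job-name=qsample_nextflow   # head job that only orchestrates the pipeline
#SBATCH --mem=1G                      # generic allocation for the Nextflow head process
#SBATCH --cpus-per-task=1
#SBATCH --output=%x_%j.log            # assumed log naming, adjust as needed

# Load the same Nextflow module used elsewhere in the pipeline
module load Nextflow/23.10.1

# Per-process resources come from the .config files, not from this script
nextflow run "$1" -c "$2" -profile "$3" "${@:4}"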

@rolivella

Feedback on SLURM Adaptation in the Pipeline

Hi @temaia @braffes

I would like to get your feedback regarding the SLURM adaptation I am working on. In our computing center, we are required to first launch a process using nextflow run, which I currently allocate 1G of memory to. From this initial process, the rest of the pipeline’s processes are submitted with the resources configured in the relevant .config files.

My question:

Does your cluster work in the same way, or does it follow a different setup? I want to understand if you also need a script to launch the Nextflow process in your environment.

The best solution for me would be if it works as I have implemented it now since that way I don’t have to handle specific cases. However, I am open to your feedback and would appreciate your input.

I have added a new script for this purpose, which you can find in the test version of ATLAS:
🔗 trigger_slurm.sh

Looking forward to your feedback!

Thanks,
Roger

braffes commented Mar 18, 2025

Hi @rolivella ,

In my current qsample setup, the nextflow command is started inside my VM, not in a SLURM job. I don't think I'm interested in using a job to handle the Nextflow process; from my point of view, it would just waste one core...

That said, I can do some tests on your new version if needed.

Best,
Brice

@rolivella

OK @braffes, thanks for your feedback; it makes sense to add an option to skip this Nextflow head job. I'll do it, but you'll have to test it because that mode is not allowed on my institutional cluster.

@rolivella

Hi @braffes,

I'm going to refocus on this issue to have it wrapped up by next week.

Actually, as far as I know about SLURM, I believe you do need a job script to launch the Nextflow process. It's not like SGE in that sense. So the way I've implemented it now (having Nextflow started via a SLURM job) should be fine.

What’s your opinion on that?

braffes commented Mar 28, 2025

Hi @rolivella,

I am not sure I understand your point. There are two possibilities:

  1. Use a SLURM job (with a SBATCH script) to manage the Nextflow main process on a VM/submit node, which will then run new jobs via the "SLURM executor" (for some clusters, it is mandatory to do this to avoid monopolizing the submit node).
  2. Start the Nextflow main process directly on the VM/submit node, which will then run new jobs via the "SLURM executor" (for some clusters, this is accepted).

I currently use the second option since I don't need a SLURM job to run the Nextflow main process and I don't want to waste 1 core on my HPC cluster for this job.

I have never played with SGE, but after reading a bit of the documentation, it seems that sbatch and qsub do the same kind of thing, right? With both schedulers you have the choice of creating a job for the main Nextflow process or not. I think it would be a good idea to support both possibilities; see the sketch below.
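
As a rough illustration of the two options in trigger.sh terms (the NF_HEAD_AS_JOB flag is hypothetical, not an existing variable; the rest reuses the variables from the earlier launch_nf_run snippet):

# Option 1: wrap the Nextflow head process in a SLURM job (mandatory on some clusters)
# Option 2: start the head process directly on the VM/submit node
if [[ "${NF_HEAD_AS_JOB:-yes}" == "yes" ]]; then
    sbatch trigger_slurm.sh "$workflow_script" "$CONFIG_FILE" "$LAB" "$params" "$log_file"
else
    nextflow run "$workflow_script" -c "$CONFIG_FILE" -profile "$LAB" $params > "$log_file" 2>&1 &
fi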

@rolivella

Hi @braffes

Yes, you're right — the issue I have is that I cannot test the second option because my cluster only works as in option 1.

The only solution I can think of is to make a change in the pipeline so that it supports option 2 as well, and then you could try it out once it's ready. I would push it to the atlas-test branch on GitHub.

Would that work for you?

@temaia, do you know which mode your SLURM cluster uses? Or could I ask someone to clarify this?

braffes commented Mar 31, 2025

Hi @rolivella

Option 2 is the default one in your implementation, since I only modify the nextflow config file to use the slurm executor and don't change trigger.sh.

I'm OK with testing the new implementation.

@rolivella

@braffes ah, now I see. In that case I think I have finished the implementation, apart from some small loose ends.
