Publishing large file (1.9TB) after process takes very long (6 days) with publish mode copy. #2866
Replies: 2 comments
Update
I did some more digging into the source code and it seems the earlier mentioned
publishDir "output", mode: "copy"
Since I don't know how to test or improve the copying speed in this case, I decided to create a workaround myself.
Workaround solution
First, let me show the process:
process wrapdemux {
    label "process_medium"
    publishDir "${params.tosenddir}", pattern: "*.{tar,txt}", saveAs: { filename -> params.splitprojectsarchive && params.hassampleproject ? "$project_name/$filename" : filename }, mode: "copy"
    publishDir "${params.logdir}/${task.process}/${task.hash}", pattern: ".*", mode: "copy"

    input:
    tuple val(project_name), val(samples), path(fastq_files)

    output:
    tuple val(project_name), val(samples), path("*.{tar,txt}")
    path(".*")

    script:
    """
    # code to generate .tar and .txt files
    """
}
I decided to mimic the publishDir copy with an afterScript in the process configuration:
process {
    withName: wrapdemux {
        afterScript = {
            // Use a per-project subdirectory when archives are split by project
            if (params.hassampleproject && params.splitprojectsarchive) {
                copylocation = params.tosenddir + "/" + project_name
            } else {
                copylocation = params.tosenddir
            }
            """
            mkdir -p ${copylocation}
            find . -name "*.txt" -exec cp -fRL '{}' ${copylocation} \\;
            find . -name "*.tar" -exec cp -fRL '{}' ${copylocation} \\;
            """
        }
    }
}
This We don't want to completely remove the To solve this, we make use of how Nextflow prioritizes Since we publish the
process {
    withName: wrapdemux {
        afterScript = {
            // Use a per-project subdirectory when archives are split by project
            if (params.hassampleproject && params.splitprojectsarchive) {
                copylocation = params.tosenddir + "/" + project_name
            } else {
                copylocation = params.tosenddir
            }
            """
            mkdir -p ${copylocation}
            find . -name "*.txt" -exec cp -fRL '{}' ${copylocation} \\;
            find . -name "*.tar" -exec cp -fRL '{}' ${copylocation} \\;
            """
        }
        // publishDir setting for the hidden files (pattern ".*") only
        publishDir = [
            path: { "${params.logdir}/${task.process}/${task.hash}" },
            pattern: ".*",
            mode: "copy"
        ]
    }
}
Now, I added the above code in a separate config file named conf/send-tar.config and included it in a profile:
profiles {
    local_server {
        includeConfig "conf/send-tar.config"
    }
}
I hope this is helpful if someone runs into a similar problem.
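To try the copy step outside Nextflow, the shell part of the afterScript above can be pulled into a small function. This is only a sketch: copy_outputs is a hypothetical helper name, the example paths are made up, and it assumes the destination directory is not located inside the source directory.

```shell
#!/bin/sh
# copy_outputs SRC DEST
# Hypothetical helper (not part of Nextflow): copies every *.txt and *.tar
# entry found under SRC into DEST with cp -fRL, creating DEST if needed.
# Assumes DEST is not inside SRC, otherwise find would revisit the copies.
copy_outputs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    find "$src" \( -name '*.txt' -o -name '*.tar' \) -exec cp -fRL '{}' "$dest" \;
}

# Example call with made-up paths:
#   copy_outputs /path/to/work/ab/123456 /path/to/tosend/projectA
```

Quoting the destination keeps the copy working when the project name contains spaces, which the unquoted ${copylocation} in the config would not handle.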
@bentsherman This was the issue I referenced during our conversation earlier today. After the hackathon I will run some tests to see whether the issue still persists in the latest Nextflow version or whether it has been fixed over the years.
Nextflow information
nextflow version 21.10.6.5660
Executor: Local.
Run with docker.
System used:
Problem
We have a process in our pipeline that creates a tar archive. Nextflow takes a very long time (6 days) to publish a large tar file (1.9TB) when using the publishDir directive with publish mode: copy.
The nextflow command terminated with the following warning message:
At this point, we didn’t get our command line prompt back and the file transfer was still ongoing.
Questions
Which command and parameters are used by Nextflow to copy the files when specifying the copy option in the publishDir directive?
From what we saw of the source code and the .command.run file, we assume it’s just a cp -fRL command on the host system.
When manually copying the same source file to the same destination, the copy finishes in 3 hours, whereas it took Nextflow several days. What would be the reason for this?
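For anyone who wants to repeat the manual comparison, a minimal sketch of a timed cp -fRL follows. timed_copy is a hypothetical helper name, the example paths are placeholders, and it assumes the source is a single regular file such as the tar archive.

```shell
#!/bin/sh
# timed_copy SRC DEST
# Hypothetical helper: copies SRC to DEST with cp -fRL and prints the size,
# the elapsed wall-clock time, and the average rate in MB/s.
# Assumes SRC is a regular file.
timed_copy() {
    src=$1
    dest=$2
    start=$(date +%s)
    cp -fRL "$src" "$dest" || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    # Size of the source file in bytes
    bytes=$(wc -c < "$src")
    # Avoid dividing by zero when the copy finishes within one second
    [ "$elapsed" -gt 0 ] || elapsed=1
    LC_ALL=C awk -v b="$bytes" -v s="$elapsed" \
        'BEGIN { printf "%.0f bytes in %d s (%.1f MB/s)\n", b, s, b / s / 1000000 }'
}

# Example call with placeholder paths:
#   timed_copy /path/to/work/dir/archive.tar /path/to/remote/share/archive.tar
```

Running it against the same source and destination that publishDir uses gives a baseline to compare with the time Nextflow needs for the publish step.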
Extra information
Copy from inside Nextflow (cp -fRL ?): local source (work directory) to remote share (7200 RPM HDD), bandwidth out +/- 45 Mb/s
Manual cp -fRL: local source (work directory) to remote share (7200 RPM HDD), bandwidth out +/- 1.5 Gb/s
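As a rough sanity check on these figures, the average rates implied by the reported durations can be computed from the file size. The sketch below assumes that 1.9TB means 1.9e12 bytes (decimal terabytes); throughput_mbit is a hypothetical helper name.

```shell
#!/bin/sh
# throughput_mbit BYTES SECONDS
# Hypothetical helper: prints the average rate in Mbit/s for a transfer
# of BYTES bytes that took SECONDS seconds.
throughput_mbit() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (b * 8) / (s * 1000000) }'
}

# Assumption: 1.9TB is read as 1.9e12 bytes
size_bytes=1900000000000

throughput_mbit "$size_bytes" $((6 * 24 * 3600))   # publishDir copy, 6 days
throughput_mbit "$size_bytes" $((3 * 3600))        # manual cp, 3 hours
```

Under that assumption the two calls print 29.3 and 1407.4, that is, about 29 Mb/s for the six-day publish and about 1.4 Gb/s for the three-hour manual copy, which is in the same range as the observed +/- 45 Mb/s and +/- 1.5 Gb/s.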
If you have any questions or need more information from the log files, please ask.