Include metadata in the output of file processing operators #5744
Operators like splitCsv are roughly equivalent to calling the corresponding file method inside a flatMap:
Channel.fromPath('*.csv')
    .flatMap { csv -> csv.splitCsv() }
So if you wanted to preserve things like the source file and number of lines, you can do it all in a single flatMap:
Channel.fromPath('*.csv')
    .flatMap { csv ->
        def records = csv.splitCsv()
        records.collect { record ->
            tuple( csv, records.size(), record )
        }
    }
There is a similar analogy for |
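As a rough sketch of the same pattern for line counting, assuming that is the case of interest here, a plain map can call the file's countLines() method and keep the file alongside the count. The '*.csv' glob is only illustrative.

```groovy
// Sketch: emit [ source_file, number_of_lines ] for each matching file.
// Assumes the inputs are plain (uncompressed) text files.
Channel.fromPath('*.csv')
    .map { csv ->
        // countLines() here is the method on the file object,
        // not the channel operator of the same name
        tuple( csv, csv.countLines() )
    }
    .view()
```

Because the file object travels with the count, the result can be joined back to other per-file channels later on.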
New feature
Hi, thank you for developing Nextflow!
I build many of my workflow steps using Nextflow built-in operators. One thing that would be helpful is a feature to keep track of which file was the input to a splitCsv or countLines operator.
Usage scenario
More precisely, by metadata I mostly mean the source file. This would make it possible to compute something on each file of a channel while keeping the provenance, so that the result can easily be merged back.
For example, let's say we need to know the number of lines of each file (for use as the size in groupKey, for instance). Each item of the desired output channel should look like [ source_file_path, number_of_lines ].
For splitCsv outputs, having the source file automatically added means we can process lines by source file, and/or use collectFile to gather lines back into files corresponding to their original source.
Suggested implementation
The most basic idea would be to add a specific boolean option, such as withSourceFile, to the line-counting operator, and similarly to e.g. splitCsv, splitText and others.
Alternatively, a rather transparent but more flexible implementation would be to allow all tuple elements to be passed along, instead of discarding them.
Current workaround
Currently, to count lines and keep the source file name, I have to either:
read the file with Groovy code inside a map operator, or
define a process doing the same.
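Building on the first option, here is a minimal sketch of how the per-file count could feed groupKey so that records can be regrouped by source file later. Using csv.name as the key is an assumption made for illustration; it only works if file names are unique across the input.

```groovy
// Sketch: tag every record with a groupKey that carries the expected group size,
// so groupTuple can emit a file's records as soon as all of them have arrived.
Channel.fromPath('*.csv')
    .flatMap { csv ->
        def records = csv.splitCsv()
        // key on the file name (assumed unique); size is the number of records
        def key = groupKey( csv.name, records.size() )
        records.collect { record -> tuple( key, record ) }
    }
    .groupTuple()
    .view()
```

In a real workflow, the per-record processing would sit between the flatMap and the groupTuple steps.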
For split* operators such as splitCsv, the best option seems to be to first use a process that inserts the filename into the file itself.
I hope that's a reasonable suggestion... I am happy to explain the use cases further.