Update version to 2.1.1 and enhance scale_abundance function with chunk processing
- Bumped package version to 2.1.1.
- Added chunk_sample_size parameter to scale_abundance for memory-efficient processing of large datasets.
- Improved documentation to include new parameter details.
- Refactored scaling logic to support chunked processing with BiocParallel for better performance.
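
To illustrate the chunking idea described above, here is a minimal base-R sketch, not the package's actual implementation. The helper `split_samples_into_chunks()`, the toy `counts` matrix, and the `multiplier` values are all hypothetical. It uses `lapply()` so it runs without extra packages; `BiocParallel::bplapply()` has the same `(X, FUN, ...)` calling shape and could be swapped in for parallel execution.

```r
# Hypothetical helper: split sample (column) indices into chunks of at most
# `chunk_sample_size` samples. Inf (the default) yields one chunk, i.e. no chunking.
split_samples_into_chunks <- function(n_samples, chunk_sample_size = Inf) {
  if (is.infinite(chunk_sample_size) || chunk_sample_size >= n_samples) {
    return(list(seq_len(n_samples)))
  }
  chunk_id <- ceiling(seq_len(n_samples) / chunk_sample_size)
  unname(split(seq_len(n_samples), chunk_id))
}

# Toy count matrix: 4 genes x 5 samples (made-up data)
counts <- matrix(
  1:20, nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:5))
)

# Made-up per-sample scaling multipliers, for illustration only
multiplier <- c(s1 = 1, s2 = 0.5, s3 = 2, s4 = 1, s5 = 1.5)

chunks <- split_samples_into_chunks(ncol(counts), chunk_sample_size = 2)

# Scale each chunk of samples independently, then bind the chunks back together
scaled_chunks <- lapply(chunks, function(idx) {
  sweep(counts[, idx, drop = FALSE], 2, multiplier[idx], `*`)
})
scaled <- do.call(cbind, scaled_chunks)
```

Because each chunk only touches its own columns, the combined result should match scaling the whole matrix at once, while only one chunk needs to be held in memory at a time.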
 #' @param reference_sample A character string. The name of the reference sample. If NULL the sample with highest total read count will be selected as reference.
 #' @param .subset_for_scaling A gene-wise quosure condition. This will be used to filter rows (features/genes) of the dataset. For example
 #' @param suffix A character string to append to the scaled abundance column name. Default is "_scaled".
+#' @param chunk_sample_size An integer indicating how many samples to process per chunk. Default is `Inf` (no chunking). For HDF5-backed data or large datasets, set to a finite value (e.g., 50) to enable memory-efficient chunked processing with BiocParallel parallelization.
 #'
 #' @param reference_selection_function DEPRECATED. please use reference_sample.
 if(is.null(reference_sample)) message(sprintf("tidybulk says: the sample with largest library size %s was chosen as reference for scaling", reference))

-# Calculate TMM
-nf <-
-  edgeR::calcNormFactors(
-    my_counts_filtered,
-    refColumn = reference,
-    method = method
-  )
-
-# Calculate multiplier
-multiplier =
-  # Relecting the ratio of effective library size of the reference sample to the effective library size of each sample
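
The multiplier expression is cut off in the hunk above. As a hedged sketch of what its comment describes (the ratio of the reference sample's effective library size to each sample's effective library size), the arithmetic could look like the following. The `counts` matrix and the `nf` normalization factors are made-up values; in the real code `nf` comes from `edgeR::calcNormFactors()`, and the exact expression used by the package is not visible here.

```r
# Toy count matrix: 4 genes x 3 samples (made-up data)
counts <- matrix(
  c(10, 20, 30, 40,   # sample a
    20, 40, 60, 80,   # sample b
     5, 10, 15, 20),  # sample c
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("a", "b", "c"))
)

lib_size <- colSums(counts)  # a = 100, b = 200, c = 50

# Placeholder normalization factors (assumed values, standing in for the
# output of edgeR::calcNormFactors())
nf <- c(a = 0.8, b = 1.0, c = 1.25)

# Reference: the sample with the largest library size
reference <- names(which.max(lib_size))

# Effective library size = raw library size x normalization factor
effective <- lib_size * nf

# Multiplier: reference effective library size over each sample's own
multiplier <- effective[[reference]] / effective

# Scale every sample's counts by its multiplier
scaled <- sweep(counts, 2, multiplier, `*`)
```

Under this definition the reference sample gets a multiplier of 1 and every other sample is scaled to the reference's effective depth.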
+# Check if BiocParallel is available, otherwise install
+check_and_install_packages("BiocParallel")
+
+# Get the current BiocParallel backend
+bp_param <- BiocParallel::bpparam()
+
+# Inform user about parallelization settings
+if (is(bp_param, "SerialParam")) {
+  message("tidybulk says: Processing chunks serially. For parallel processing, configure BiocParallel with BiocParallel::register() before calling this function. For example: BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE))")
+} else {
+  message(sprintf("tidybulk says: Processing %d chunks in parallel using %s with %d workers",
-# Check if BiocParallel is available, otherwise install
-check_and_install_packages("BiocParallel")
-
-# Get the current BiocParallel backend
-bp_param <- BiocParallel::bpparam()
-
-# Inform user about parallelization settings
-if (is(bp_param, "SerialParam")) {
-  message("tidybulk says: Processing chunks serially. For parallel processing, configure BiocParallel with BiocParallel::register() before calling this function. For example: BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE))")
-} else {
-  message(sprintf("tidybulk says: Processing %d chunks in parallel using %s with %d workers",
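
The `sprintf()` call in the backend check is truncated in the diff, so its arguments are not visible. The sketch below shows one way such a check could be completed. The wrapper `describe_backend()` and the arguments passed to `sprintf()` are assumptions, not the commit's code; only `BiocParallel::bpparam()` and `BiocParallel::bpnworkers()` are real BiocParallel functions. The function falls back to a plain message when BiocParallel is not installed so that the example runs either way.

```r
# Hypothetical wrapper: describe how `n_chunks` chunks would be processed
# under the currently registered BiocParallel backend.
describe_backend <- function(n_chunks) {
  if (!requireNamespace("BiocParallel", quietly = TRUE)) {
    return("BiocParallel is not installed; chunks would be processed with lapply")
  }

  # The default backend, or whatever was set via BiocParallel::register()
  bp_param <- BiocParallel::bpparam()

  if (methods::is(bp_param, "SerialParam")) {
    sprintf("Processing %d chunks serially", n_chunks)
  } else {
    # Assumed arguments: chunk count, backend class name, worker count
    sprintf(
      "Processing %d chunks in parallel using %s with %d workers",
      n_chunks, class(bp_param)[1], BiocParallel::bpnworkers(bp_param)
    )
  }
}

msg <- describe_backend(4L)
message(msg)
```

Registering a backend beforehand, for example with `BiocParallel::register(BiocParallel::MulticoreParam(workers = 4))` as the commit's own message suggests, would change which branch is taken.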