Especially in DoCo counting, the `aggregate()` call could perhaps be replaced with something faster.
In `longhaul/R/blessy-createDoCoCount.R`, line 89 in 357a211:

```r
aggregated_df <- aggregate(
```
Using the data.table package would be faster, but it would add a dependency.
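One way to get the speedup without a hard dependency would be to keep data.table optional (e.g. in `Suggests`) and fall back to `aggregate()` when it is not installed. A minimal sketch, assuming this pattern fits the package; the function name `group_sum` and its signature are illustrative, not blessy's actual API:

```r
# Sketch: use data.table for the grouped column sums if it is installed,
# otherwise fall back to base R aggregate(). Illustrative helper only.
group_sum <- function(df, group_col, value_cols) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    dt <- data.table::as.data.table(df)
    # Sum each value column within each group; by accepts a column name.
    out <- dt[, lapply(.SD, sum), by = group_col, .SDcols = value_cols]
    as.data.frame(out)
  } else {
    # Slower base R path; df[group_col] is already a one-column list.
    aggregate(df[, value_cols, drop = FALSE],
              by = df[group_col], FUN = sum)
  }
}
```

Both branches return a plain data.frame with the grouping column first, so downstream code would not need to care which path ran.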
Here are some benchmarks on a count table with ~174k transcripts and ~44k DoCo groups.
```r
dim(merged_df)
# [1] 174278   3853
length(unique(merged_df$DoCo))
# [1] 44099

# use base R aggregate()
system.time(aggregated_df <- aggregate(
  merged_df[, sample_columns],
  by = list(DoCo = merged_df$DoCo),
  FUN = sum
))
#    user  system elapsed
# 938.484  21.817 950.460

# use dplyr
system.time(aggregated_df2 <- merged_df %>%
  group_by(DoCo) %>%
  summarise(across(all_of(sample_columns), sum)))
#    user  system elapsed
# 457.294  25.740 482.254

# use data.table
system.time(dt <- as.data.table(merged_df))
#    user  system elapsed
#   1.238   0.028   1.234
system.time(aggregated_dt <- dt[, lapply(.SD, sum), by = DoCo, .SDcols = sample_columns])
#    user  system elapsed
#  17.612   0.084   2.258
system.time(aggregated_df3 <- data.frame(aggregated_dt))
#    user  system elapsed
#   0.471   0.000   0.472
```