This R package provides tools for generating consensus Topologically Associating Domains (TADs) from multiple prediction methods. TADs are fundamental units of chromatin organization that play crucial roles in gene regulation. While multiple computational tools exist to predict TAD boundaries from Hi-C data, their results often vary significantly. This package implements methods to integrate predictions from multiple tools and generate high-confidence consensus TAD sets.
```r
# Install from GitHub (requires the devtools package)
# install.packages("devtools")
devtools::install_github("CSOgroup/consensusTADs", build_vignettes = TRUE)
```

Main features:

- Generate consensus TADs from multiple prediction tools
- Calculate Measure of Concordance (MoC) between TAD predictions
- Select optimal non-overlapping TAD sets using dynamic programming
- Apply iterative threshold approach for consensus building
`generate_tad_consensus()` creates consensus TADs through an iterative threshold approach, selecting optimal non-overlapping TADs that represent agreement across the different prediction methods.
```r
consensus_tads <- generate_tad_consensus(
  df_tools,      # Data frame with TAD predictions
  threshold = 0, # Minimum MoC threshold
  step = -0.05   # Step between successive thresholds (negative, so thresholds decrease)
)
```

`generate_tad_consensus_hierarchy()` generates hierarchical consensus TADs through multiple rounds of iteration. In each round, it identifies consensus TADs and removes partially overlapping regions from the input data before the next round.
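The README does not define "partially overlapping". Assuming it means that two TADs on the same chromosome intersect without one being nested inside the other, the filtering step between rounds could look like the hypothetical base-R helper `remove_partial_overlaps()` below (not part of the package API). Under this assumption nested TADs survive the filter, which would let them be selected in later rounds; whether the package also drops the TADs already chosen as consensus is not covered by this sketch.

```r
# Hypothetical helper, not part of the consensusTADs API.
# Assumption: a TAD "partially overlaps" a consensus TAD when both are on the
# same chromosome, they intersect, and neither contains the other.
# Intervals are treated as half-open [start, end).
remove_partial_overlaps <- function(tads, consensus) {
  partial <- vapply(seq_len(nrow(tads)), function(i) {
    s <- tads$start[i]
    e <- tads$end[i]
    same_chr <- consensus$chr == tads$chr[i]
    overlaps <- s < consensus$end & consensus$start < e
    nested <- (s >= consensus$start & e <= consensus$end) |
      (consensus$start >= s & consensus$end <= e)
    any(same_chr & overlaps & !nested)
  }, logical(1))
  # Keep only TADs that do not partially overlap any consensus TAD
  tads[!partial, , drop = FALSE]
}

# Usage: only the chr1 TAD 150-250 straddles the consensus TAD 100-200
tads <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr2"),
  start = c(100, 120, 150, 200, 50, 150),
  end = c(200, 160, 250, 300, 300, 250)
)
consensus <- data.frame(chr = "chr1", start = 100, end = 200)
remove_partial_overlaps(tads, consensus)
```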
```r
hierarchical_tads <- generate_tad_consensus_hierarchy(
  df_tools,              # Data frame with TAD predictions
  threshold = 0,         # Minimum MoC threshold
  step = -0.05,          # Step size for threshold iteration
  max_round = NULL,      # Maximum number of rounds
  consider_level = TRUE  # Take the meta.tool_level column into account
)
```

A helper function calculates the Measure of Concordance (MoC) between TAD predictions and keeps only the overlapping pairs that pass an MoC threshold.
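The name and signature of the package's MoC function are not given here, so the following is only a base-R sketch of the idea using a hypothetical `pairwise_moc()`: score every pair of TADs from two tools with the MoC formula given later in this README, then drop pairs that do not overlap or fall below the threshold. It assumes both inputs are on the same chromosome and measures widths as `end - start`; IRanges/GenomicRanges define width as `end - start + 1`, so values computed by the package may differ slightly.

```r
# Hypothetical helper, not part of the consensusTADs API.
# MoC = intersection_width^2 / (width1 * width2) for every pair of TADs from
# two tools (same chromosome assumed); keeps overlapping pairs with MoC >= threshold.
# Assumption: widths are end - start.
pairwise_moc <- function(a, b, threshold = 0) {
  pairs <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  inter <- pmax(0, pmin(a$end[pairs$i], b$end[pairs$j]) -
    pmax(a$start[pairs$i], b$start[pairs$j]))
  w1 <- a$end[pairs$i] - a$start[pairs$i]
  w2 <- b$end[pairs$j] - b$start[pairs$j]
  pairs$moc <- inter^2 / (w1 * w2)
  # Drop non-overlapping pairs, then apply the MoC threshold
  pairs[inter > 0 & pairs$moc >= threshold, , drop = FALSE]
}

# Usage with the coordinates of the two tools in the example tad_data below
tool1 <- data.frame(start = c(10000, 20000, 50000), end = c(30000, 45000, 65000))
tool2 <- data.frame(start = c(12000, 22000, 48000), end = c(32000, 43000, 67000))
pairwise_moc(tool1, tool2, threshold = 0.5)
# keeps pairs (1,1), (2,2) and (3,3) with MoC 0.81, 0.84 and about 0.789
```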
Another function implements a dynamic programming algorithm that selects the set of non-overlapping TADs with the maximum total MoC score.
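Selecting non-overlapping intervals that maximize a total score is the classic weighted interval scheduling problem. The sketch below solves it in base R with a hypothetical `select_non_overlapping()` (not the package's function). It assumes each candidate TAD carries a numeric `score` column (for example an aggregated MoC value; how the package scores candidates is not specified here) and treats intervals as half-open, so a TAD ending at position x does not overlap one starting at x.

```r
# Hypothetical sketch of weighted interval scheduling, not the package's code.
# Input: data frame with start, end and score columns.
# Output: the non-overlapping subset with the maximum total score.
select_non_overlapping <- function(tads) {
  n <- nrow(tads)
  if (n == 0) return(tads)
  tads <- tads[order(tads$end), , drop = FALSE]
  # p[j]: last TAD (in end order) that ends at or before TAD j starts, 0 if none
  p <- vapply(seq_len(n), function(j) {
    compatible <- which(tads$end[seq_len(j - 1)] <= tads$start[j])
    if (length(compatible) == 0) 0L else max(compatible)
  }, integer(1))
  # best[j + 1] = optimal total score using only the first j TADs; best[1] = 0
  best <- numeric(n + 1)
  for (j in seq_len(n)) {
    best[j + 1] <- max(best[j], tads$score[j] + best[p[j] + 1])
  }
  # Backtrack through the table to recover the chosen TADs
  chosen <- integer(0)
  j <- n
  while (j > 0) {
    if (tads$score[j] + best[p[j] + 1] >= best[j]) {
      chosen <- c(j, chosen)
      j <- p[j]
    } else {
      j <- j - 1
    }
  }
  tads[chosen, , drop = FALSE]
}

# Usage
candidates <- data.frame(
  start = c(0, 5000, 10000, 15000),
  end = c(10000, 15000, 20000, 25000),
  score = c(2, 4, 3, 2)
)
select_non_overlapping(candidates)
# keeps the 5000-15000 and 15000-25000 TADs (total score 6)
```

Sorting by end position makes the compatible predecessors of each TAD a prefix of the list, so the table can be filled left to right in a single pass.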
```r
library(consensusTADs)

# Prepare input data with predictions from multiple tools
tad_data <- data.frame(
  chr = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end = c(30000, 45000, 65000, 32000, 43000, 67000),
  meta.tool = c(rep("tool1", 3), rep("tool2", 3))
)

# Generate consensus TADs with default parameters
consensus_results <- generate_tad_consensus(tad_data)
print(consensus_results)

# Generate consensus TADs with custom threshold values
custom_consensus <- generate_tad_consensus(
  tad_data,
  threshold = 0.3,
  step = -0.1
)
```
```r
# Enable parallel processing for large datasets
options(future.globals.maxSize = 10 * 1024^3)  # allow up to 10 GiB of exported globals
future::plan(future::multisession, workers = 4)

# Work with tool levels
tad_data_with_level <- data.frame(
  chr = rep("chr1", 8),
  start = c(10000, 15000, 20000, 50000, 55000, 15000, 50000, 80000),
  end = c(30000, 35000, 45000, 70000, 75000, 35000, 70000, 100000),
  meta.tool = c("tool1", "tool1", "tool2", "tool3", "tool3", "tool2", "tool1", "tool4"),
  meta.tool_level = c("L1", "L2", NA, "L1", "L2", NA, "L2", NA)
)
result_hierarchy <- generate_tad_consensus_hierarchy(
  tad_data_with_level,
  max_round = NULL,
  consider_level = TRUE
)

# Work without tool levels
tad_data_without_level <- data.frame(
  chr = rep("chr1", 8),
  start = c(10000, 15000, 20000, 50000, 55000, 15000, 50000, 80000),
  end = c(30000, 35000, 45000, 70000, 75000, 35000, 70000, 100000),
  meta.tool = c("tool1", "tool1", "tool2", "tool3", "tool3", "tool2", "tool1", "tool4")
)
result_hierarchy_no_level <- generate_tad_consensus_hierarchy(
  tad_data_without_level,
  max_round = NULL
)

# Return to sequential processing
future::plan(future::sequential)
```
The consensus generation process follows these steps:
- Input validation: Check if the input contains data from multiple prediction tools
- Data preparation: Split the input data by chromosome
- Threshold sequence generation: Create a sequence of threshold values
- Iterative TAD selection: For each chromosome and threshold, calculate MoC scores and select optimal TADs
- Result compilation: Combine results from all chromosomes
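The five steps above can be outlined in base R. This is a hypothetical sketch, not the package's implementation: `consensus_outline()` and its `select_fn` argument are illustrative names, `select_fn` stands in for the per-chromosome MoC scoring and optimal TAD selection, and starting the threshold sequence at 1 (the largest possible MoC) is an assumption.

```r
# Hypothetical outline of the consensus pipeline, not the package's code.
# select_fn(chr_df, thresholds = ...) is a placeholder for the per-chromosome
# MoC calculation and TAD selection; it must return a data frame.
consensus_outline <- function(df, threshold = 0, step = -0.05, select_fn) {
  # 1. Input validation: require predictions from at least two tools
  if (length(unique(df$meta.tool)) < 2) {
    stop("input must contain predictions from multiple tools")
  }
  # 2. Data preparation: one data frame per chromosome
  by_chr <- split(df, df$chr)
  # 3. Threshold sequence (assumed to start at 1): 1.00, 0.95, ..., threshold
  thresholds <- seq(1, threshold, by = step)
  # 4. Iterative selection on each chromosome
  per_chr <- lapply(by_chr, select_fn, thresholds = thresholds)
  # 5. Result compilation
  out <- do.call(rbind, per_chr)
  rownames(out) <- NULL
  out
}
```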
The MoC score quantifies the agreement between two predicted TADs:
MoC = (intersection_width)² / (width1 × width2)
Where:
- `intersection_width` is the length of the overlap between the two TADs
- `width1` and `width2` are the lengths of the two TADs being compared
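As a worked example, take the first TAD of each tool in the example `tad_data` (tool1: 10000 to 30000; tool2: 12000 to 32000), assuming widths are measured as end minus start (GenomicRanges counts both endpoints, which would add 1 bp to each width):

$$
\text{MoC} = \frac{(30000 - 12000)^2}{(30000 - 10000) \times (32000 - 12000)} = \frac{18000^2}{20000 \times 20000} = \frac{3.24 \times 10^{8}}{4 \times 10^{8}} = 0.81
$$

Because the overlap can never be longer than the shorter TAD, `intersection_width`² ≤ `width1` × `width2`, so MoC ranges from 0 (no overlap) to 1 (identical TADs).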
The package depends on the following R packages:

- dplyr
- GenomicRanges
- IRanges
- tibble
- purrr
- tidyr
- stringr
- magrittr