miicsearchscore: An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores
This repository provides an R implementation of the MIIC_search&score algorithm, introduced in our ICML 2025 paper, for causal discovery in the presence of latent variables. The algorithm combines a theoretical likelihood decomposition for ancestral graphs with a practical, efficient two-step search-and-score procedure based on multivariate information scores.
# Install devtools if needed
install.packages("devtools")
# Install the package directly
devtools::install_github("miicTeam/miicsearchscore")
# Load the package
library(miicsearchscore)
# Load the example dataset
data(nonlinear_data)
# Run the method on the example data
adj <- run_miic_searchscore(nonlinear_data, n_threads = 1)
The method improves upon MIIC through a greedy scoring scheme based on higher-order ac-connected information subsets (a short sketch for inspecting its output is given after the list below). It is especially suited for:
- Ancestral graphs including latent confounders (bidirected edges),
- Complex datasets, such as continuous data with non-linear couplings between variables, or categorical data,
- Scalable inference, thanks to localized scoring limited to collider paths of up to two edges.
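As a quick way to see what the call above returns, here is a minimal sketch that lists the edges found in the returned adjacency matrix. It assumes only that non-zero entries mark edge ends; the exact encoding of tails, arrowheads, and bidirected edges (pointing to latent confounders) is not specified here, so check the package documentation for the actual convention.
# Minimal sketch (assumption: non-zero entries of adj mark edge ends;
# see the package documentation for the exact edge-mark encoding).
adj <- run_miic_searchscore(nonlinear_data, n_threads = 1)
idx <- which(adj != 0, arr.ind = TRUE)                 # indices of non-zero entries
data.frame(from = idx[, 1], to = idx[, 2], mark = adj[idx])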
You have two ways to install and use the miicsearchscore package:
To access everything (including the benchmark and simulation scripts):
# Clone the full repository
git clone https://github.com/miicTeam/miicsearchscore.git
cd miicsearchscore
Then, open R from this folder and run:
# Install devtools if needed
install.packages("devtools")
# Install from local source
devtools::install(".")
library(miicsearchscore)
If you only need the core R functions (no benchmark), use:
# Install devtools if needed
install.packages("devtools")
# Install directly from GitHub
devtools::install_github("miicTeam/miicsearchscore")
# Load the package
library(miicsearchscore)
⚠️ This method installs only the R package, not the benchmark/ folder.
The algorithm proceeds in two steps:
# Run MIIC first to obtain an initial adjacency matrix
miic_result <- miic(data,
latent = "orientation",
propagation = TRUE,
consistent = "orientation",
n_threads = n_threads)
summary <- miic_result$summary
summary <- summary[summary$type == "P", ]
# Environment used as a hash table, shared between the two scoring steps
hash_table <- new.env()
adj_miic <- miic_result$adj_matrix
# Step 1: node score applied to the MIIC adjacency matrix
step1_result <- apply_node_score_step_1(adj_miic, data, hash_table)
adj_step1_node_score <- step1_result$adj
hash_table <- step1_result$hash_table
# Step 2: edge score applied to the graph obtained after step 1
step2_result <- apply_edge_score_step_2(adj_step1_node_score, data, hash_table)
adj_step2_edge_score <- step2_result$adj
Or run everything in one call:
adj <- run_miic_searchscore(data, n_threads = 1)
miicsearchscore/
├── R/                               # Core R source files implementing the MIIC_search&score algorithm
├── data/                            # Package dataset (.rda), accessible via data()
├── man/
├── benchmark/                       # Benchmarking scripts
│   ├── MIIC_search_and_score/       # Scripts to run benchmarks for MIIC_search&score
│   │   ├── categorical/             # Scripts for categorical data settings
│   │   │   ├── bootstrap/
│   │   │   ├── normal/
│   │   ├── continuous/
│   │   │   ├── linear_gaussian/
│   │   │   ├── non_linear/
│   ├── baselines/                   # Scripts to run and evaluate baseline methods
│   │   ├── DAGGNN/
│   │   │   ├── linear_gaussian/
│   │   │   ├── non_linear/
│   │   ├── FCI/
│   │   │   ├── bootstrap/
│   │   │   ├── normal/
│   │   ├── GFCI/
│   │   ├── M3HC/
│   ├── data/
│   │   ├── CPT/                     # Conditional probability tables (used for categorical models)
│   ├── simulations/                 # Data and graph generation scripts
│   │   ├── categorical/
│   │   ├── continuous/
│   ├── utils/                       # Shared utility scripts: plotting, metrics, graph conversion, etc.
Benchmarks for reproducing Figures 2, 3, E.2, E.3, and Table E.1 of the paper are provided in the benchmark/ folder. Before running them, make sure you are in the benchmark/ directory.
Before running any simulation, make sure to install the required R packages (with exact versions) using:
Rscript install_requirements.R
This will install all packages listed in requirements.txt, including those from CRAN and Bioconductor.
You can run both simulations (continuous and categorical) at once with:
Rscript simulations/run_all_graph.R
Or run them separately:
Rscript simulations/continuous/simulate_dag_cpdag_pag_continuous.R
Rscript simulations/categorical/simulate_dag_cpdag_pag_categorical.R
Output directory
All output datasets are saved automatically in the simulated_data/graphs/ directory.
You can launch all benchmark simulations at once using the main launcher:
Rscript MIIC_search_and_score/run_all.R
This will execute all benchmark pipelines across continuous and categorical scenarios.
Alternatively, run each simulation type separately:
Rscript MIIC_search_and_score/continuous/linear_gaussian/run_all_linear_gaussian.R
Rscript MIIC_search_and_score/continuous/non_linear/run_all_non_linear.R
Rscript MIIC_search_and_score/categorical/normal/run_all_categorical.R
Rscript MIIC_search_and_score/categorical/bootstrap/run_all_categorical_bootstrap.R
Output directory
All output graphs are saved automatically in the results/ directory.
Tips
- If you want to run only a subset of benchmarks, you can edit the run_all.R file and comment out specific simulation blocks.
- In each subdirectory of MIIC_search_and_score (e.g., categorical/bootstrap), you will find additional scripts that generate job submission files for HPC environments using PBS or SLURM. These scripts typically start with generate_ and launch_ and are intended to help launch large-scale benchmark runs efficiently on a cluster.
To run other benchmark algorithms (developed in different languages such as Python or MATLAB), you'll need to generate the corresponding datasets from the simulated graphs.
To generate all types of data at once (continuous, non-linear, categorical, etc.), use:
Rscript simulations/run_all_data.R
You can also launch the data generation scripts individually within each subdirectory, similarly to the graph generation step. For example:
Rscript simulations/continuous/generate_linear_gaussian_data.R
Rscript simulations/continuous/generate_nonlinear_data.R
Rscript simulations/categorical/generate_categorical_data.R
You can edit run_all_data.R to comment out lines corresponding to data types you are not interested in.
Output directory
All output datasets are saved automatically in the simulated_data/ directory.
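Purely as an illustrative sketch (the exact file names written under simulated_data/ are determined by the generation scripts and are not listed here; example_dataset.csv below is a hypothetical placeholder), a generated dataset can be loaded and inspected in R before being handed to an external method:
# Hypothetical file name: replace with a file actually produced by the
# generation scripts under simulated_data/.
dat <- read.csv("simulated_data/example_dataset.csv")
dim(dat)    # number of samples and variables
head(dat)   # first few rows of the simulated dataset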
The baselines/ directory contains benchmarking scripts for external algorithms implemented in Python, MATLAB, and Java. These include:
- DAG-GNN (Python), commit 4ff8775
- FCI (Python), commit 9689c1b
- GFCI (Java, via py-tetrad), commit ea7cefb
- M3HC (MATLAB), commit a829193
For DAG-GNN and FCI, you will find scripts that launch HPC jobs with automatic data generation on the fly.
For M3HC (MATLAB) and GFCI (Java), only local execution scripts are provided. These require that the datasets have already been generated in advance (see Section 3).
Output directory
All output graphs are saved automatically in the results/ directory.
After running the benchmark simulations, you can compute performance metrics (e.g., Precision, Recall, F-score) and generate comparative plots by executing the following script:
Rscript utils/benchmark_plot.R
This script processes the output graph predictions stored in the results/ directory and produces evaluation figures for each algorithm and setting.
Ensure that all necessary simulation results are present in results/ before launching this analysis.
Output directory
All results are saved automatically in the results/ directory:
- results/metrics/: computed performance metrics (Precision, Recall, F-score)
- results/plots/: benchmark figures
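For orientation only, here is a minimal sketch of how precision, recall, and F-score can be computed from a predicted and a ground-truth adjacency matrix. It is not the code used by benchmark_plot.R, and it only scores the skeleton (an edge counts as present if either of its two entries is non-zero), ignoring edge orientations.
# Minimal sketch, not the benchmark_plot.R implementation: skeleton-level
# precision, recall and F-score from a predicted and a true adjacency matrix.
skeleton_metrics <- function(adj_pred, adj_true) {
  pred <- (adj_pred != 0) | (t(adj_pred) != 0)   # symmetrize: edge present in either direction
  true <- (adj_true != 0) | (t(adj_true) != 0)
  pred <- pred[upper.tri(pred)]                  # keep each variable pair once
  true <- true[upper.tri(true)]
  tp <- sum(pred & true)
  fp <- sum(pred & !true)
  fn <- sum(!pred & true)
  precision <- ifelse(tp + fp > 0, tp / (tp + fp), NA)
  recall    <- ifelse(tp + fn > 0, tp / (tp + fn), NA)
  fscore    <- ifelse(precision + recall > 0,
                      2 * precision * recall / (precision + recall), NA)
  c(precision = precision, recall = recall, fscore = fscore)
}
It could be called as skeleton_metrics(adj_pred, adj_true), with adj_pred a predicted adjacency matrix (for instance the output of run_miic_searchscore()) and adj_true the simulated ground truth.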
To reproduce the toy model summary table (Table E.1), simply run the following script:
Rscript utils/toy_model.R
If you use this code, please cite:
@InProceedings{pmlr-v267-lagrange25a,
title = {An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores for Complex Non-linear and Categorical Data},
author = {Lagrange, Nikita and Isambert, Herve},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
pages = {32164--32187},
year = {2025},
editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
volume = {267},
series = {Proceedings of Machine Learning Research},
month = {13--19 Jul},
publisher = {PMLR},
pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lagrange25a/lagrange25a.pdf},
url = {https://proceedings.mlr.press/v267/lagrange25a.html}
}
- Nikita Lagrange – PhD Student, CNRS, Institut Curie, Sorbonne Université
  GitHub • Website
- Hervé Isambert – Research Director, CNRS, Institut Curie, Sorbonne Université
  Website
Contributions and feedback are welcome: open an issue or a pull request.