[ICML 2025] R implementation of MIIC_search&score: a search-and-score algorithm for learning ancestral graphs with latent confounders, using multivariate information over ac-connected subsets.


miicsearchscore: An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores

ICML 2025 GPL-3.0 License

This repository provides an R implementation of the MIIC_search&score algorithm, introduced in our ICML 2025 paper, for causal discovery in the presence of latent variables. The algorithm combines a theoretical likelihood decomposition for ancestral graphs with a practical, efficient two-step search-and-score procedure based on multivariate information scores.

⚡ Quickstart

# Install devtools if needed
install.packages("devtools")

# Install the package directly
devtools::install_github("miicTeam/miicsearchscore")

# Load the package
library(miicsearchscore)

# Load the example dataset
data(nonlinear_data)

# Run the method on the example data
adj <- run_miic_searchscore(nonlinear_data, n_threads = 1)

πŸ” Overview

The method improves upon MIIC through a greedy scoring scheme based on higher-order ac-connected information subsets. It is especially suited for:

  • Ancestral graphs with latent confounders (bidirected edges),
  • Complex datasets, whether continuous (including non-linear couplings between variables) or categorical,
  • Scalable inference, thanks to localized scoring limited to collider paths of up to two edges.

🚀 Installation

You have two ways to install and use the miicsearchscore package:

🧪 Option 1: Full repository (R package + benchmark)

To access everything (including the benchmark and simulation scripts):

# Clone the full repository
git clone https://github.com/miicTeam/miicsearchscore.git
cd miicsearchscore

Then, open R from this folder and run:

# Install devtools if needed
install.packages("devtools")

# Install from local source
devtools::install(".")
library(miicsearchscore)

✅ Option 2: Install only the R package

If you only need the core R functions (no benchmark), use:

# Install devtools if needed
install.packages("devtools")

# Install directly from GitHub
devtools::install_github("miicTeam/miicsearchscore")

# Load the package
library(miicsearchscore)

⚠️ This method installs only the R package, not the benchmark/ folder.

🧠 How it works

The algorithm proceeds in two steps:

Step 0: MIIC inference

# Learn an initial graph with MIIC, accounting for latent confounders
miic_result <- miic(data,
                    latent = "orientation",
                    propagation = TRUE,
                    consistent = "orientation",
                    n_threads = n_threads)
summary <- miic_result$summary
summary <- summary[summary$type == "P", ]  # keep edges flagged "P" in the MIIC summary
hash_table <- new.env()                    # environment reused as a hash table in Steps 1 and 2
adj_miic <- miic_result$adj_matrix         # initial adjacency matrix from MIIC

Step 1: Node-level pruning and conditioning set selection

step1_result <- apply_node_score_step_1(adj_miic, data, hash_table)
adj_step1_node_score <- step1_result$adj
hash_table <- step1_result$hash_table

Step 2: Edge orientation via mutual information delta optimization

step2_result <- apply_edge_score_step_2(adj_step1_node_score, data, hash_table)
adj_step2_edge_score <- step2_result$adj

Or run everything in one call:

adj <- run_miic_searchscore(data, n_threads = 1)
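The returned `adj` is an adjacency matrix over the input variables. As a minimal sketch of how to inspect it, the helper below lists adjacent pairs; `adj_to_edges` is a hypothetical function (not part of the package), and it assumes adjacent pairs are marked by non-zero entries in both directions, with the specific non-zero codes encoding MIIC-style edge tips that are not interpreted here.

```r
# Hypothetical helper: list adjacent variable pairs from an adjacency matrix.
adj_to_edges <- function(adj) {
  idx <- which(adj != 0 & upper.tri(adj), arr.ind = TRUE)  # one entry per pair
  data.frame(from = rownames(adj)[idx[, "row"]],
             to   = colnames(adj)[idx[, "col"]],
             stringsAsFactors = FALSE)
}

# Toy 3-variable example with a single X - Y adjacency
adj <- matrix(0, 3, 3, dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z")))
adj["X", "Y"] <- 1; adj["Y", "X"] <- 1
edges <- adj_to_edges(adj)
```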

πŸ“ Repository structure

miicsearchscore/
├── R/                         # Core R source files implementing the MIIC_search&score algorithm
├── data/                      # Package dataset (.rda), accessible via data()
├── man/                       # R documentation files
├── benchmark/                 # Benchmarking scripts
│   ├── MIIC_search_and_score/ # Scripts to run benchmarks for MIIC_search&score
│   │   ├── categorical/       # Scripts for categorical data settings
│   │   │   ├── bootstrap/
│   │   │   └── normal/
│   │   └── continuous/
│   │       ├── linear_gaussian/
│   │       └── non_linear/
│   ├── baselines/             # Scripts to run and evaluate baseline methods
│   │   ├── DAGGNN/
│   │   │   ├── linear_gaussian/
│   │   │   └── non_linear/
│   │   ├── FCI/
│   │   │   ├── bootstrap/
│   │   │   └── normal/
│   │   ├── GFCI/
│   │   └── M3HC/
│   ├── data/
│   │   └── CPT/               # Conditional probability tables (used for categorical models)
│   ├── simulations/           # Data and graph generation scripts
│   │   ├── categorical/
│   │   └── continuous/
│   └── utils/                 # Shared utility scripts: plotting, metrics, graph conversion, etc.

📊 Benchmarks

Benchmarks for reproducing Figures 2, 3, E.2, E.3, and Table E.1 of the paper are provided in the benchmark/ folder. Before running them, make sure you are in the benchmark/ directory.

🔧 0. Install required R packages

Before running any simulation, make sure to install the required R packages (with exact versions) using:

Rscript install_requirements.R

This will install all packages listed in requirements.txt, including those from CRAN and Bioconductor.
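As a minimal sketch of the pattern such an installer follows (the parsing rules and the `name==version` layout shown here are assumptions for illustration; the repository's actual install_requirements.R also pins exact versions and handles Bioconductor packages):

```r
# Hypothetical sketch: read package names (one per line, optionally pinned
# as "name==version"), drop blanks and comments, and install missing ones.
parse_requirements <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  sub("==.*$", "", lines)  # keep the package name, drop any pinned version
}

if (file.exists("requirements.txt")) {
  pkgs <- parse_requirements(readLines("requirements.txt"))
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) install.packages(missing)
}
```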

1. Simulate the graph structures: DAG, CPDAG, and PAG

You can run both simulations (continuous and categorical) at once with:

Rscript simulations/run_all_graph.R

Or run them separately:

Rscript simulations/continuous/simulate_dag_cpdag_pag_continuous.R
Rscript simulations/categorical/simulate_dag_cpdag_pag_categorical.R

πŸ“ Output directory

All output datasets are saved automatically in the simulated_data/graphs/ directory.

🚀 2. Run 🔴 MIIC_search&score and 🔵 MIIC benchmark simulations

You can launch all benchmark simulations at once using the main launcher:

Rscript MIIC_search_and_score/run_all.R

This will execute all benchmark pipelines across continuous and categorical scenarios.

βš™οΈ Alternatively, run each simulation type separately:

🔹 Linear Gaussian simulations

Rscript MIIC_search_and_score/continuous/linear_gaussian/run_all_linear_gaussian.R

🔹 Non-linear simulations

Rscript MIIC_search_and_score/continuous/non_linear/run_all_non_linear.R

🔹 Categorical simulations (normal)

Rscript MIIC_search_and_score/categorical/normal/run_all_categorical.R

🔹 Categorical simulations (bootstrap)

Rscript MIIC_search_and_score/categorical/bootstrap/run_all_categorical_bootstrap.R

πŸ“ Output directory

All output graphs are saved automatically in the results/ directory.

🧠 Tips

  • If you want to run only a subset of benchmarks, you can edit the run_all.R file and comment out specific simulation blocks.
  • In each subdirectory of MIIC_search_and_score (e.g., categorical/bootstrap), you will find additional scripts that generate job submission files for HPC environments using PBS or SLURM. These scripts typically start with generate_ and launch_ and are intended to help launch large-scale benchmark runs efficiently on a cluster.
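As an illustration, a generated SLURM submission file typically looks like the sketch below; the job name, resource limits, and module line are assumptions to adapt to your cluster, not the output of the repository's generate_ scripts.

```shell
#!/bin/bash
#SBATCH --job-name=miic_bench      # hypothetical job name
#SBATCH --cpus-per-task=4          # adjust resources to your cluster
#SBATCH --mem=8G
#SBATCH --time=12:00:00

module load R                      # site-specific; may differ on your HPC

# run one benchmark pipeline from the benchmark/ directory
Rscript MIIC_search_and_score/categorical/bootstrap/run_all_categorical_bootstrap.R
```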

📦 3. (Optional) Generate data for external baseline algorithms

To run other benchmark algorithms (developed in different languages such as Python or MATLAB), you'll need to generate the corresponding datasets from the simulated graphs.

To generate all types of data at once (continuous, non-linear, categorical, etc.), use:

Rscript simulations/run_all_data.R

You can also launch the data generation scripts individually within each subdirectory, similarly to the graph generation step. For example:

Rscript simulations/continuous/generate_linear_gaussian_data.R
Rscript simulations/continuous/generate_nonlinear_data.R
Rscript simulations/categorical/generate_categorical_data.R

You can edit run_all_data.R to comment out lines corresponding to data types you are not interested in.

πŸ“ Output directory

All output datasets are saved automatically in the simulated_data/ directory.

🧪 4. (Optional) Run other baseline algorithms

The baselines/ directory contains benchmarking scripts for external algorithms implemented in Python, MATLAB, and Java. These include:

  • 🟢 DAG-GNN (Python), commit 4ff8775
  • ⚪️ FCI (Python), commit 9689c1b
  • 🟣 GFCI (Java, via py-tetrad), commit ea7cefb
  • ⚪️ M3HC (MATLAB), commit a829193

For DAG-GNN and FCI, you will find scripts that launch HPC jobs with automatic data generation on the fly.

For M3HC (MATLAB) and GFCI (Java), only local execution scripts are provided. These require that the datasets have already been generated in advance (see Section 3).

πŸ“ Output directory

All output graphs are saved automatically in the results/ directory.

📈 5. (Optional) Evaluate results and generate benchmark plots

After running the benchmark simulations, you can compute performance metrics (e.g., Precision, Recall, F-score) and generate comparative plots by executing the following script:

Rscript utils/benchmark_plot.R

This script processes the output graph predictions stored in the results/ directory and produces evaluation figures for each algorithm and setting.

Ensure that all necessary simulation results are present in results/ before launching this analysis.
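For reference, the reported Precision, Recall, and F-score follow their standard definitions over predicted versus true edges. The sketch below is illustrative only: it represents edges as unordered "A-B" strings and scores the skeleton, which is an assumption, since benchmark_plot.R may also score edge orientations.

```r
# Illustrative skeleton-level metrics from predicted vs. true edge sets.
edge_metrics <- function(pred, truth) {
  tp <- length(intersect(pred, truth))            # correctly recovered edges
  precision <- if (length(pred) > 0) tp / length(pred) else 0
  recall    <- if (length(truth) > 0) tp / length(truth) else 0
  fscore    <- if (precision + recall > 0)
    2 * precision * recall / (precision + recall) else 0
  c(precision = precision, recall = recall, fscore = fscore)
}

# Toy example: 2 of 3 predicted edges are in the true graph
m <- edge_metrics(pred  = c("A-B", "B-C", "C-D"),
                  truth = c("A-B", "B-C", "B-D"))
```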

πŸ“ Output directory

All results are saved automatically in the results/ directory:

  • results/metrics/: computed performance metrics (Precision, Recall, F-score)
  • results/plots/: benchmark figures

6. Reproduce Table E.1 – Toy Models Summary

To reproduce the toy model summary table (Table E.1), simply run the following script:

Rscript utils/toy_model.R

📄 Citation

If you use this code, please cite:

@InProceedings{pmlr-v267-lagrange25a,
  title     = {An Efficient Search-and-Score Algorithm for Ancestral Graphs using Multivariate Information Scores for Complex Non-linear and Categorical Data},
  author    = {Lagrange, Nikita and Isambert, Herve},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {32164--32187},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lagrange25a/lagrange25a.pdf},
  url       = {https://proceedings.mlr.press/v267/lagrange25a.html}
}

👥 Authors

  • Nikita Lagrange – PhD Student, CNRS, Institut Curie, Sorbonne Université
    GitHub • Website

  • Hervé Isambert – Research Director, CNRS, Institut Curie, Sorbonne Université
    Website

Contributions and feedback are welcome: open an issue or a pull request.
