A Python library for discovering causal networks from time series data using Optimal Causation Entropy (oCSE).
CausationEntropy implements state-of-the-art information-theoretic methods for causal discovery from multivariate time series. The library provides robust algorithms that can identify causal relationships while controlling for confounding variables and false discoveries.
Given time series data, CausationEntropy finds which variables cause changes in other variables by:
- Predictive Testing: Checking whether knowing variable X at time t helps predict variable Y at time t+1
- Information Theory: Using conditional mutual information to measure predictive relationships
- Statistical Control: Rigorous statistical testing to avoid false discoveries
- Multiple Methods: Supporting various information estimators and discovery algorithms
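As a conceptual illustration of the predictive test (a numpy-only sketch, not how the library estimates information): compare how well Y(t+1) is predicted with and without X(t), after accounting for Y's own history.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(T - 1):
    # Y is driven by its own past and by X: here X "causes" Y by construction
    y[t + 1] = 0.5 * y[t] + 0.8 * x[t] + 0.1 * rng.normal()

def residual_var(features, target):
    """Variance of least-squares residuals of target regressed on features."""
    beta, *_ = np.linalg.lstsq(features, target, rcond=None)
    return np.var(target - features @ beta)

target = y[1:]
hist = np.column_stack([y[:-1], np.ones(T - 1)])            # Y's own history
hist_x = np.column_stack([y[:-1], x[:-1], np.ones(T - 1)])  # plus X(t)

# If X causes Y, adding X(t) should substantially reduce the prediction error
print(residual_var(hist, target), residual_var(hist_x, target))
```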
Install from PyPI:

```bash
pip install causationentropy
```

Or install from source:

```bash
git clone https://github.com/Center-For-Complex-Systems-Science/causationentropy.git
cd causationentropy
pip install -e .
```

Run the test suite with coverage:

```bash
python -m pytest causationentropy/tests/ --cov=causationentropy --cov-report=xml --cov-report=term-missing -v
```

See our Quick Start Colab notebook:
Get the relationships as a data frame:
```python
import pandas as pd

from causationentropy import discover_network
from causationentropy.graph import network_to_dataframe

# Load your time series data (variables as columns, time as rows)
data = pd.read_csv('data.csv')

# Discover causal network
network = discover_network(data, method='standard', max_lag=5)

df = network_to_dataframe(network)
df.head()
```

Plot the causal network:
```python
import pandas as pd

from causationentropy import discover_network
from causationentropy.core.plotting import plot_causal_network

# Load your time series data (variables as columns, time as rows)
data = pd.read_csv('data.csv')

# Discover causal network
network = discover_network(data, method='standard', max_lag=5)

fig, ax = plot_causal_network(network, save_path="network.png")
```

Note: This implementation runs in O(N² T log T) time, where N is the number of variables and T is the length of the time series, so it is computationally intensive without optimizations; doubling the number of variables roughly quadruples the runtime, and each additional lag increases it further. Optimizations leveraging singular value decomposition and KD-trees are planned for a later release, although they are not part of the original algorithm. Please be patient on large datasets.
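One pragmatic way to keep runtimes manageable on long series (a pandas-only sketch; whether downsampling is appropriate depends on how fast your system evolves relative to the sampling rate) is to shorten the series and the lag window before discovery:

```python
import pandas as pd

from causationentropy import discover_network

data = pd.read_csv('data.csv')

# Halve T by keeping every other sample (illustrative; only reasonable
# when the sampling rate is high relative to the system's dynamics)
downsampled = data.iloc[::2].reset_index(drop=True)

# A smaller max_lag also shrinks the set of candidate predictors
network = discover_network(downsampled, method='standard', max_lag=2)
```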
Discovery behavior can be tuned through several parameters:

```python
from causationentropy import discover_network

# Configure discovery parameters
network = discover_network(
    data,
    method='standard',       # 'standard', 'alternative', 'information_lasso', or 'lasso'
    information='gaussian',  # 'gaussian', 'knn', 'kde', 'geometric_knn', or 'poisson'
    max_lag=5,               # Maximum time lag to consider
    alpha_forward=0.05,      # Forward selection significance
    alpha_backward=0.05,     # Backward elimination significance
    n_shuffles=200           # Permutation test iterations
)
```

Generate synthetic data with a known ground-truth network to validate results (see the evaluation sketch after the feature list below):

```python
from causationentropy.datasets import synthetic
from causationentropy import discover_network

# Generate synthetic causal time series
data, true_network = synthetic.linear_stochastic_gaussian_process(
    n_variables=5,
    n_samples=1000,
    sparsity=0.3
)

# Discover network
discovered = discover_network(data)
```

Key features:

- Multiple Algorithms: Standard, alternative, information lasso, and lasso variants of oCSE
- Flexible Information Estimators: Gaussian, k-NN, KDE, geometric k-NN, and Poisson methods
- Statistical Rigor: Permutation-based significance testing with comprehensive test coverage
- Synthetic Data: Built-in generators for testing and validation
- Visualization: Network plotting and analysis tools
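As referenced in the synthetic-data example above, the ground-truth network makes validation straightforward. A minimal evaluation sketch, assuming the returned networks behave like networkx directed graphs (the plotting and DataFrame helpers suggest this, but it is an assumption):

```python
# Compare discovered edges against the known ground truth.
# Assumes `true_network` and `discovered` from the synthetic example
# are networkx DiGraphs (an assumption, not a documented guarantee).
true_edges = set(true_network.edges())
found_edges = set(discovered.edges())

true_positives = true_edges & found_edges
precision = len(true_positives) / len(found_edges) if found_edges else 0.0
recall = len(true_positives) / len(true_edges) if true_edges else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```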
The algorithm uses conditional mutual information to quantify causal relationships:

`I(X; Y | Z) = H(Y | Z) − H(Y | X, Z)`

This measures how much variable X tells us about variable Y beyond what we already know from the conditioning set Z.
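For intuition, here is a small sketch of estimating conditional mutual information under a Gaussian assumption, via the partial correlation of X and Y given Z (an illustration, not the library's internal 'gaussian' estimator):

```python
import numpy as np

def gaussian_cmi(x, y, z):
    """Estimate I(X; Y | Z) in nats, assuming jointly Gaussian variables.

    Uses the identity I(X; Y | Z) = -0.5 * log(1 - rho^2), where rho is
    the partial correlation of X and Y given Z.
    """
    # Regress Z out of both X and Y, then correlate the residuals
    Z = np.column_stack([z, np.ones(len(z))])  # include an intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rho = np.corrcoef(rx, ry)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)        # X depends on Z
y = 0.8 * x + rng.normal(size=2000)  # Y depends on X directly
print(gaussian_cmi(x, y, z))         # clearly positive: X informs Y beyond Z
```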
Causal Discovery Rule: Variable X causes Y if knowing X(t) significantly improves prediction of Y(t+1), even when controlling for all other relevant variables.
The algorithm implements a two-phase approach:
- Forward Selection: Iteratively adds predictors that maximize conditional mutual information
- Backward Elimination: Removes predictors that lose significance when conditioned on others
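A schematic of the two phases (illustrative pseudocode, not the library's implementation; `cmi` and `is_significant` stand in for the chosen estimator and the permutation test):

```python
def ocse_sketch(candidates, target, cmi, is_significant):
    """Illustrative two-phase selection, not the library's implementation.

    `cmi(x, target, cond)` estimates conditional mutual information;
    `is_significant(x, target, cond)` is a permutation-based test.
    """
    selected = []

    # Forward selection: greedily add the candidate with the largest
    # conditional mutual information given what is already selected
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda x: cmi(x, target, selected))
        if not is_significant(best, target, selected):
            break  # no remaining candidate adds significant information
        selected.append(best)
        remaining.remove(best)

    # Backward elimination: drop predictors that lose significance
    # once conditioned on all the others
    for x in list(selected):
        others = [s for s in selected if s is not x]
        if not is_significant(x, target, others):
            selected.remove(x)

    return selected
```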
📚 Read the full documentation on ReadTheDocs
- API Reference: Complete function and class documentation
- User Guide: Detailed tutorials and examples
- Theory: Mathematical background and algorithms
- Examples: Check the `notebooks/` directory
- Research Papers: See the theory glossary in the documentation
Build documentation locally:
```bash
cd docs/
make html
# Open docs/_build/html/index.html
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
If you use this library in your research, please cite:
```bibtex
@misc{slote2025causationentropy,
  author = {Slote, Kevin and Fish, Jeremie and Bollt, Erik},
  title  = {CausationEntropy: A Python Library for Causal Discovery},
  url    = {https://github.com/Center-For-Complex-Systems-Science/causationentropy},
  doi    = {10.5281/zenodo.17047565}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
This work builds upon fundamental research in information theory, causal inference, and time series analysis. Special thanks to the open-source scientific Python community.
Generative AI was used to help with docstrings, documentation, and unit tests.