perf: reduce import time by deferring heavy dependencies by leotrs · Pull Request #692 · xgi-org/xgi

leotrs · 2026-03-05T06:05:52Z

Summary

Defer scipy.stats, scipy.special, scipy.sparse.linalg, pandas, matplotlib.colors, requests, and networkx from module-level to function-level imports where they are only used in specific methods
Restructure to_hypergraph/to_simplicial_complex in convert/higher_order_network.py so common input types (list, dict, Hypergraph) are handled before falling through to a helper that imports pandas/scipy for DataFrame/matrix inputs
Add regression test asserting import xgi completes in under 3 seconds
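For readers unfamiliar with the pattern, here is a minimal sketch of a function-level (deferred) import. It uses the standard-library `statistics` module as a stand-in for a heavy dependency such as scipy.stats; the names `mean_degree` and `is_loaded` are hypothetical and not part of xgi.

```python
import sys


def mean_degree(degrees):
    """Return the mean of a sequence of degrees.

    The import lives inside the function, so the dependency is loaded
    the first time the function is called, not when the enclosing
    module is imported.
    """
    # Stand-in for a heavy dependency (e.g. scipy.stats); illustrative only.
    import statistics

    return statistics.fmean(degrees)


def is_loaded(module_name):
    """Report whether a module has already been imported in this process."""
    return module_name in sys.modules
```

After the first call, Python caches the module in `sys.modules`, so later calls only pay for a dictionary lookup rather than a full import.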

Addresses #651.

Files changed

xgi/stats/__init__.py — defer scipy.stats.moment, pandas
xgi/utils/utilities.py — defer pandas, requests, matplotlib.colors
xgi/algorithms/properties.py — defer scipy.special.comb
xgi/algorithms/simpliciality.py — defer scipy.special.binom
xgi/algorithms/centrality.py — defer scipy.sparse.linalg.eigsh, networkx
xgi/convert/graph.py, line_graph.py, bipartite_graph.py, encapsulation_dag.py — defer networkx
xgi/convert/higher_order_network.py — defer pandas, scipy.sparse, numpy.matrix
xgi/convert/pandas.py — defer pandas
xgi/generators/simplicial_complexes.py — defer networkx, scipy.special.comb
tests/test_import_time.py — new regression test

Test plan

All 392 tests pass + 1 new import time regression test
Measured import time reduction from ~3s to ~1.5s on macOS/arm64
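As a rough illustration of how such an import-time check could be written (not necessarily how tests/test_import_time.py does it), the sketch below times an import in a fresh interpreter. The helper names and the default 3-second budget argument are assumptions for the example.

```python
import subprocess
import sys
import time


def measure_import_seconds(module_name):
    """Time `import <module_name>` in a fresh interpreter.

    A subprocess is used because a module already cached in
    sys.modules would import almost instantly in the current process.
    The figure includes interpreter startup time.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


def assert_import_within_budget(module_name, budget_seconds=3.0):
    """Fail if importing the module takes longer than the budget."""
    elapsed = measure_import_seconds(module_name)
    assert elapsed < budget_seconds, f"import {module_name} took {elapsed:.2f}s"
    return elapsed
```

A wall-clock threshold like this can be noisy on shared CI runners, so the budget is best set with some headroom above the measured time.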

🤖 Generated with Claude Code

Patch v0.10.1

codecov · 2026-03-05T06:19:46Z

Codecov Report

❌ Patch coverage is 98.11321% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.68%. Comparing base (6e7fe50) to head (ebb6929).
⚠️ Report is 19 commits behind head on dev.

Files with missing lines	Patch %	Lines
xgi/stats/__init__.py	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #692      +/-   ##
==========================================
+ Coverage   93.59%   93.68%   +0.09%     
==========================================
  Files          66       66              
  Lines        5120     5133      +13     
==========================================
+ Hits         4792     4809      +17     
+ Misses        328      324       -4

kaiser-dan

Thanks @leotrs, this should speed things up nicely!

I left a few comments, mostly stylistic, but one in higher_order_network.py concerns error propagation and is more design-oriented.

xgi/core/hypergraph.py

xgi/algorithms/simpliciality.py

tests/test_import_time.py

kaiser-dan · 2026-03-05T14:37:56Z

xgi/readwrite/__init__.py

I believe these should remain outside of the public API.

These are user-facing I/O functions (xgi.read_json() / xgi.write_json()) — they should be part of the public API. The __all__ in readwrite/__init__.py feeds into xgi/__init__.py's __all__, which is how they end up as top-level exports. They were just missing from the list.

Didn't we decide to remove these from the public API in this thread?

xgi/utils/utilities.py

kaiser-dan · 2026-03-05T14:50:57Z

xgi/convert/higher_order_network.py

-
    else:
-        raise XGIError("Input data has unsupported type.")
+        return _to_hypergraph_from_external_type(data, create_using)


I'm not crazy about the else branch passing to an internal helper which then handles any errors. If we extended the elif branches to check for the remaining supported types and then dispatch to the correct helper, we could keep all of the dispatch and supported-type logic in one function, which may make stack traces easier to decipher. Specifically, any stack trace originating from to_hypergraph would come from the point of conversion dispatch (e.g. an unsupported data type), and any error from the internal helpers would come from within a supported type.

Fair point about stack traces. The tradeoff here is that inlining the `isinstance` checks back into `to_hypergraph` would require importing `pd.DataFrame` and all the scipy sparse types at function entry, which defeats the lazy-import purpose. The helper adds one extra frame to the stack trace, but keeps the heavy imports out of the common path. I think the performance win justifies the indirection, but open to other ideas if you see a cleaner way to structure it.

Hmm, no cleaner solution comes to mind. Related to the recent #691 and the discussion therein, I am wondering if we would benefit from a more verbose exception hierarchy? The connection here would be something like an XGIDataError for data-related issues with higher-order interactions in particular (e.g. this unsupported data type for conversions, dihyperedges with empty heads/tails, maybe others?).

Without extending the exception hierarchy, I can't think of "nice" ways to maintain error legibility, so I say we keep the performance improvement and move on.
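To make the tradeoff discussed in this thread concrete, here is a simplified sketch of the dispatch shape: cheap `isinstance` checks for common types first, then a fall-through helper that performs the deferred import and raises for unsupported input. The standard-library `json` module stands in for pandas/scipy, and `to_edge_list`, `_to_edge_list_from_external_type`, and `UnsupportedInputError` are hypothetical names, not xgi's.

```python
class UnsupportedInputError(TypeError):
    """Hypothetical stand-in for the library's own error type."""


def to_edge_list(data):
    """Convert supported inputs to a list of sets (one set per edge)."""
    # Common, cheap-to-check types are handled without extra imports.
    if isinstance(data, list):
        return [set(edge) for edge in data]
    if isinstance(data, dict):
        return [set(members) for members in data.values()]
    # Everything else falls through to the helper with the deferred import.
    return _to_edge_list_from_external_type(data)


def _to_edge_list_from_external_type(data):
    """Handle inputs whose type check needs a deferred import."""
    import json  # stand-in for a heavy dependency such as pandas

    if isinstance(data, str):
        return [set(edge) for edge in json.loads(data)]
    # The unsupported-type error is raised here, one frame below
    # to_edge_list, which is the stack-trace cost raised in review.
    raise UnsupportedInputError("Input data has unsupported type.")
```

With this layout, an unsupported-type error surfaces from the helper rather than from the public function, which is the extra stack frame mentioned above.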

scipy's diags_array now warns (FutureWarning) when int64 input is silently cast to float64 output, and will change this behavior in a future release. This would silently alter numerical results. Ensure degree and weight arrays are explicitly float before passing to diags_array in both laplacian() and normalized_hypergraph_laplacian(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two changes to make exception usage consistent:

1. IDNotFound now inherits from (XGIException, KeyError) instead of just KeyError. This means all XGI exceptions can be caught uniformly with `except XGIException`, while preserving backwards compat for code that catches KeyError.
2. Standardize XGIError vs ValueError: use ValueError for pure input validation (bad argument values like "probability not in [0,1]"), reserve XGIError for domain-specific errors (disconnected graph, wrong network type, frozen network).

Changes affect:
- generators/uniform.py: all input validation raises
- generators/lattice.py: invalid k value
- generators/simple.py: invalid edge/core size
- algorithms/assortativity.py: invalid kind argument

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
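A minimal sketch of the hierarchy this commit message describes. The class names come from the message; the class bodies and the helpers `lookup` and `check_probability` are illustrative assumptions, not xgi's source.

```python
class XGIException(Exception):
    """Base class for all exceptions in this sketch."""


class XGIError(XGIException):
    """Domain-specific error (e.g. a frozen network)."""


class IDNotFound(XGIException, KeyError):
    """Missing node or edge ID; catchable as XGIException or KeyError."""


def lookup(members, node_id):
    """Return members[node_id], raising IDNotFound if the ID is absent."""
    try:
        return members[node_id]
    except KeyError:
        raise IDNotFound(f"ID {node_id!r} not found") from None


def check_probability(p):
    """Pure input validation raises ValueError, not a domain error."""
    if not 0 <= p <= 1:
        raise ValueError("probability not in [0,1]")
    return p
```

Because `IDNotFound` lists both bases, existing `except KeyError` handlers keep working while new code can use a single `except XGIException`.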

Add explicit __all__ lists to every subpackage (core, algorithms, communities, convert, drawing, dynamics, generators, linalg, readwrite, utils) and aggregate them in the top-level xgi/__init__.py. This locks down the public API surface to 169 names, preventing internal modules from leaking through wildcard imports. Also fixes tests that accessed xgi.utilities.dual_dict (an internal module path that was only reachable via leaked wildcard imports) to use the correct public path xgi.dual_dict. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
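The effect of explicit `__all__` lists on wildcard imports can be sketched with throwaway in-memory modules. Everything here is illustrative: the `demo_*` module names and the `make_module` and `wildcard_names` helpers are assumptions, and the attribute names are borrowed from the commit messages only as labels.

```python
import sys
import types


def make_module(name, public, internal):
    """Build an in-memory module whose __all__ lists only public names."""
    module = types.ModuleType(name)
    for attr in [*public, *internal]:
        setattr(module, attr, object())
    module.__all__ = list(public)
    sys.modules[name] = module  # register so `from name import *` resolves
    return module


utils = make_module("demo_utils", public=["dual_dict"], internal=["IDDict", "Trie"])
convert = make_module("demo_convert", public=["to_hypergraph"], internal=["_helper"])

# A top-level package can aggregate the subpackage lists into its own __all__.
TOP_LEVEL_ALL = [*utils.__all__, *convert.__all__]


def wildcard_names(module_name):
    """Return the names that `from module_name import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return {name for name in namespace if name != "__builtins__"}
```

Names left out of `__all__` are still reachable by explicit attribute access; they are only hidden from wildcard imports.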

- Remove read_json/write_json from readwrite __all__ (deprecated)
- Remove IDDict, Trie, crest_r from utils __all__ (internal)
- Update test imports to use direct module paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…stances (#689)

* feat: add seed parameter to unseeded stochastic functions
  Add seed parameter to h_eigenvector_centrality, degree_assortativity, simulate_kuramoto, and random_edge_shuffle for reproducibility.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update doctest for pandas dtype='str' change
  Pandas now uses dtype='str' instead of dtype='object' for string Index columns. Compare as list to avoid version-dependent repr.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make spectral clustering test deterministic and fix pandas doctest
  Seed numpy before eigsh in spectral_clustering so ARPACK produces consistent results. Seed random data in kmeans tests. Update pandas dtype doctest to compare as list.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add floating point tolerance to Laplacian eigenvalue test
  eigvalsh can return eigenvalues like -1.3e-17 for a positive semi-definite matrix due to floating point arithmetic. Use -1e-12 tolerance instead of strict >= 0.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make spectral clustering deterministic across platforms
  Pass explicit v0 to eigsh when seed is provided, making ARPACK initialization deterministic. Relax test assertion to check core community membership rather than exact partition, since boundary nodes can be assigned differently across LAPACK implementations.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: migrate from global random state to local RNG instances
  Replace all np.random.seed()/random.seed() + global function calls with np.random.default_rng(seed) and local Generator instances. This eliminates global state pollution, is thread-safe, and follows Scientific Python best practices. All functions now accept int | np.random.Generator | None for the seed parameter. Removed all stdlib random usage from xgi source.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update doctest expected value in largest_connected_hypergraph
  The new RNG stream produces a different random hypergraph, changing the size of the largest connected component from 6 to 8.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update tutorial seed for changed random stream
  The fast_random_hypergraph with seed=2 now produces a different number of dyads due to the RNG migration, causing a color array length mismatch in the multilayer drawing tutorial. Changed to seed=8 which produces 10 dyads matching the hardcoded color lists.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review comments on seed/RNG migration
  - Simplify rng.choice calls (remove unnecessary np.array conversion, use direct list selection for edge_list)
  - Standardize seed docstrings to "int, numpy.random.Generator, or None" for all functions using np.random.default_rng(seed)
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: simplify rng.choice to select directly from list
  Address review suggestion to pass the list directly to rng.choice instead of sampling indices.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
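The local-RNG pattern described in these commits can be sketched as follows, assuming NumPy is installed. The function `pick_two_edges` is a hypothetical example, not one of the xgi functions named above.

```python
import numpy as np


def pick_two_edges(edge_ids, seed=None):
    """Pick two distinct edge IDs using a local random generator.

    `seed` may be an int, a numpy.random.Generator, or None, since
    np.random.default_rng accepts all three.
    """
    # A local Generator avoids touching NumPy's global random state.
    rng = np.random.default_rng(seed)
    picked = rng.choice(edge_ids, size=2, replace=False)
    # rng.choice returns NumPy scalars; convert to plain Python ints
    # so reprs and doctests do not show np.int64.
    return [int(edge_id) for edge_id in picked]
```

Passing the same integer seed twice yields the same pair, while passing an existing Generator lets a caller thread one random stream through several functions.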

Move scipy.stats, scipy.special, pandas, matplotlib.colors, and requests from module-level to function-level imports where they are only used in specific methods. This reduces `import xgi` from ~4.6s to ~2.1s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…erators Move networkx, pandas, and scipy.sparse imports from module-level to function-level in convert submodules (graph, line_graph, bipartite_graph, encapsulation_dag, higher_order_network, pandas), algorithms/centrality, and generators/simplicial_complexes. In higher_order_network, restructure to_hypergraph and to_simplicial_complex so common input types (list, dict, Hypergraph) are handled before falling through to a helper that imports pandas/scipy for DataFrame/matrix inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request #690 from xgi-org/dev

ab72705

Patch v0.10.1

kaiser-dan reviewed Mar 5, 2026

leotrs and others added 12 commits March 5, 2026 22:13

test: add regression test for import time (<3s)

3012b77

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ci: run test workflows on PRs targeting dev branch

3ea85bd

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: convert rng.choice results to Python types in random_edge_shuffle

26b8c67

rng.choice on a Python list returns numpy scalars, causing doctest failures showing np.int64 instead of plain ints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: restore read_json and write_json to readwrite __all__

2a28f2e

These public API functions were incorrectly removed in the define-all-exports cleanup, breaking the read/write tutorial. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: standardize on scipy.special.comb over binom

dc32aa0

Replace the last two uses of scipy.special.binom with comb for consistency with the rest of the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leotrs force-pushed the lazy-imports-reduce-import-time branch from ebb6929 to dc32aa0 on March 5, 2026 21:15

leotrs wants to merge 13 commits into dev from lazy-imports-reduce-import-time
