perf: reduce import time by deferring heavy dependencies#692
perf: reduce import time by deferring heavy dependencies#692
Conversation
Patch v0.10.1
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #692 +/- ##
==========================================
+ Coverage 93.59% 93.68% +0.09%
==========================================
Files 66 66
Lines 5120 5133 +13
==========================================
+ Hits 4792 4809 +17
+ Misses 328 324 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
kaiser-dan
left a comment
There was a problem hiding this comment.
Thanks @leotrs , this should speed things up nicely!
I left a few comments, mostly stylistic, but one in higher_order_network.py concerns error propagation and is more design-oriented.
There was a problem hiding this comment.
I believe these should remain outside of the public API.
There was a problem hiding this comment.
These are user-facing I/O functions (xgi.read_json() / xgi.write_json()) — they should be part of the public API. The __all__ in readwrite/__init__.py feeds into xgi/__init__.py's __all__, which is how they end up as top-level exports. They were just missing from the list.
There was a problem hiding this comment.
Didn't we decide to remove these from the public API in this thread?
|
|
||
| else: | ||
| raise XGIError("Input data has unsupported type.") | ||
| return _to_hypergraph_from_external_type(data, create_using) |
There was a problem hiding this comment.
I'm not crazy about the else branch passing to an internal helper which then handles any errors. If we extended the elif branches to check for the remaining supported types then dispatch the correct helper, we could keep all of the dispatch and supported type logic in the one function which may make deciphering stack traces clearer. Specifically, any stack trace originating from to_hypergraph is from the point of conversion dispatch, e.g. an unsupported data type, and any errors from the internal helpers is from errors within a supported type.
There was a problem hiding this comment.
Fair point about stack traces. The tradeoff here is that inlining the `isinstance` checks back into `to_hypergraph` would require importing `pd.DataFrame` and all the scipy sparse types at function entry, which defeats the lazy-import purpose. The helper adds one extra frame to the stack trace, but keeps the heavy imports out of the common path. I think the performance win justifies the indirection, but open to other ideas if you see a cleaner way to structure it.
There was a problem hiding this comment.
Hmm, no cleaner solution comes to mind. Related to recent #691 and the discussion therein, I am wondering if we would benefit from a more verbose exception hierarchy? The connection here is maybe sort of XGIDataError for data related issues for higher-order interactions in particular (e.g. this unsupported data type for conversions, dihyperedges with empty heads/tails, maybe others?).
Without extending the exception hierarchy, I can't think of "nice" ways to maintain error legibility, so I say we keep the performance improvement and move on.
scipy's diags_array now warns (FutureWarning) when int64 input is silently cast to float64 output, and will change this behavior in a future release. This would silently alter numerical results. Ensure degree and weight arrays are explicitly float before passing to diags_array in both laplacian() and normalized_hypergraph_laplacian(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes to make exception usage consistent: 1. IDNotFound now inherits from (XGIException, KeyError) instead of just KeyError. This means all XGI exceptions can be caught uniformly with `except XGIException`, while preserving backwards compat for code that catches KeyError. 2. Standardize XGIError vs ValueError: use ValueError for pure input validation (bad argument values like "probability not in [0,1]"), reserve XGIError for domain-specific errors (disconnected graph, wrong network type, frozen network). Changes affect: - generators/uniform.py: all input validation raises - generators/lattice.py: invalid k value - generators/simple.py: invalid edge/core size - algorithms/assortativity.py: invalid kind argument Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add explicit __all__ lists to every subpackage (core, algorithms, communities, convert, drawing, dynamics, generators, linalg, readwrite, utils) and aggregate them in the top-level xgi/__init__.py. This locks down the public API surface to 169 names, preventing internal modules from leaking through wildcard imports. Also fixes tests that accessed xgi.utilities.dual_dict (an internal module path that was only reachable via leaked wildcard imports) to use the correct public path xgi.dual_dict. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove read_json/write_json from readwrite __all__ (deprecated) - Remove IDDict, Trie, crest_r from utils __all__ (internal) - Update test imports to use direct module paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…stances (#689) * feat: add seed parameter to unseeded stochastic functions Add seed parameter to h_eigenvector_centrality, degree_assortativity, simulate_kuramoto, and random_edge_shuffle for reproducibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update doctest for pandas dtype='str' change Pandas now uses dtype='str' instead of dtype='object' for string Index columns. Compare as list to avoid version-dependent repr. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make spectral clustering test deterministic and fix pandas doctest Seed numpy before eigsh in spectral_clustering so ARPACK produces consistent results. Seed random data in kmeans tests. Update pandas dtype doctest to compare as list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add floating point tolerance to Laplacian eigenvalue test eigvalsh can return eigenvalues like -1.3e-17 for a positive semi-definite matrix due to floating point arithmetic. Use -1e-12 tolerance instead of strict >= 0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make spectral clustering deterministic across platforms Pass explicit v0 to eigsh when seed is provided, making ARPACK initialization deterministic. Relax test assertion to check core community membership rather than exact partition, since boundary nodes can be assigned differently across LAPACK implementations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: migrate from global random state to local RNG instances Replace all np.random.seed()/random.seed() + global function calls with np.random.default_rng(seed) and local Generator instances. This eliminates global state pollution, is thread-safe, and follows Scientific Python best practices. All functions now accept int | np.random.Generator | None for the seed parameter. Removed all stdlib random usage from xgi source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update doctest expected value in largest_connected_hypergraph The new RNG stream produces a different random hypergraph, changing the size of the largest connected component from 6 to 8. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update tutorial seed for changed random stream The fast_random_hypergraph with seed=2 now produces a different number of dyads due to the RNG migration, causing a color array length mismatch in the multilayer drawing tutorial. Changed to seed=8 which produces 10 dyads matching the hardcoded color lists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments on seed/RNG migration - Simplify rng.choice calls (remove unnecessary np.array conversion, use direct list selection for edge_list) - Standardize seed docstrings to "int, numpy.random.Generator, or None" for all functions using np.random.default_rng(seed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: simplify rng.choice to select directly from list Address review suggestion to pass the list directly to rng.choice instead of sampling indices. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move scipy.stats, scipy.special, pandas, matplotlib.colors, and requests from module-level to function-level imports where they are only used in specific methods. This reduces `import xgi` from ~4.6s to ~2.1s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…erators Move networkx, pandas, and scipy.sparse imports from module-level to function-level in convert submodules (graph, line_graph, bipartite_graph, encapsulation_dag, higher_order_network, pandas), algorithms/centrality, and generators/simplicial_complexes. In higher_order_network, restructure to_hypergraph and to_simplicial_complex so common input types (list, dict, Hypergraph) are handled before falling through to a helper that imports pandas/scipy for DataFrame/matrix inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rng.choice on a Python list returns numpy scalars, causing doctest failures showing np.int64 instead of plain ints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These public API functions were incorrectly removed in the define-all-exports cleanup, breaking the read/write tutorial. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the last two uses of scipy.special.binom with comb for consistency with the rest of the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ebb6929 to
dc32aa0
Compare
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Summary
scipy.stats,scipy.special,scipy.sparse.linalg,pandas,matplotlib.colors,requests, andnetworkxfrom module-level to function-level imports where they are only used in specific methodsto_hypergraph/to_simplicial_complexinconvert/higher_order_network.pyso common input types (list, dict, Hypergraph) are handled before falling through to a helper that imports pandas/scipy for DataFrame/matrix inputsimport xgicompletes in under 3 secondsAddresses #651.
Files changed
xgi/stats/__init__.py— deferscipy.stats.moment,pandasxgi/utils/utilities.py— deferpandas,requests,matplotlib.colorsxgi/algorithms/properties.py— deferscipy.special.combxgi/algorithms/simpliciality.py— deferscipy.special.binomxgi/algorithms/centrality.py— deferscipy.sparse.linalg.eigsh,networkxxgi/convert/graph.py,line_graph.py,bipartite_graph.py,encapsulation_dag.py— defernetworkxxgi/convert/higher_order_network.py— deferpandas,scipy.sparse,numpy.matrixxgi/convert/pandas.py— deferpandasxgi/generators/simplicial_complexes.py— defernetworkx,scipy.special.combtests/test_import_time.py— new regression testTest plan
🤖 Generated with Claude Code