aa wrapper and KLU migration to PNM for PFs #302

Open
jd-lara wants to merge 15 commits into main from jd/AA_wrapper

Member

@jd-lara jd-lara commented May 14, 2026

This PR extends the AA framework to the rest of the matrices and changes the KLU wrapper so that it can later be used in PowerFlows, dropping the KLU.jl dependency completely and passing the protection against the pointer problem on to PFs. It will close Sienna-Platform/PowerFlows.jl#107.

@jd-lara jd-lara requested a review from Copilot May 14, 2026 00:51
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Replaces the optional AppleAccelerate.jl weak dependency / package extension with an in-tree AccelerateWrapper submodule that binds directly to libSparse.dylib on macOS. The wrapper exposes a cached symbolic+numeric LDLT factorization, in-place dense/sparse-RHS solves, and SpMM/SpMV; PNM's PTDF/LODF/VirtualPTDF code paths are migrated from AAFactorization to the new AAFactorCache. Non-Apple builds get a stub module that errors on use, removing the runtime extension gating.

Changes:

  • New src/AccelerateWrapper/ submodule with libSparse ccalls, a symbolic/numeric factor cache, and dense/sparse solve + SpMM bindings (macOS-gated; stubs elsewhere).
  • Inlined _calculate_PTDF_matrix_AppleAccelerate / _calculate_LODF_matrix_AppleAccelerate into src/, swapped _solve_factorization/_create_factorization/with_solver to dispatch on AAFactorCache, and replaced _has_apple_accelerate_ext() with _has_apple_accelerate_backend() = Sys.isapple().
  • Removed ext/AppleAccelerateExt.jl, the AppleAccelerate weakdep/compat entries, the runtests.jl install step, and the corresponding Aqua stale-dep ignore; added test/test_accelerate_wrapper.jl.
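
The platform gating described in the bullets above can be pictured with a minimal sketch. The names `DemoBackend`, `demo_factorize`, and `_has_demo_backend` are hypothetical and stand in for the PR's submodule and probe; the macOS branch uses a placeholder body where the real code would call into the system library.

```julia
# Hypothetical sketch of a platform-gated submodule with a non-Apple stub.
# None of these names come from the PR.
module DemoBackend

export demo_factorize

@static if Sys.isapple()
    # On macOS the real implementation would be included here.
    demo_factorize(A) = "factorized on macOS"   # placeholder body
else
    # Elsewhere, a stub with the same name fails loudly at call time
    # instead of being silently absent.
    demo_factorize(A) = error("DemoBackend is only available on macOS")
end

end # module

# Backend probe in the same spirit as `_has_apple_accelerate_backend`.
_has_demo_backend() = Sys.isapple()
```

Callers can branch on `_has_demo_backend()` before selecting the solver, so the stub is only reached when a user explicitly asks for the macOS backend on another platform.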

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/AccelerateWrapper/AccelerateWrapper.jl | Submodule entry; macOS-gated includes plus non-Apple stubs. |
| src/AccelerateWrapper/libsparse_bindings.jl | C struct layouts, enums, and mangled @ccall bindings into libSparse. |
| src/AccelerateWrapper/aa_cache.jl | AAFactorCache lifecycle: lower-triangle pattern, symbolic/numeric (re)factor, finalizer. |
| src/AccelerateWrapper/solve_dense.jl | In-place dense vector/matrix solves and \ overload. |
| src/AccelerateWrapper/solve_sparse_rhs.jl | Block-packed sparse-RHS solver with reusable per-cache scratch. |
| src/AccelerateWrapper/spmm.jl | aa_spmm! / aa_spmv! bindings with per-call CSC→Apple index translation. |
| src/PowerNetworkMatrices.jl | Includes the wrapper, imports its symbols, drops AppleAccelerate forward decls. |
| src/linalg_settings.jl | Drops the extension probe; adds _has_apple_accelerate_backend and updates check_linalg_backend. |
| src/solver_dispatch.jl | Adds with_solver overload specialized on AAFactorCache; updates docstring. |
| src/ptdf_calculations.jl | Inlines PTDF AppleAccelerate path using AccelerateWrapper. |
| src/lodf_calculations.jl | Inlines LODF AppleAccelerate path using AccelerateWrapper. |
| src/virtual_ptdf_calculations.jl | Switches _solve_factorization to typed AAFactorCache overload; updates docs. |
| ext/AppleAccelerateExt.jl | Removed (logic moved into src/). |
| Project.toml | Drops AppleAccelerate weakdep, extension entry, and compat bound. |
| test/PowerNetworkMatricesTests.jl | Removes :AppleAccelerate from Aqua stale-dep ignore list. |
| test/runtests.jl | Drops the Pkg.add("AppleAccelerate") branch and updates the comment. |
| test/test_accelerate_wrapper.jl | New unit tests for AAFactorCache, dispatch, with_solver, and KLU parity. |
| test/test_ptdf.jl, test/test_lodf.jl, test/test_virtual_ptdf.jl, test/test_powerflow_matrix_types.jl, test/test_network_modification.jl | Switch guards to _has_apple_accelerate_backend, update messages, and update expected factor type to AAFactorCache. |

Comment thread src/AccelerateWrapper/aa_cache.jl Outdated
Comment on lines +1 to +8
"""
solve!(cache, B) -> B

Solve `A · X = B` in place. `B::StridedVecOrMat{Float64}` must have
first-dimension size equal to `cache.n` and unit stride in the first
dimension. Multiple columns of `B` are handled in a single libSparse call.
"""
function solve!(cache::AAFactorCache, B::StridedMatrix{Cdouble})
Comment thread src/linalg_settings.jl
This solver is only available on macOS.
Install AppleAccelerate:
julia> using Pkg; Pkg.add(\"AppleAccelerate\")"""
_has_apple_accelerate_backend() = Sys.isapple()
Comment on lines +95 to +106
# Called by libSparse on failure with a null-terminated C string; we surface
# the message via `@error`. Must be a top-level function so `@cfunction` can
# resolve it.
function _libsparse_report_error(msg::Cstring)
    s = unsafe_string(msg)
    @error "libSparse reported an error" message = s
    return nothing
end

# `reportError` is fired by libSparse before it returns a failure status —
# we log the message so it ends up in user output rather than libSparse's
# own stderr. Passing libc malloc/free explicitly (the C_NULL "use Apple
Collaborator

Seconded. Looking at AppleAccelerate.jl, I see instead: @cfunction(text->error(unsafe_string(text)), Cvoid, (Cstring, )). Probably don't need to go that low-level, but @error seems risky.
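
One possible middle ground between logging with `@error` and throwing inside the callback is to have the callback only record the message, and let Julia code raise after the native call has returned. The sketch below is an assumption about how that could look, not the PR's or AppleAccelerate.jl's code; `LAST_NATIVE_ERROR`, `record_native_error`, and `simulate_native_failure` are hypothetical names, and the "native library" is simulated by calling the function pointer directly.

```julia
# Hypothetical names throughout; this is not the PR's callback.
const LAST_NATIVE_ERROR = Ref{String}("")

# Top-level function so `@cfunction` can produce a plain C function pointer.
# It copies the C string and returns, with no logging and no throw.
function record_native_error(msg::Cstring)::Cvoid
    LAST_NATIVE_ERROR[] = unsafe_string(msg)
    return nothing
end

# Build the pointer at run time, inside a function.
native_error_callback() = @cfunction(record_native_error, Cvoid, (Cstring,))

# Stand-in for the native library invoking the callback on failure.
function simulate_native_failure(text::String)
    cb = native_error_callback()
    ccall(cb, Cvoid, (Cstring,), text)
    return LAST_NATIVE_ERROR[]
end
```

With this shape, the wrapper would check the status code after the native call and, on failure, throw a Julia exception that carries `LAST_NATIVE_ERROR[]`.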

@jd-lara jd-lara changed the title from "aa wrapper" to "aa wrapper and KLU migration to PNM for PFs" May 14, 2026
@jd-lara jd-lara requested a review from Copilot May 14, 2026 04:17
@jd-lara jd-lara marked this pull request as ready for review May 14, 2026 04:18
@jd-lara jd-lara requested a review from luke-kiernan May 14, 2026 04:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 38 out of 39 changed files in this pull request and generated 2 comments.

Comment thread src/lodf_calculations.jl Outdated
Comment on lines 401 to 414
@@ -383,7 +410,7 @@ efficient when the prerequisite matrices with factorization are already availabl
- `BA::BA_Matrix`: The branch susceptance weighted incidence matrix (B * A)

# Keyword Arguments
-- `linear_solver::String = "KLU"`:
+- `linear_solver::String = _default_linear_solver()`:
Linear solver algorithm for matrix computations. Currently only "KLU" is supported
Comment thread src/AccelerateWrapper/aa_cache.jl Outdated
Comment on lines +166 to +178
    col_start = pos
    for p in nzrange(A, j)
        if rowval[p] >= j
            pos += 1
            pos > cache.nnz_tri && return _pattern_mismatch(op)
            cache.rowIndices[pos] == Cint(rowval[p] - 1) ||
                return _pattern_mismatch(op)
        end
    end
    cache.columnStarts[j + 1] == Clong(pos) ||
        return _pattern_mismatch(op)
    # silence unused-warning on col_start
    col_start === col_start
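
For readers following the excerpt above, the cached pattern it compares against appears to be a lower-triangle layout with 0-based offsets. A minimal sketch of how such a pattern could be built from a CSC matrix is shown here; `lower_triangle_pattern` is a hypothetical helper, and the only things taken from the excerpt are the integer widths (`Clong` column starts, `Cint` row indices) and the `rowval[p] >= j` filter.

```julia
using SparseArrays

# Hypothetical helper: extract the lower-triangle pattern of a CSC matrix
# with 0-based offsets.
function lower_triangle_pattern(A::SparseMatrixCSC)
    n = size(A, 2)
    rows = rowvals(A)
    columnStarts = Vector{Clong}(undef, n + 1)
    rowIndices = Cint[]
    columnStarts[1] = 0
    for j in 1:n
        for p in nzrange(A, j)
            if rows[p] >= j                          # on or below the diagonal
                push!(rowIndices, Cint(rows[p] - 1)) # shift to 0-based
            end
        end
        columnStarts[j + 1] = length(rowIndices)     # entries stored so far
    end
    return columnStarts, rowIndices
end
```

A pattern check like the one under review then only has to walk the new matrix in the same order and compare each entry against these two arrays.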
@jd-lara jd-lara requested a review from josephmckinsey May 14, 2026 14:31
Contributor

github-actions Bot commented May 14, 2026

Performance Results

Precompile Time

| Main | This Branch | Delta |
| --- | --- | --- |
| 2.146 s | 2.148 s | +0.1% |

Execution Time

| Test | Main | This Branch | Delta |
| --- | --- | --- | --- |
| matpower_ACTIVSg2000_sys-Build PTDF First | 1.728 s | 1.872 s | +8.3% |
| matpower_ACTIVSg2000_sys-Build PTDF Second | 151.5 ms | 74.0 ms | -51.2% |
| matpower_ACTIVSg2000_sys-Build Ybus First | 362.7 ms | 115.0 ms | -68.3% |
| matpower_ACTIVSg2000_sys-Build Ybus Second | 12.0 ms | 13.8 ms | +15.2% |
| matpower_ACTIVSg2000_sys-Build LODF First | 145.0 ms | 534.6 ms | +268.8% |
| matpower_ACTIVSg2000_sys-Build LODF Second | 157.2 ms | 140.2 ms | -10.8% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF First | 3.756 s | 5.179 s | +37.9% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF Second | 202.6 ms | 690.4 ms | +240.8% |
| matpower_ACTIVSg2000_sys-VirtualMODF Query 10 rows | 507.9 ms | 512.6 ms | +0.9% |
| matpower_ACTIVSg2000_sys-Radial network reduction First | 455.7 ms | 463.5 ms | +1.7% |
| matpower_ACTIVSg2000_sys-Radial network reduction Second | 0.6 ms | 0.6 ms | +2.4% |
| matpower_ACTIVSg2000_sys-Degree two network reduction First | 1.718 s | 1.71 s | -0.5% |
| matpower_ACTIVSg2000_sys-Degree two network reduction Second | 1.0 ms | 0.9 ms | -9.2% |
| Base_Eastern_Interconnect_515GW-Build Ybus First | 3.745 s | 3.858 s | +3.0% |
| Base_Eastern_Interconnect_515GW-Build Ybus Second | 3.804 s | 3.459 s | -9.1% |
| Base_Eastern_Interconnect_515GW-Radial network reduction First | 1.263 s | 31.3 ms | -97.5% |
| Base_Eastern_Interconnect_515GW-Radial network reduction Second | 33.9 ms | 35.0 ms | +3.2% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction First | 352.6 ms | 331.7 ms | -5.9% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction Second | 108.1 ms | 32.5 ms | -70.0% |

Collaborator

@luke-kiernan luke-kiernan left a comment

Big picture question: what's the motivation or goal here? Yeah, using the public Julia package has its issues and limitations, but it's also modular and maintained in tandem with the _jll binary. So I'd like to hear your reasoning here.

If there are specific shortcomings of these libraries, I'm willing to go open a PR. I expect it wouldn't get merged for a while (~months), but at least then we wouldn't need to maintain our own low-level bindings indefinitely.

My other question: why is AA no longer an extension?

factorization_type::SparseFactorization_t = SparseFactorizationLDLT,
)
n = size(A, 1)
n == size(A, 2) || throw(DimensionMismatch("matrix must be square; got $(size(A))"))
Collaborator

If the user tries to apply a symmetric factorization to a non-symmetric matrix, this will silently take the lower triangle and factorize a different matrix, which is potentially hazardous. I'd at least check that the pattern is symmetric, with a flag to bypass.
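
A rough sketch of the suggested guard follows. `has_symmetric_pattern`, `checked_lower_triangle`, and the `check_symmetry` keyword are hypothetical names for illustration; the check compares stored patterns only, not values, which is the weaker test the comment asks for.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: true when the stored pattern of A equals that of A'.
function has_symmetric_pattern(A::SparseMatrixCSC)
    size(A, 1) == size(A, 2) || return false
    At = copy(transpose(A))              # materialized transpose, also CSC
    nz = nnz(A)
    return A.colptr == At.colptr &&
           view(rowvals(A), 1:nz) == view(rowvals(At), 1:nz)
end

# Take the lower triangle only after the pattern check, unless bypassed.
function checked_lower_triangle(A::SparseMatrixCSC; check_symmetry::Bool = true)
    if check_symmetry && !has_symmetric_pattern(A)
        throw(ArgumentError(
            "sparsity pattern is not symmetric; pass check_symmetry = false to bypass",
        ))
    end
    return tril(A)
end
```

The check costs one transpose per call, so a caller that already knows the matrix is symmetric (for example, one built as `A' * D * A`) could reasonably pass `check_symmetry = false`.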

Release the libSparse numeric and symbolic handles held by `cache`, leaving
Julia-side state intact. Idempotent.
"""
function _free_handles!(cache::AAFactorCache)
Collaborator

And if it isn't SparseStatusOk? Leaving it around seems iffy. Unlikely to cause issues in practice, but could clog up the heap with un-GCed objects in theory.
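
One way to sidestep the status question is to key the release on whether a handle is non-null rather than on a status code. The sketch below illustrates that idea with plain `Libc.malloc` blocks standing in for the native handles; `DemoHandles` and `free_handles!` are hypothetical, and whether libSparse's own cleanup calls are safe on a failed factorization is an assumption this sketch does not settle.

```julia
# Hypothetical stand-in: raw `Libc.malloc` blocks play the role of the
# native numeric and symbolic handles.
mutable struct DemoHandles
    numeric::Ptr{Cvoid}
    symbolic::Ptr{Cvoid}
    function DemoHandles()
        h = new(Libc.malloc(16), Libc.malloc(16))
        finalizer(free_handles!, h)   # GC fallback if the caller never frees
        return h
    end
end

# Release whatever is still held; safe to call any number of times.
function free_handles!(h::DemoHandles)
    if h.numeric != C_NULL
        Libc.free(h.numeric)
        h.numeric = C_NULL            # null out so a second call is a no-op
    end
    if h.symbolic != C_NULL
        Libc.free(h.symbolic)
        h.symbolic = C_NULL
    end
    return h
end
```

Because each pointer is nulled immediately after it is freed, the explicit call and the later finalizer run cannot double-free.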

# Status codes and shared error helper
# ---------------------------------------------------------------------------

const KLU_OK = 0
Collaborator

nitpick: an Int-backed enum would be more idiomatic.
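
As an illustration of the suggestion, an integer-backed enum over the status codes could look like the sketch below. The enum name `KLUStatus` and the helper `klu_status` are hypothetical, and only the three codes visible in the excerpts on this page are listed.

```julia
# Hypothetical enum name; only the codes visible in the diff are listed.
@enum KLUStatus::Cint begin
    KLU_OK = 0
    KLU_INVALID = -3
    KLU_TOO_LARGE = -4
end

# Convert a raw status returned by C into the enum. A code that is not
# listed throws an ArgumentError instead of passing through unnoticed.
klu_status(code::Integer) = KLUStatus(code)
```

Compared with bare `const` integers, the enum gives the status its own type for dispatch and printing, while `Cint(KLU_INVALID)` still recovers the raw value for a `ccall`.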

Comment thread src/lodf_calculations.jl
# Keyword Arguments
- `linear_solver::String = "KLU"`:
Linear solver algorithm for matrix computations. Currently only "KLU" is supported
This constructor is intentionally KLU-only because `ABA.K` is always a
Collaborator

Reasoning behind this decision? Which matrices are AA-compatible and why?

Comment thread src/lodf_calculations.jl
# element-wise row scaling. KLU's BTF short-circuits this internally so the
# overhead was modest; AA's libSparse and LAPACK's `getrf!`/`getrs!` do
# not, so the previous code was 3–5× slower on AA and order-of-magnitude
# slower on DENSE than necessary. Replace both with a direct row scaling.
Collaborator

Wait, is what we were doing that simple? And yet we were invoking factorization routines? Just want to confirm that this is real human error corrected by AI, not a hallucination
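
The excerpt does not show which matrix was being factorized, so the following is only a generic check of the underlying identity, assuming the system in question was diagonal: solving `D * X = B` with a diagonal `D` gives the same result as dividing row `i` of `B` by `d[i]`. The values are made up.

```julia
using LinearAlgebra, SparseArrays

d = [2.0, 4.0, 5.0]                       # diagonal entries (made-up values)
B = [2.0 4.0; 8.0 12.0; 10.0 5.0]

# Route 1: factorize the diagonal matrix and solve, as a general solver would.
X_factored = lu(sparse(Diagonal(d))) \ B

# Route 2: divide row i of B by d[i] directly; no factorization involved.
X_scaled = B ./ d
```

Both routes agree up to floating-point rounding, so if the original system really was diagonal, the factorization was pure overhead.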

Member Author

jd-lara commented May 14, 2026

> Big picture question: what's the motivation or goal here? Yeah, using the public Julia package has its issues and limitations, but it's also modular and maintained in tandem with the _jll binary. So I'd like to hear your reasoning here.
>
> If there are specific shortcomings of these libraries, I'm willing to go open a PR. I expect it wouldn't get merged for a while (~months), but at least then we wouldn't need to maintain our own low-level bindings indefinitely.
>
> My other question: why is AA no longer an extension?

In short, I am trying to reduce the number of external dependencies over which we have no control, especially around the linear solvers, and to expose only the code that we need. I have been frustrated with how slowly the other libraries respond, and with being exposed to mistakes there, as happened with the solve! call.

Another reason, especially for the linear solvers, is to have a cohesive infrastructure for PNM and PFs.

@josephmckinsey josephmckinsey left a comment

I'm mainly confused as to why we are making so many changes to KLU immediately again. I haven't dug into the Apple Accelerate part yet.

What feature is this used for? I don't actually see any specific code for KLU, so it seems like it could potentially be used for any LinearAlgebra.Factorization.

I'm a bit confused why we are wrapping this. KLU's internals weren't very thread-safe, which made it a clearer case until that can be fixed upstream, but I thought AppleAccelerate.jl was a bit better now?

const KLU_INVALID = -3
const KLU_TOO_LARGE = -4
# ---------------------------------------------------------------------------
# Int32 (int) family — mirror of the Int64 set above
Why are we implementing Int32? It seemed like the previous PR purposefully left out Int32

Member Author

jd-lara commented May 14, 2026

> I'm mainly confused as to why we are making so many changes to KLU immediately again. I haven't dug into the Apple Accelerate part yet.

Maybe not in the cleanest way, I layered the changes needed in KLU here so that we can remove the use of KLU.jl in PowerFlows and use our own wrapper. In PFs we use Int32.
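
To make the Int32 point concrete: SuiteSparse's KLU exposes a `klu_*` family for C `int` indices and a `klu_l_*` family for 64-bit indices, so a wrapper can pick the family from the sparse matrix's index type. The sketch below shows one way to do that with dispatch; `klu_symbol` is a hypothetical helper, and how the PR's wrapper actually selects the family is not shown on this page.

```julia
using SparseArrays

# Hypothetical helper: map an index type and an operation name to the
# matching C symbol name (`klu_*` for Int32, `klu_l_*` for Int64).
klu_symbol(::Type{Int32}, op::Symbol) = Symbol(:klu_, op)
klu_symbol(::Type{Int64}, op::Symbol) = Symbol(:klu_l_, op)

# Pick the family from the matrix's index type parameter.
klu_symbol(A::SparseMatrixCSC{Float64, Ti}, op::Symbol) where {Ti} =
    klu_symbol(Ti, op)
```

A matrix with `Int32` indices, as used in PFs, would then resolve to names such as `klu_factor`, while the default `Int64` indices resolve to `klu_l_factor`.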

Development

Successfully merging this pull request may close these issues.

Dependency on KLU.jl private interface

4 participants