Skip to content

rft: make gamma_inc, gamma_inc_inv GPU-compatible#514

Merged
devmotion merged 1 commit intoJuliaMath:masterfrom
haakon-e:he/rft-make-gamma-inc-gpu
Feb 11, 2026
Merged

rft: make gamma_inc, gamma_inc_inv GPU-compatible#514
devmotion merged 1 commit intoJuliaMath:masterfrom
haakon-e:he/rft-make-gamma-inc-gpu

Conversation

@haakon-e
Copy link
Contributor

@haakon-e haakon-e commented Feb 6, 2026

Purpose

This pull request refactors src/gamma_inc.jl to make gamma_inc and gamma_inc_inv GPU-compatible (verified on NVIDIA A100), with incidental improvements to performance and code clarity.

Summary of changes

  • Replaced all constant coefficient arrays (such as acc0, big1, e0, x0, stirling_coef, auxgam_coef, and various d* arrays) with Tuples, which immutable and compatible with GPU kernels.
    • Updated type restriction in chepolsum accordingly.
  • Replaced findfirst with Base.@nif in gamma_inc_asym to statically unroll the search into an if/elseif/else chain at parse time. This avoids dynamic dispatch that is unsupported in GPU kernels.
  • Updated a few error messages that interpolated values into strings to use LazyString. String interpolation in error messages is not supported in GPU kernels.
  • Refactored auxgam to manually inline its single recursion step. Mathematically the function recurses at most once (the x < 0 branch calls auxgam(1 + x) where 1 + x ≥ 0), but GPU compilers cannot statically prove termination of recursive call graphs, resulting in either stack overflow or illegal memory access at runtime.
  • Added CI tests that verify these functions are inferable and allocation-free, as a proxy for GPU-compatibility.

@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch from 4a1b739 to 1dda484 Compare February 6, 2026 05:02
@codecov
Copy link

codecov bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.17%. Comparing base (539d6f8) to head (e0b0908).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #514   +/-   ##
=======================================
  Coverage   94.17%   94.17%           
=======================================
  Files          14       14           
  Lines        2969     2971    +2     
=======================================
+ Hits         2796     2798    +2     
  Misses        173      173           
Flag Coverage Δ
unittests 94.17% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch from 1dda484 to 0552ffe Compare February 6, 2026 17:30
@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch from 0552ffe to 1c02910 Compare February 6, 2026 18:04
@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch 5 times, most recently from 2e74767 to fd19e3e Compare February 6, 2026 23:38
@haakon-e
Copy link
Contributor Author

haakon-e commented Feb 9, 2026

@devmotion Is there anything I can do to help get this PR merged?

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@devmotion
Copy link
Member

verified on NVIDIA A100

i wonder if we could verify this in CI to prevent regressions.

@haakon-e
Copy link
Contributor Author

haakon-e commented Feb 9, 2026

verified on NVIDIA A100

i wonder if we could verify this in CI to prevent regressions.

I agree this would be nice. In lieu of actually running CI on a GPU, a few ideas come to mind: checking type stability, and/or checking allocations

@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch from fd19e3e to af280b9 Compare February 11, 2026 19:31
@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch 2 times, most recently from b7c7292 to 7e28dd8 Compare February 11, 2026 20:29
@haakon-e
Copy link
Contributor Author

Let me know if the updated tests seem reasonable to you, and if so, feel free to merge this PR. Thanks for all your feedback, I think it greatly improved this PR!

- replace `Vector`s by statically sized `Tuple`s
    - update type restriction in `chepolsum`
- use Base.@nif to statically unroll findfirst into
    if/elseif/else chain at parse time
- replace interpolated strings in errors with LazyString
- manually inline single recursion step in auxgam;
    GPU compilers cannot statically prove termination
- add tests that checks these functions are
    inferrable and do not allocation memory
@haakon-e haakon-e force-pushed the he/rft-make-gamma-inc-gpu branch from 7e28dd8 to e0b0908 Compare February 11, 2026 20:38
Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@devmotion devmotion merged commit d71eddb into JuliaMath:master Feb 11, 2026
12 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants