rft: make gamma_inc, gamma_inc_inv GPU-compatible #514
devmotion merged 1 commit into JuliaMath:master
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff, master vs. #514:

| Metric | master | #514 | +/- |
| --- | --- | --- | --- |
| Coverage | 94.17% | 94.17% | |
| Files | 14 | 14 | |
| Lines | 2969 | 2971 | +2 |
| Hits | 2796 | 2798 | +2 |
| Misses | 173 | 173 | |
@devmotion Is there anything I can do to help get this PR merged?
I wonder if we could verify this in CI to prevent regressions.
I agree this would be nice. In lieu of actually running CI on a GPU, a few ideas come to mind: checking type stability and/or checking allocations.
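The two CPU-side checks suggested here could look roughly like the sketch below. It is only an illustration: `COEFS` and `horner_sketch` are hypothetical stand-ins, not functions from this PR, and passing these checks does not by itself prove that code compiles for a GPU.

```julia
using Test

# Hypothetical stand-in for a GPU-friendly scalar routine: the coefficients
# live in a Tuple, so evaluating the polynomial needs no heap allocation.
const COEFS = (1.0, 0.5, 0.25)
horner_sketch(x::Float64) = evalpoly(x, COEFS)

# Type stability: @inferred throws if the actual return type differs from
# the type the compiler inferred for the call.
y = @inferred horner_sketch(0.3)

# Allocations: measure inside a function (not at global scope) and call it
# once beforehand so compilation is not counted.
allocs_sketch(x) = @allocated horner_sketch(x)
allocs_sketch(0.3)          # warm-up call
n = allocs_sketch(0.3)      # bytes allocated by the second call
```

In a test suite these would typically be wrapped as `@test n == 0` and `@test (@inferred horner_sketch(0.3)) isa Float64`.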
Let me know if the updated tests seem reasonable to you, and if so, feel free to merge this PR. Thanks for all your feedback, I think it greatly improved this PR!
- replace `Vector`s by statically sized `Tuple`s
- update type restriction in `chepolsum`
- use `Base.@nif` to statically unroll `findfirst` into an if/elseif/else chain at parse time
- replace interpolated strings in errors with `LazyString`
- manually inline single recursion step in `auxgam`; GPU compilers cannot statically prove termination
- add tests that check these functions are inferrable and do not allocate memory
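
As a rough illustration of the `@nif` item above, the sketch below unrolls a threshold search over a `Tuple`. The names `THRESHOLDS` and `first_bucket_unrolled` are hypothetical and the actual search in `gamma_inc_asym` may differ. The macro is written with its fully qualified name `Base.Cartesian.@nif`; the commit message refers to it as `Base.@nif`.

```julia
# Hypothetical thresholds; a Tuple keeps the length known at compile time.
const THRESHOLDS = (0.5, 1.0, 2.0)

# Dynamic version: findfirst returns an index or `nothing`.
first_bucket_dynamic(x) = findfirst(t -> x <= t, THRESHOLDS)

# Unrolled version: with N = 4 the macro emits conditions for d = 1, 2, 3 and
# uses the last expression as the final `else`, i.e. roughly
#   if x <= THRESHOLDS[1]; 1 elseif x <= THRESHOLDS[2]; 2
#   elseif x <= THRESHOLDS[3]; 3 else 0 end
function first_bucket_unrolled(x)
    return Base.Cartesian.@nif 4 d->(x <= THRESHOLDS[d]) d->(d) d->(0)
end
```

Here `0` is an arbitrary sentinel for "no threshold matched", chosen so that the function always returns an `Int`.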
Purpose
This pull request refactors `src/gamma_inc.jl` to make `gamma_inc` and `gamma_inc_inv` GPU-compatible (verified on NVIDIA A100), with incidental improvements to performance and code clarity.

Summary of changes

- Replaced `Vector` constants (`acc0`, `big1`, `e0`, `x0`, `stirling_coef`, `auxgam_coef`, and various `d*` arrays) with `Tuple`s, which are immutable and compatible with GPU kernels. Updated the type restriction in `chepolsum` accordingly.
- Replaced `findfirst` with `Base.@nif` in `gamma_inc_asym` to statically unroll the search into an if/elseif/else chain at parse time. This avoids dynamic dispatch that is unsupported in GPU kernels.
- Replaced interpolated strings in error messages with `LazyString`. String interpolation in error messages is not supported in GPU kernels.
- Refactored `auxgam` to manually inline its single recursion step. Mathematically the function recurses at most once (the `x < 0` branch calls `auxgam(1 + x)` where `1 + x ≥ 0`), but GPU compilers cannot statically prove termination of recursive call graphs, resulting in either stack overflow or illegal memory access at runtime.
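
The recursion change can be sketched generically as below. This is not the real `auxgam`: `base_sketch`, `rec_sketch`, and `flat_sketch` are hypothetical toy functions, and the sketch assumes inputs with `x ≥ -1` so that the recursive branch is entered at most once.

```julia
# Hypothetical non-recursive branch, used for arguments y >= 0.
base_sketch(y) = y / (1 + y)

# Recursive form: for -1 <= x < 0 it calls itself once with 1 + x >= 0.
# The call graph still contains a cycle, which a GPU compiler may reject
# or handle poorly.
rec_sketch(x) = x < 0 ? -(1 + rec_sketch(1 + x)) : base_sketch(x)

# Manually inlined form: the single recursion step calls the non-recursive
# branch directly, so the call graph has no cycle.
flat_sketch(x) = x < 0 ? -(1 + base_sketch(1 + x)) : base_sketch(x)
```

Both forms perform the same arithmetic on the assumed domain, so they should agree exactly there.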