Backends
- Scalar
- SSE2 (in-progress)
- SSE4.2 (in-progress)
- AVX (in-progress)
- AVX2
- AVX512F
- WASM SIMD
- ARM/aarch64 NEON
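With this many backends, something has to pick one at runtime. The following is only a sketch of how that selection could look using the standard library's feature-detection macros. The `Backend` enum and `detect_backend` function are hypothetical names made up for illustration, not this crate's API. WASM SIMD is left out because it is normally selected at compile time rather than detected at runtime.

```rust
/// Hypothetical backend tags mirroring the list above (not the crate's real types).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Sse2,
    Sse42,
    Avx,
    Avx2,
    Avx512F,
    Neon,
}

/// Pick the widest backend the current CPU reports, falling back to scalar.
pub fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Check from the most capable feature set down to the baseline.
        if is_x86_feature_detected!("avx512f") {
            return Backend::Avx512F;
        }
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if is_x86_feature_detected!("avx") {
            return Backend::Avx;
        }
        if is_x86_feature_detected!("sse4.2") {
            return Backend::Sse42;
        }
        if is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    Backend::Scalar
}
```

Calling `detect_backend()` once at startup and caching the result would avoid repeating the checks in hot paths.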
Extra data types
- i16/u16
- i8/u8
These can use 128-bit registers even on AVX/AVX2, and 256-bit registers on AVX512
Polyfills
- Emulated FMA on older platforms
  - For f32, promote to f64 and back.
  - For f64, implement this method
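As a minimal sketch of the f32 case, assuming a scalar helper (the name `fma_f32_via_f64` is made up here): the product of two f32 values fits exactly in an f64, so only the addition and the final narrowing round. This is close to a fused result but can still differ from a true FMA in rare double-rounding cases.

```rust
/// Emulate a fused multiply-add for f32 by promoting to f64 and back.
/// Hypothetical helper for illustration, not the crate's API.
pub fn fma_f32_via_f64(a: f32, b: f32, c: f32) -> f32 {
    // a * b is exact in f64 (24 + 24 significand bits fit in 53).
    let product = (a as f64) * (b as f64);
    // The sum rounds once to f64, then once more when narrowing to f32.
    (product + c as f64) as f32
}

/// The unfused version, which rounds the product to f32 before adding.
pub fn mul_add_unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the unfused version loses the low product bit and returns zero, while the promoted version keeps it.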
Iterator library
- Prototype
Vectorized math library
Currently fully implemented for single and double-precision:
sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh, atanh, exp, exp2, exph (0.5 * exp), exp10, exp_m1, cbrt, powf, ln, ln_1p, ln2, ln10, erf, erfinv, tgamma, lgamma, next_float, prev_float
Precision-agnostic implementations: lerp, scale, fmod, powi (single and vector exponents), poly, poly_f, poly_rational, summation_f, product_f, smoothstep, smootherstep, smootheststep, hermite (single and vector degrees), jacobi, legendre, bessel_y
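To illustrate what "precision-agnostic" can mean in practice, here is a sketch of polynomial evaluation by Horner's rule, written once and usable with both f32 and f64. The function name, the trait bounds, and the lowest-degree-first coefficient order are assumptions for this example and may not match the library's `poly`.

```rust
use std::ops::{Add, Mul};

/// Evaluate c[0] + c[1]*x + c[2]*x^2 + ... with Horner's rule.
/// Generic over any type with addition, multiplication and a zero default.
pub fn poly_horner<T>(x: T, coeffs: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    // Start from the highest-degree coefficient and fold downwards.
    coeffs
        .iter()
        .rev()
        .fold(T::default(), |acc, &c| acc * x + c)
}
```

The same bounds would also be satisfied by a SIMD vector type that implements `Add`, `Mul`, `Copy` and `Default`, which is the point of keeping the routine independent of precision.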
TODO:
- Beta function
- Zeta function
- Digamma function
Bessel functions:
- Bessel J_n for n > 1 (n = 0 and n = 1 are already implemented)
- Bessel J_f (Bessel function of the first kind with real order)
- Bessel Y_f (Bessel function of the second kind with real order)
- Bessel I_n (Modified Bessel function of the first kind)
- Bessel K_n (Modified Bessel function of the second kind)
- Hankel function?
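One possible starting point for J_n with n > 1 is the standard three-term recurrence J_{k+1}(x) = (2k/x) J_k(x) - J_{k-1}(x), seeded with the existing J_0 and J_1. The sketch below takes those two values as parameters, and the function name is hypothetical. Forward recurrence is only numerically stable while n stays below roughly |x|; for larger orders a backward (Miller-style) recurrence is usually preferred.

```rust
/// Compute J_n(x) by forward recurrence from known J_0(x) and J_1(x).
/// Assumes x != 0; illustrative only, and unstable once n exceeds |x|.
pub fn bessel_jn_forward(n: u32, x: f64, j0: f64, j1: f64) -> f64 {
    match n {
        0 => return j0,
        1 => return j1,
        _ => {}
    }
    let (mut prev, mut curr) = (j0, j1);
    for k in 1..n {
        // J_{k+1}(x) = (2k / x) * J_k(x) - J_{k-1}(x)
        let next = (2.0 * k as f64 / x) * curr - prev;
        prev = curr;
        curr = next;
    }
    curr
}
```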
Complex and Dual number libraries
- Make difficult parts branchless, ideally.
Precision Improvements
- Improve precision of lgamma where possible.
  - Should it fall back to ln(tgamma(x)) when we know it won't overflow?
- Improve precision of trig functions when the angle is a multiple of π (sin(x*π), etc.)
- Compensated float fallbacks on platforms without FMA
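For the compensated fallback, one well-known building block is Dekker's error-free product with Veltkamp splitting, which recovers the rounding error of `a * b` without an FMA instruction. The sketch below is a scalar f64 version under that assumption. The function names are made up, and it ignores overflow in the split for very large inputs.

```rust
/// Split a into high and low parts whose products are exact (Veltkamp).
fn split(a: f64) -> (f64, f64) {
    const SPLITTER: f64 = 134_217_729.0; // 2^27 + 1
    let t = SPLITTER * a;
    let hi = t - (t - a);
    let lo = a - hi;
    (hi, lo)
}

/// Return (p, e) with p = fl(a * b) and p + e == a * b exactly (Dekker).
pub fn two_prod(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    let (ah, al) = split(a);
    let (bh, bl) = split(b);
    // Accumulate the partial products against the rounded result.
    let e = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    (p, e)
}
```

On hardware with FMA the same error term is simply `a.mul_add(b, -p)`, so the two paths can be checked against each other in tests.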
Performance improvements:
- Investigate ways to improve non-FMA operations.
- Look for ways to simplify more expressions algebraically.
- Experiment with the "crush denormals" trick to remove denormal inputs?
  - The trick is 1 - (1 - x).
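Taking the 1 - (1 - x) expression literally, a scalar sketch looks like the following (the function name is made up). A denormal x is absorbed when subtracted from 1.0, so the result is exactly zero. The catch is that any input much smaller than the rounding step near 1.0 (around 2^-24 for f32) is flushed as well, and larger inputs lose low bits, so a real implementation would probably need to restrict where it is applied.

```rust
/// Flush denormal (and other very tiny) inputs to zero without branching.
/// Literal reading of the 1 - (1 - x) trick; illustration only.
pub fn crush_denormals(x: f32) -> f32 {
    // For tiny x, 1.0 - x rounds to exactly 1.0, so the result is 0.0.
    1.0 - (1.0 - x)
}
```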
Policy improvements:
- Improve codegen size for the Size policy, especially when WASM support is added (both scalar and SIMD)
Testing
- Structured tests for all vector types and backends (some partial tests exist, but I need to clean them up)
- Tests for the math library