Feature Tracking #1

@novacrazy

Backends

  • Scalar
  • SSE2 (in-progress)
  • SSE4.2 (in-progress)
  • AVX (in-progress)
  • AVX2
  • AVX512F
  • WASM SIMD
  • ARM/aarch64 NEON

Extra data types

  • i16/u16
  • i8/u8

These types can use 128-bit registers even on AVX/AVX2, and 256-bit registers on AVX512.

Polyfills

  • Emulated FMA on older platforms
    • For f32, promote to f64 and back.
    • For f64, implement this method
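
The f32 promotion idea above can be sketched as follows. This is only an illustration, not the library's code: the function name `fma_f32_emulated` is hypothetical. The product of two f32 values always fits exactly in an f64, so the only rounding happens in the addition and in the final narrowing. Because that is two roundings rather than one, the result can differ from a true fused multiply-add in rare cases.

```rust
/// Emulated fused multiply-add for f32 (hypothetical helper, for illustration).
/// Computes a * b + c by promoting to f64 and narrowing back to f32.
pub fn fma_f32_emulated(a: f32, b: f32, c: f32) -> f32 {
    // Two 24-bit significands multiply into at most 48 bits,
    // which fits in the 53-bit significand of an f64, so this product is exact.
    let product = (a as f64) * (b as f64);

    // The addition rounds once to f64, and the cast rounds again to f32.
    // This double rounding is why the result is only *nearly* a true FMA.
    (product + c as f64) as f32
}
```

For example, with a = 1 + ε and b = 1 - ε (ε = f32::EPSILON), the plain f32 expression a * b - 1 evaluates to 0, while the emulated version recovers the exact answer -ε².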

Iterator library

  • Prototype

Vectorized math library

Currently fully implemented for single- and double-precision:
sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh, atanh, exp, exp2, exph (0.5 * exp), exp10, exp_m1, cbrt, powf, ln, ln_1p, ln2, ln10, erf, erfinv, tgamma, lgamma, next_float, prev_float

Precision-agnostic implementations: lerp, scale, fmod, powi (single and vector exponents), poly, poly_f, poly_rational, summation_f, product_f, smoothstep, smootherstep, smootheststep, hermite (single and vector degrees), jacobi, legendre, bessel_y

TODO:

  • Beta function
  • Zeta function
  • Digamma function

Bessel functions:

  • Bessel J_n for n > 1 (n = 0 and n = 1 are already implemented)
  • Bessel J_f (Bessel function of the first kind with real order)
  • Bessel Y_f (Bessel function of the second kind with real order)
  • Bessel I_n (Modified Bessel function of the first kind)
  • Bessel K_n (Modified Bessel function of the second kind)
  • Hankel function?

Complex and Dual number libraries

  • Make difficult parts branchless, ideally.

Precision Improvements

  • Improve precision of lgamma where possible.
    • Should it fall back to ln(tgamma(x)) when we know it won't overflow?
  • Improve precision of trig functions when the angle is a multiple of π (sin(x*π), etc.)
  • Compensated float fallbacks on platforms without FMA
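
One way to approach the sin(x*π) item is to reduce the argument while it is still expressed in half-turns, before π enters the computation. The scalar sketch below assumes that approach. The name `sin_pi` is hypothetical, and a vectorized version would replace the branch with a select.

```rust
use std::f64::consts::PI;

/// sin(pi * x) with the range reduction done on x itself (hypothetical helper).
pub fn sin_pi(x: f64) -> f64 {
    // Nearest integer number of half-turns.
    let n = x.round();
    // The remainder lies in [-0.5, 0.5] and is computed without rounding error.
    let r = x - n;
    // sin(pi * (n + r)) = (-1)^n * sin(pi * r)
    let sign = if n % 2.0 == 0.0 { 1.0 } else { -1.0 };
    sign * (r * PI).sin()
}
```

With this reduction, integer inputs give exactly zero, whereas `(1.0 * PI).sin()` returns a small nonzero value because the f64 constant PI is not exactly π.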

Performance improvements:

  • Investigate ways to improve non-FMA operations.
  • Look for ways to simplify more expressions algebraically.
  • Experiment with the "crush denormals" trick to remove denormal inputs?
    • 1 - (1 - x) is the trick.
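
A minimal scalar sketch of the 1 - (1 - x) trick, to make the trade-off concrete. The function name `crush_denormals` is hypothetical. It relies on default IEEE rounding and on the compiler not reassociating the expression, which Rust does not do by default.

```rust
/// Flush very small inputs to zero using 1 - (1 - x) (hypothetical helper).
pub fn crush_denormals(x: f32) -> f32 {
    // For a denormal x, (1.0 - x) rounds to exactly 1.0,
    // so the outer subtraction yields 0.0.
    // Larger inputs survive, but only the bits that fit alongside 1.0 are kept.
    1.0 - (1.0 - x)
}
```

Note that this is lossy beyond denormals: any input much smaller than f32::EPSILON is also flushed to zero, and other inputs are quantized to the spacing of floats near 1.0. It therefore only suits functions whose inputs are already well-scaled.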

Policy improvements:

  • Improve codegen size for Size policy, especially when WASM support is added (both scalar and SIMD)

Testing

  • Structured tests for all vector types and backends (some partial tests exist, but I need to clean them up)
  • Tests for the math library
