v24.11.00

@manopapad manopapad released this 17 Nov 00:51
· 15 commits to branch-24.03 since this release
b198f33

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

  • Implement np.unravel_index
  • Implement np.angle
  • Implement np.median
  • Implement np.ix_
  • Implement np.meshgrid
  • Implement np.expand_dims
  • Implement np.rot90
  • Implement np.round
  • Implement np.fft.fftshift and np.fft.ifftshift
  • Implement np.roll
  • Support full_matrices parameter of np.linalg.svd
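As a rough illustration of a few of the newly covered functions, the sketch below calls them on a small array. It assumes cuPyNumeric can be imported as a drop-in replacement for NumPy and falls back to NumPy itself when the package is not installed; the variable names are only illustrative.

```python
# Assumption: cupynumeric mirrors the NumPy API for these calls.
# Fall back to NumPy so the sketch also runs without cuPyNumeric.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

a = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]

rolled = np.roll(a, 1, axis=1)          # shift each row right by one
rotated = np.rot90(a)                   # rotate 90 degrees counterclockwise
expanded = np.expand_dims(a, axis=0)    # add a leading axis of length 1
med = np.median(a)                      # median over all elements
idx = np.unravel_index(4, (2, 3))       # flat index 4 -> (row, column)
xx, yy = np.meshgrid(np.arange(3), np.arange(2))
shifted = np.fft.fftshift(np.arange(4)) # move the zero-frequency bin to the center

# Reduced SVD: with full_matrices=False, u and vh keep only min(M, N) vectors.
u, s, vh = np.linalg.svd(a.astype(float), full_matrices=False)
```

Here `np.fft.ifftshift(shifted)` undoes the shift, and `u`, `s`, `vh` have shapes (2, 2), (2,), and (2, 3) for this 2x3 input.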

Memory management enhancements

  • Memory-efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
  • Memory efficiency for stencil computation - add the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate the necessary space for ghost elements when an array is to be used in a stencil computation, reducing intermediate memory use.
  • Memory allocation report - report the object-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
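The idea of batching a matrix multiplication over the reduction dimension can be sketched in plain NumPy. This is not cuPyNumeric's implementation, only the algebra behind it: the product is accumulated from slices of the shared dimension, so each step touches a bounded slice of the inputs. The function name `chunked_matmul` and the `chunk` parameter are hypothetical.

```python
import numpy as np


def chunked_matmul(a, b, chunk=256):
    """Compute a @ b by accumulating partial products over slices of the
    reduction (inner) dimension. Illustrative sketch only."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        # Only columns [start, stop) of a and the matching rows of b are
        # needed for this partial product.
        out += a[:, start:stop] @ b[start:stop, :]
    return out


a = np.arange(12).reshape(3, 4)
b = np.arange(8).reshape(4, 2)
result = chunked_matmul(a, b, chunk=3)
```

With integer inputs `result` equals `a @ b` exactly; with floating-point inputs the chunked summation order can change the result by rounding error.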

Enhanced infrastructure support

  • GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
  • GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
  • Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
  • Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
  • More enhancements from Legate 24.11

Other

  • Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRand in CPU-only installations.

Known issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
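Until a patch is available, one possible workaround is to filter this specific message with Python's standard warnings module. The sketch below assumes the warning is raised through that module with category RuntimeWarning and that its text starts as quoted above; the helper name `suppress_buffer_fallback_warning` is hypothetical.

```python
import warnings


def suppress_buffer_fallback_warning():
    """Ignore only the false-positive __buffer__ fallback warning.

    Assumption: the message begins with the text quoted in the release notes.
    Other RuntimeWarnings are left untouched.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__",
        category=RuntimeWarning,
    )


suppress_buffer_fallback_warning()
```

The filter matches on the start of the message, so fallback warnings for other, genuinely unimplemented functions are still shown.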