v24.11.00
This is a beta release of cuPyNumeric.
Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.
Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.
New features
Improved API coverage
- Implement np.unravel_index
- Implement np.angle
- Implement np.median
- Implement np.ix_
- Implement np.meshgrid
- Implement np.expand_dims
- Implement np.rot90
- Implement np.round
- Implement np.fft.fftshift and np.fft.ifftshift
- Implement np.roll
- Support full_matrices parameter of np.linalg.svd
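A few of the newly covered functions can be exercised as in the sketch below. It assumes cuPyNumeric is importable as `cupynumeric` and falls back to plain NumPy otherwise (the fallback is only there so the snippet runs anywhere); the array values are made up for illustration.

```python
# Assumption: cuPyNumeric is installed; otherwise fall back to NumPy,
# whose API these functions mirror.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# Convert flat index 7 into coordinates of a 3x4 array.
idx = np.unravel_index(7, (3, 4))

# Shift elements two positions to the right, wrapping around.
rolled = np.roll(np.arange(5), 2)

# Swap the two halves of a length-6 array, as done when moving the
# zero-frequency term to the center of a spectrum.
shifted = np.fft.fftshift(np.arange(6))

# full_matrices selects between a square U (4x4) and a reduced U (4x2).
a = np.ones((4, 2))
u_full, s_full, vh_full = np.linalg.svd(a, full_matrices=True)
u_red, s_red, vh_red = np.linalg.svd(a, full_matrices=False)
```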
Memory management enhancements
- Memory efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
- Memory efficiency for stencil computation - add np.ndarray.stencil_hint method that instructs cuPyNumeric to pre-allocate the necessary space for ghost elements when an array is to be used in a stencil computation, reducing intermediate memory use.
- Memory allocation report - report the object-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
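The idea of batching a matrix multiplication over the reduction dimension can be illustrated with the single-process sketch below. It is not cuPyNumeric's actual implementation: the function name `matmul_batched_k` and the `k_batch` parameter are hypothetical, and plain NumPy is used only to show the arithmetic. Each step reads a slice of at most `k_batch` columns of `a` and rows of `b`, so the operand data touched per step does not grow with the length of the reduction dimension.

```python
import numpy as np


def matmul_batched_k(a, b, k_batch):
    """Compute a @ b by accumulating partial products over slices of K.

    Hypothetical helper for illustration only.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    if k_batch < 1:
        raise ValueError("k_batch must be positive")

    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, k_batch):
        stop = min(start + k_batch, k)
        # Partial product over one slice of the reduction dimension.
        out += a[:, start:stop] @ b[start:stop, :]
    return out


a = np.arange(12).reshape(3, 4)
b = np.arange(8).reshape(4, 2)
result = matmul_batched_k(a, b, k_batch=3)
```

The result matches `a @ b`; only the order in which the partial sums are accumulated differs.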
Enhanced infrastructure support
- GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
- GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
- Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
- Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
- More enhancements from Legate 24.11
Other
- Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRand in CPU-only installations.
Known Issues
cuPyNumeric will emit a false-positive warning like the following:
RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.
in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
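Until such a patch is available, one possible user-side workaround is to filter this specific message with Python's standard `warnings` module. The sketch below is an assumption, not an official recommendation: the helper name `suppress_buffer_warning` is hypothetical, and the pattern is taken from the warning text quoted above, so it would need adjusting if the wording changes.

```python
import warnings

# Prefix of the false-positive message quoted in the release notes.
# The filter treats this as a regular expression matched against the
# start of the warning text.
BUFFER_WARNING_PATTERN = (
    r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__"
)


def suppress_buffer_warning():
    """Ignore only the __buffer__ fallback warning (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        message=BUFFER_WARNING_PATTERN,
        category=RuntimeWarning,
    )


suppress_buffer_warning()
```

Because the filter matches on the message prefix, other cuPyNumeric fallback warnings, which may indicate real performance issues, are still shown.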