Releases · nv-legate/cupynumeric

08 Feb 06:20

marcinz

v25.01.00

a972b23

v25.01.00 Latest

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/25.01/.

New features

Added functionality

Add the method parameter to cupynumeric.convolve.
Increase the maximum array dimension from 4 to 6.
Experimental support for NumPy 2.0 (not reflected in the package constraints yet).
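
The new method parameter of cupynumeric.convolve selects how the convolution is computed. The accepted values are not listed in these notes. The sketch below assumes the choice is between direct summation and an FFT-based approach (as in scipy.signal.convolve), and implements both by hand so they can be compared. The helper names direct_convolve and fft_convolve are hypothetical, not part of cuPyNumeric.

```python
# Prefer cuPyNumeric when installed; fall back to NumPy so the sketch runs anywhere.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def direct_convolve(a, v):
    """Full linear convolution by direct summation: out[n] = sum_k v[k] * a[n - k]."""
    out = np.zeros(len(a) + len(v) - 1)
    for k in range(len(v)):
        # Tap k of v adds a copy of `a`, scaled by v[k] and shifted right by k.
        out[k:k + len(a)] += v[k] * a
    return out


def fft_convolve(a, v):
    """Full linear convolution via the convolution theorem: zero-pad both inputs
    to the output length, multiply their spectra, and transform back."""
    n = len(a) + len(v) - 1
    spectrum = np.fft.rfft(a, n) * np.fft.rfft(v, n)
    return np.fft.irfft(spectrum, n)


# v25.01.00 also raises the maximum array dimension to 6; a 6-D array:
six_d = np.zeros((2,) * 6)
```

Both functions return the same values up to rounding. Direct summation costs O(len(a) * len(v)), while the FFT route scales as O(N log N), which is why a library may pick between them automatically.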

Memory management enhancements

Updates to take advantage of the deferred-eager pool unification in Legate. This change has the potential to increase the effective available memory capacity by up to 100% for many use cases. It also removes the need for the user to adjust --eager-alloc-percentage.
Add the offload_to() API, which allows a user to offload an array to a particular memory kind, discarding any copies held in other memories. This can be useful, e.g., to evict an array from GPU memory to system memory, freeing up space for subsequent GPU tasks.
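
The semantics of offload_to() can be modeled with a toy class that tracks which memory kinds hold a valid copy of an array. Everything below (ToyArray, bytes_in, and the memory labels "SYSMEM" and "FBMEM") is hypothetical and only illustrates the described behavior. It does not call cuPyNumeric; the real method's argument type should be taken from the cuPyNumeric documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArray:
    """Hypothetical stand-in for an array that can be replicated across memory kinds.

    Not cuPyNumeric code: it only mimics the release-note description of
    offload_to(), which keeps the data in one memory kind and discards the rest.
    """

    nbytes: int
    # Memory kinds currently holding a valid copy (labels are made up).
    valid_in: set = field(default_factory=lambda: {"SYSMEM"})

    def use_on(self, memory: str) -> None:
        # Touching the array from another memory leaves an extra copy there.
        self.valid_in.add(memory)

    def offload_to(self, memory: str) -> None:
        # Keep a single copy in the target memory; every other copy is dropped.
        self.valid_in = {memory}


def bytes_in(arrays, memory: str) -> int:
    """Total bytes of the given arrays that are resident in `memory`."""
    return sum(arr.nbytes for arr in arrays if memory in arr.valid_in)


# Usage: an array used by a GPU task ends up in both memories until evicted.
weights = ToyArray(nbytes=1_000)
weights.use_on("FBMEM")
before = bytes_in([weights], "FBMEM")
weights.offload_to("SYSMEM")
after = bytes_in([weights], "FBMEM")
```

In this model, the GPU-side footprint drops from 1000 bytes to 0 after the offload, which is the "freeing up space for subsequent GPU tasks" effect described above.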

I/O improvements

Use cuFile to accelerate HDF5 reads on the GPU.
Add support for reading "binary" HDF5 datasets (in particular useful for reading boolean-type datasets).

UX Improvements

Consider NUMA node topology when allocating CPU cores and memory during automatic machine configuration.
Add the environment variable LEGATE_LIMIT_STDOUT, which limits printed output to a single copy of the top-level program in a multi-process execution.
Remove an extraneous warning about __buffer__ being unimplemented.

Deprecations

Drop support for the Maxwell GPU architecture. cuPyNumeric now requires at least Pascal (sm_60).

07 Dec 06:44

marcinz

v24.11.02

5371ab3

v24.11.02

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Packaging Changes

Update for Legate v24.11.01

07 Dec 06:42

marcinz

v24.11.01

9627cb8

v24.11.01

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Bug Fixes

Explicit fallback to __array__() on __buffer__

17 Nov 00:51

manopapad

v24.11.00

b198f33

v24.11.00

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

Implement np.unravel_index
Implement np.angle
Implement np.median
Implement np.ix_
Implement np.meshgrid
Implement np.expand_dims
Implement np.rot90
Implement np.round
Implement np.fft.fftshift and np.fft.ifftshift
Implement np.roll
Support full_matrices parameter of np.linalg.svd
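
Because cuPyNumeric aims to be a drop-in replacement for NumPy, the newly covered functions follow NumPy semantics. The sketch below exercises several of them, importing cuPyNumeric when available and otherwise NumPy. The inputs are arbitrary illustrations.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# unravel_index: flat index 5 in a 2x3 array is row 1, column 2.
row, col = np.unravel_index(5, (2, 3))

# roll: circular shift by one position -> [4, 1, 2, 3].
rolled = np.roll(np.array([1, 2, 3, 4]), 1)

# rot90: rotate counterclockwise by 90 degrees -> [[2, 4], [1, 3]].
rotated = np.rot90(np.array([[1, 2], [3, 4]]))

# fftshift moves the zero-frequency bin to the center; ifftshift undoes it.
freqs = np.array([0, 1, 2, 3, -4, -3, -2, -1])
centered = np.fft.fftshift(freqs)      # [-4, -3, -2, -1, 0, 1, 2, 3]
restored = np.fft.ifftshift(centered)  # back to `freqs`

# ix_: build an open mesh to select rows {0, 2} and columns {1, 3}.
grid = np.arange(12).reshape(3, 4)
block = grid[np.ix_(np.array([0, 2]), np.array([1, 3]))]  # [[1, 3], [9, 11]]

# meshgrid: coordinate matrices with shape (len(y), len(x)) by default.
xx, yy = np.meshgrid(np.arange(3), np.arange(2))

# expand_dims, median, angle, round.
as_row = np.expand_dims(np.arange(3), 0)         # shape (1, 3)
mid = np.median(np.array([3.0, 1.0, 2.0]))       # 2.0
phase = np.angle(np.array([1j, -1.0]))           # [pi/2, pi]
rounded = np.round(np.array([1.234, 5.678]), 1)  # [1.2, 5.7]

# svd with full_matrices=False returns the reduced U of shape (4, 2).
u, s, vh = np.linalg.svd(np.ones((4, 2)), full_matrices=False)
```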

Memory management enhancements

Memory-efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
Memory efficiency for stencil computation - add the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate space for ghost elements when an array will be used in a stencil computation, reducing intermediate memory use.
Memory allocation report - report the object-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
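
The matrix-multiplication entry describes batching over the reduction dimension. The single-process sketch below accumulates A[:, k0:k1] @ B[k0:k1, :] into the output, so each step materializes only one partial product of the output's shape, independent of the inner dimension k. It is an illustration, not cuPyNumeric's distributed implementation. matmul_batched_k, make_stencil_grid and smooth_1d are hypothetical names. The stencil part assumes stencil_hint takes per-dimension low and high ghost widths, and it is guarded with hasattr so that it is skipped under plain NumPy.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def matmul_batched_k(a, b, chunk=64):
    """Compute a @ b by summing partial products over slices of the reduction
    dimension. Assumes a and b share a floating-point dtype."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        # Extra memory per step: one (m, n) partial product, regardless of k.
        out += a[:, start:stop] @ b[start:stop, :]
    return out


def make_stencil_grid(n):
    """Allocate a 1-D grid that will feed a 3-point stencil."""
    grid = np.zeros(n)
    if hasattr(grid, "stencil_hint"):
        # Assumed signature: (low_offsets, high_offsets), one entry per dimension.
        grid.stencil_hint((1,), (1,))
    return grid


def smooth_1d(u):
    """3-point moving average on interior points; each output reads one
    neighbor on each side, i.e. a ghost width of 1 in each direction."""
    out = u.copy()
    out[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return out
```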

Enhanced infrastructure support

GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
More enhancements from Legate 24.11

Other

Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRAND in CPU-only installations.

Known Issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.

11 Sep 20:36

manopapad

v24.06.01

427da00

v24.06.01

This is a patch release, and includes the following fixes:

Fix for nv-legate/legate#947
Fix package dependencies (cuda and openblas)

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

03 Jul 22:35

manopapad

v24.06.00

510e24a

v24.06.00

This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:

np.linalg.qr, np.linalg.svd (single-GPU support only)
"where" argument for unary operations
np.select
np.flipud, np.fliplr
np.cov
np.load (initial, unoptimized implementation)
np.average
np.logical_and/or.reduce
np.digitize
np.diff
np.linalg.cholesky, np.linalg.solve (multi-GPU support based on cuSOLVERMp; not included in conda packages, requires a manual build)
C++-based ndarray class (experimental support)
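
Several of the additions above are easiest to understand by example. The sketch below follows NumPy semantics, importing cuPyNumeric when available and otherwise NumPy. The input values are arbitrary.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

x = np.array([-4.0, 0.0, 9.0])

# select: per element, take the value paired with the first true condition.
signs = np.select([x < 0, x > 0], [np.full_like(x, -1.0), np.full_like(x, 1.0)], default=0.0)

# "where" on a unary ufunc: compute only where the mask is True, keep `out` elsewhere.
roots = np.sqrt(x, out=np.zeros_like(x), where=x >= 0)  # [0, 0, 3]

# digitize: bin index i such that bins[i-1] <= value < bins[i].
bins_idx = np.digitize(np.array([0.5, 1.5, 2.5]), np.array([1.0, 2.0]))  # [0, 1, 2]

# diff: first discrete difference.
steps = np.diff(np.array([1, 4, 9, 16]))  # [3, 5, 7]

# average with weights: (1*3 + 2*1 + 3*0) / (3 + 1 + 0) = 1.25.
wavg = np.average(np.array([1.0, 2.0, 3.0]), weights=np.array([3.0, 1.0, 0.0]))

# ufunc reduce over booleans.
all_true = np.logical_and.reduce(np.array([True, True, False]))  # False
any_true = np.logical_or.reduce(np.array([True, True, False]))   # True

# flipud / fliplr reverse the row order / column order.
m = np.array([[1, 2], [3, 4]])
up_down, left_right = np.flipud(m), np.fliplr(m)

# cov: rows are variables; these two are perfectly anti-correlated.
c = np.cov(np.array([[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]))  # [[1, -1], [-1, 1]]
```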

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

Known issues

Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:

OSError: libcutensor.so.1: cannot open shared object file: No such file or directory

This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.

21 Nov 01:47

marcinz

v23.11.00

d91f17c

v23.11.00

This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.
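
A batched Cholesky factorization treats the leading axes as a stack of independent matrices, following NumPy's np.linalg.cholesky convention. The sketch below factors a small stack and checks each factor by reconstruction. It also shows np.var for reference. The matrices are arbitrary symmetric positive-definite examples. einsum is used for the batched product.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# A stack of 3 symmetric positive-definite 2x2 matrices, shape (3, 2, 2).
base = np.array([[2.0, 1.0], [1.0, 2.0]])
scales = np.array([1.0, 2.0, 3.0])
batch = scales[:, None, None] * base

# Batched factorization: each batch[b] = L[b] @ L[b].T with L[b] lower-triangular.
L = np.linalg.cholesky(batch)

# Reconstruct every matrix at once: recon[b, i, k] = sum_j L[b, i, j] * L[b, k, j].
recon = np.einsum("bij,bkj->bik", L, L)

# Variance of [1, 2, 3, 4]: mean 2.5, mean squared deviation 1.25.
spread = np.var(np.array([1.0, 2.0, 3.0, 4.0]))
```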

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Added variance as a unary reduction by @jjwilke in #593
Add batched cholesky implementation and tests by @jjwilke in #1029

🐛 Bug Fixes

Replacing set with OrderedSet to avoid control-replication violations by @ipdemes in #1054
Inline boolean operators in NumPy are bitwise, not logical by @manopapad in #1057
Fix #1065 ("where" fails with IndexError) by @manopapad in #1067
Fixes #1069, #1070 (minor einsum bugs) by @manopapad in #1072
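
The NumPy rule behind "Inline boolean operators in NumPy are bitwise, not logical" can be illustrated on integer arrays. The `&` operator (and its in-place form `&=`) maps to bitwise_and, not logical_and. The inputs below are arbitrary.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 1, 1])

# `&` is bitwise_and: 0b01 & 0b11 = 1, 0b10 & 0b01 = 0, 0b11 & 0b01 = 1.
bitwise = a & b
# logical_and instead treats every nonzero value as True.
logical = np.logical_and(a, b)

# The in-place operator follows the same bitwise rule.
c = a.copy()
c &= b
```

On boolean arrays the two operations coincide, which is why the difference only shows up with integer operands.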

📖 Documentation

Suggest using mamba over conda by @manopapad in #1068

Full Changelog: v23.09.00...v23.11.00

Contributors

jjwilke, manopapad, and ipdemes

03 Oct 15:23

marcinz

v23.09.00

e66a063

v23.09.00

This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).
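
The quantile API follows np.quantile. With the default linear method, a quantile q maps to position q * (n - 1) in the sorted data and interpolates between the neighboring values. A short sketch with arbitrary data:

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# n = 4, so q = 0.5 sits at position 1.5, halfway between 2 and 3.
middle = np.quantile(data, 0.5)                        # 2.5
# Positions 0.75 and 2.25 give 1.75 and 3.25.
quartiles = np.quantile(data, np.array([0.25, 0.75]))

# Along an axis: one quantile per column.
table = np.array([[1.0, 10.0], [3.0, 30.0]])
col_medians = np.quantile(table, 0.5, axis=0)          # [2.0, 20.0]
```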

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Quantile Implementation by @aschaffer in #664

🛠️ Improvements

Add missing openmp variants to BitGenerator and UniqueReduce by @rohany in #1010
Histogram refactor by @aschaffer in #1003

📖 Documentation

Add best practices info to sphinx docs by @bryevdv in #1048

🐛 Bug Fixes

Missing alignment on histogram call by @manopapad in #999
Fix for control replication violation in test by @ipdemes in #1005
Fix build instructions link by @bryevdv in #1014
Add back None as an accepted value for axis on some type sigs by @manopapad in #1017
If a scalar ufunc arg is cn.ndarray use its type directly by @manopapad in #1011
Skip the docstrings for functions pulled from cloned modules by @manopapad in #1024
Fix random test failures in CPU-only runs by @manopapad in #1025
Don't cast histogram to int64 when density=True by @manopapad in #1042
Explicitly cast result of shift binary operators by @manopapad in #1046
Remove use of deprecated np.find_common_type by @manopapad in #1045

New Contributors

@ajschmidt8 made their first contribution in #1035

Full Changelog: v23.07.00...v23.09.00

Contributors

manopapad, bryevdv, and 4 other contributors

25 Jul 04:51

marcinz

v23.07.00

d413db2

v23.07.00

This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.
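
The histogram, broadcast* and nan* functions follow their NumPy counterparts. The sketch below uses small arbitrary inputs to show the behavior of each family.

```python
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

values = np.array([1.0, 2.0, 2.0, 3.0])

# Three equal-width bins over [1, 3]; the last bin includes its right edge.
counts, edges = np.histogram(values, bins=3)  # counts [1, 2, 1]

# broadcast_to repeats `row` along a new leading axis
# (in NumPy the result is a read-only view).
row = np.arange(3)
tiled = np.broadcast_to(row, (2, 3))

# broadcast_arrays expands several inputs to a common shape.
col = np.array([[10], [20]])
bx, by = np.broadcast_arrays(row, col)  # both shape (2, 3)

# nan* reductions skip NaN entries instead of propagating them.
with_nan = np.array([1.0, np.nan, 2.0])
total = np.nansum(with_nan)    # 3.0
largest = np.nanmax(with_nan)  # 2.0
```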

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Implement broadcast routines by @bryevdv in #759
Sanitize unary reductions that have NaNs by @shriram-jagan in #925
Histogram Functionality by @aschaffer in #983

🛠️ Improvements

Add ufunc methods by @bryevdv in #834
Support of the shape argument in empty_like() & Co. by @madsbk in #845
Add support for Python 3.11 (#830) by @marcinz in #837
Ensure ufunc/function dispatching is narrow by @seberg in #977
Fft improvements by @mfoerste4 in #732

📖 Documentation

Note new minimum CUDA requirements for conda packages by @manopapad in #875

🐛 Bug Fixes

Fix bugs in concatenate and stack APIs. by @robinwnv in #844
Fixes #858 by @manopapad in #859
Fix concatenate and *stack APIs to support scalars(#818, #839) by @robinwnv in #866
Avoid following compiler symlinks by @manopapad in #880
Fix for some binary operators on float16 by @magnatelee in #889
WAR for TBLIS compiler detection while upstream PR is pending by @manopapad in #890
Also build CPU-only packages for haswell (#869) by @marcinz in #882
Fix array API(#885). by @robinwnv in #910
Fix unit tests by @magnatelee in #920
Fix an incorrect type by @marcinz in #931
Use correct type, to avoid int narrowing by @manopapad in #941
Fix cunumeric.arange issues by @yimoj in #940
Use the right type for scalar arguments by @magnatelee in #942
Fall back to NumPy eagerly on RandomState methods by @manopapad in #959
Fix bugs in random integer functions by @manopapad in #966
Resolve numpy 1.25 issues by @bryevdv in #973
Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #971
fixing putmask logic for scalar inputs by @ipdemes in #980
fixing cuda error by @ipdemes in #978
Change arg to LLONG_MIN to make it consistent with python. by @shriram-jagan in #986
Missing alignment on histogram call by @manopapad in #1000

New Contributors

@madsbk made their first contribution in #845
@sandeepd-nv made their first contribution in #899
@seberg made their first contribution in #977
@shriram-jagan made their first contribution in #988
@aschaffer made their first contribution in #983

Full Changelog: v23.03.00...v23.07.00

Contributors

seberg, manopapad, and 11 other contributors

15 Mar 20:02

marcinz

v23.03.00

9ac887b

v23.03.00

This is the beta release of cuNumeric.

This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

Do reductions properly in tensor contraction tasks by @magnatelee in #803
Seed the NumPy RNG at the start of every test by @manopapad in #792
Fix handling of negative axis in np.repeat by @manopapad in #821
Fix for #720 (by @lightsighter) by @manopapad in #721
Ensure unary_func seeding is deterministic across processes by @manopapad in #825

🛠️ Improvements

Update the architectures built in conda package by @marcinz in #770
Use thrust::cuda::par_nosync if available by @magnatelee in #780
Preemptively convert to np.ndarray on NumPy fallback by @manopapad in #802
Removing all Legion references from the code by @magnatelee in #811
Remove exception throwing from RNG code by @manopapad in #815
Pin legate to a specific commit by @trxcllnt in #824
Add support for Python 3.11 by @m3vaz in #830

📖 Documentation

[WIP] Docs refresh by @bryevdv in #805

Full Changelog: v23.01.00...v23.03.00

Contributors

trxcllnt, manopapad, and 5 other contributors
