v0.16.0
Summary
This release reaches an important milestone by making offloading fully asynchronous. Calls to dpnp submit tasks to the DPC++ runtime for execution and return without waiting for those tasks to finish. The sequential semantics that users expect from a Python script are nevertheless preserved.
In addition, this release completes the implementation of the dpnp.fft module and adds several new array manipulation, indexing, and elementwise routines. It also adds support for building dpnp for Nvidia GPUs.
DPNP is now compatible with NumPy 2.0.
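The asynchronous model described in the summary can be pictured with a small standard-library analogy. This is only a sketch, not how dpnp or the DPC++ runtime is implemented: a single-worker `ThreadPoolExecutor` stands in for an in-order device queue, and `slow_square` and `add_one` are hypothetical stand-ins for kernels. Submission returns immediately, tasks still run in program order, and the host blocks only when it actually needs the data.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption for illustration: one worker thread plays the role of an
# in-order queue, so tasks execute in the order they were submitted.
queue = ThreadPoolExecutor(max_workers=1)


def slow_square(values):
    """Hypothetical 'kernel': pretend the work takes a while."""
    time.sleep(0.05)
    return [v * v for v in values]


def add_one(pending):
    """Hypothetical 'kernel' that consumes the previous task's output.

    The previous task has already finished when this one starts,
    because the single worker runs tasks in submission order.
    """
    return [v + 1 for v in pending.result()]


# Both submissions return right away; nothing has been waited on yet.
squared = queue.submit(slow_square, [1, 2, 3])
result = queue.submit(add_one, squared)

# The host blocks only here, when it needs the values.
values = result.result()
queue.shutdown()

print(values)  # [2, 5, 10]
```

The result is the same as if the two steps had been executed one after another on the host, which is the sense in which sequential semantics are preserved.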
Details
Added
- Added implementation of `dpnp.gradient` function #1859
- Added implementation of `dpnp.sort_complex` function #1864
- Added implementation of `dpnp.fft.fft` and `dpnp.fft.ifft` functions #1879
- Added implementation of `dpnp.isneginf` and `dpnp.isposinf` functions #1888
- Added implementation of `dpnp.fft.fftfreq` and `dpnp.fft.rfftfreq` functions #1898
- Added implementation of `dpnp.fft.fftshift` and `dpnp.fft.ifftshift` functions #1900
- Added implementation of `dpnp.isreal`, `dpnp.isrealobj`, `dpnp.iscomplex`, and `dpnp.iscomplexobj` functions #1916
- Added support to build `dpnp` for Nvidia GPU #1926
- Added implementation of `dpnp.fft.rfft` and `dpnp.fft.irfft` functions #1928
- Added implementation of `dpnp.nextafter` function #1938
- Added implementation of `dpnp.trim_zeros` function #1941
- Added implementation of `dpnp.fft.hfft` and `dpnp.fft.ihfft` functions #1954
- Added implementation of `dpnp.logaddexp2` function #1955
- Added implementation of `dpnp.flatnonzero` function #1956
- Added implementation of `dpnp.float_power` function #1957
- Added implementation of `dpnp.fft.fft2`, `dpnp.fft.ifft2`, `dpnp.fft.fftn`, and `dpnp.fft.ifftn` functions #1961
- Added implementation of `dpnp.array_equal` and `dpnp.array_equiv` functions #1965
- Added implementation of `dpnp.nan_to_num` function #1966
- Added implementation of `dpnp.fix` function #1971
- Added implementation of `dpnp.fft.rfft2`, `dpnp.fft.irfft2`, `dpnp.fft.rfftn`, and `dpnp.fft.irfftn` functions #1982
- Added implementation of `dpnp.argwhere` function #2000
- Added implementation of `dpnp.real_if_close` function #2002
- Added implementation of `dpnp.ndim` and `dpnp.size` functions #2014
- Added implementation of `dpnp.append` and `dpnp.asarray_chkfinite` functions #2015
- Added implementation of `dpnp.array_split`, `dpnp.split`, `dpnp.hsplit`, `dpnp.vsplit`, and `dpnp.dsplit` functions #2017
- Added runtime dependency on `intel-gpu-ocl-icd-system` package #2023
- Added implementation of `dpnp.ravel_multi_index` and `dpnp.unravel_index` functions #2022
- Added implementation of `dpnp.resize` and `dpnp.rot90` functions #2030
- Added implementation of `dpnp.require` function #2036
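For readers unfamiliar with the index-conversion routines added above, the following pure-Python sketch shows the row-major (C-order) arithmetic behind the default behavior of the NumPy functions of the same names. It is an illustration only, not dpnp's implementation: the helper names `ravel_index_sketch` and `unravel_index_sketch` are hypothetical, and they handle a single coordinate tuple rather than arrays of indices.

```python
def ravel_index_sketch(multi_index, dims):
    """Convert one coordinate tuple into a flat C-order index."""
    flat = 0
    for idx, dim in zip(multi_index, dims):
        if not 0 <= idx < dim:
            raise ValueError("index out of bounds for the given shape")
        # Shift what has been accumulated so far by the size of this axis.
        flat = flat * dim + idx
    return flat


def unravel_index_sketch(flat, dims):
    """Convert a flat C-order index back into a coordinate tuple."""
    coords = []
    # Peel off the fastest-varying (last) axis first.
    for dim in reversed(dims):
        flat, idx = divmod(flat, dim)
        coords.append(idx)
    return tuple(reversed(coords))


shape = (3, 4)
flat_index = ravel_index_sketch((1, 2), shape)
coords = unravel_index_sketch(flat_index, shape)

print(flat_index)  # 6
print(coords)      # (1, 2)
```

The two helpers are inverses of each other for any in-bounds coordinate.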
Changed
- Extended pre-commit pylint check to `dpnp.fft` module #1860
- Reworked `vm` vector math backend to reuse `dpctl.tensor` functions around unary and binary functions #1868
- Extended `dpnp.ndarray.astype` method to support `device` keyword argument #1870
- Improved performance of `dpnp.linalg.solve` by implementing a dedicated kernel for its batch implementation #1877
- Extended `dpnp.fabs` to support `order` and `out` keyword arguments by writing a dedicated kernel for it #1878
- Extended `dpnp.linalg` module to support `usm_ndarray` as input #1880
- Reworked `dpnp.mod` implementation to be an alias for `dpnp.remainder` #1882
- Removed the legacy implementation of linear algebra functions from the backend #1887
- Removed the legacy implementation of elementwise functions from the backend #1890
- Extended `dpnp.all` and `dpnp.any` to support `out` keyword argument #1893
- Reworked `dpnp.repeat` to add an explicit type check of input array #1894
- Improved performance of different functions by adopting asynchronous implementation of `dpctl` #1897
- Extended `dpnp.fmax` and `dpnp.fmin` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1905
- Removed the legacy implementation of array creation and manipulation functions from the backend #1903
- Extended `dpnp.extract` implementation to align with NumPy #1906
- Reworked backend implementation to align with non-backward compatible changes in DPC++ 2025.0 #1907
- Removed the legacy implementation of indexing functions from the backend #1908
- Extended `dpnp.take` implementation to align with NumPy #1909
- Extended `dpnp.place` implementation to align with NumPy #1912
- Reworked the implementation of indexing functions to avoid unnecessary casting to `dpnp_array` when input is `usm_ndarray` #1913
- Reduced code duplication in the implementation of sorting functions #1914
- Removed the obsolete dparray interface #1915
- Improved performance of `dpnp.linalg` module for BLAS routines by adopting asynchronous implementation of `dpctl` #1919
- Relocated `dpnp.einsum` utility functions to a separate file #1920
- Improved performance of `dpnp.linalg` module for LAPACK routines by adopting asynchronous implementation of `dpctl` #1922
- Reworked `dpnp.matmul` to allow larger batch size to be used #1927
- Removed data synchronization where it is not needed #1930
- Leveraged `dpctl.tensor` implementation for `dpnp.where` to support scalar as input #1932
- Improved performance of `dpnp.linalg.eigh` by implementing a dedicated kernel for its batch implementation #1936
- Reworked `dpnp.isclose` and `dpnp.allclose` to comply with compute follows data approach #1937
- Extended `dpnp.deg2rad` and `dpnp.radians` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1943
- `dpnp` uses pybind11 2.13.1 #1944
- Extended `dpnp.degrees` and `dpnp.rad2deg` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1949
- Extended `dpnp.unwrap` to support all keyword arguments provided by NumPy #1950
- Leveraged `dpctl.tensor` implementation for `dpnp.count_nonzero` function #1962
- Leveraged `dpctl.tensor` implementation for `dpnp.diff` function #1963
- Leveraged `dpctl.tensor` implementation for `dpnp.take_along_axis` function #1969
- Reworked `dpnp.ediff1d` implementation through existing functions instead of a separate kernel #1970
- Reworked `dpnp.unique` implementation through existing functions when `axis` is given, otherwise through leveraging `dpctl.tensor` implementation #1972
- Improved performance of `dpnp.linalg.svd` by implementing a dedicated kernel for its batch implementation #1936
- Leveraged `dpctl.tensor` implementation for `shape.setter` method #1975
- Extended `dpnp.ndarray.copy` to support compute follows data keyword arguments #1976
- Reworked `dpnp.select` implementation through existing functions instead of a separate kernel #1977
- Leveraged `dpctl.tensor` implementation for `dpnp.from_dlpack` and `dpnp.ndarray.__dlpack__` functions #1980
- Reworked `dpnp.linalg` module backend implementation for BLAS routines to work with OneMKL interfaces #1981
- Reworked `dpnp.ediff1d` implementation to reduce code duplication #1983
- `dpnp` can be used with any NumPy from 1.23 to 2.0 #1985
- Reworked `dpnp.unique` implementation to properly handle NaN values #1972
- Removed `dpnp.issubcdtype` per NumPy 2.0 recommendation #1996
- Reworked `dpnp.unique` implementation to align with NumPy 2.0 #1999
- Reworked `dpnp.linalg.solve` backend implementation to work with OneMKL Interfaces #2001
- Reworked `dpnp.trapezoid` implementation through existing functions instead of falling back on NumPy #2003
- Added `copy` keyword to `dpnp.array` to align with NumPy 2.0 #2006
- Extended `dpnp.heaviside` to support `order` and `out` keyword arguments by writing a dedicated kernel for it #2008
- `dpnp` uses pybind11 2.13.5 #2010
- Added `COMPILER_VERSION_2025_OR_LATER` flag to be able to run `dpnp.fft` module with both 2024.2 and 2025.0 versions of the compiler #2025
- Cleaned up the implementation of `dpnp.gradient` by removing an obsolete TODO which is not going to be done #2032
- Updated `Array Manipulation Routines` page in documentation to add missing functions and to remove duplicate entries #2033
- `dpnp` uses pybind11 2.13.6 #2041
- Updated `dpnp.fft` backend to depend on `INTEL_MKL_VERSION` flag to ensure that the appropriate code segment is executed based on the version of OneMKL #2035
- Use `dpctl::tensor::alloc_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations #2058
- Improved implementation of `dpnp.kron` to avoid unnecessary copy for non-contiguous arrays #2059
- Updated the test suite for `dpnp.fft` module #2071
- Reworked `dpnp.clip` implementation to align with Python Array API 2023.12 specification #2048
- Skipped outdated tests for `dpnp.linalg.solve` due to compatibility issues with NumPy 2.0 #2074
- Updated installation instructions #2098
Fixed
- Resolved an issue with `dpnp.matmul` when an f_contiguous `out` keyword is passed to the function #1872
- Resolved a possible race condition in `dpnp.inv` #1940
- Resolved an issue with failing tests for `dpnp.append` when running on a device without fp64 support #2034
- Resolved an issue with input array of `usm_ndarray` passed into `dpnp.ix_` #2047
- Added a workaround to prevent crash in tests on Windows in internal CI/CD (when running on either Lunar Lake or Arrow Lake) #2062
- Fixed a crash in `dpnp.choose` caused by missing control of releasing temporary allocated device memory #2063
- Resolved compilation warning and error while building in debug mode #2066
- Fixed an issue with asynchronous execution in `dpnp.fft` module #2067
Full Changelog: 0.15.0...0.16.0