roiedanino commented Nov 19, 2025

What?

Add v1.20.0 new features and fixes to NEWS


Features

UCP

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added new GPU device API for direct GPU-to-GPU communication | ed7b07b36 | UCT/API: Introduce device API |
| Added host API for GPU device management | db2c287b3 | UCP/API: Add GPU device and host API |
| Added device signaling API with cooperation levels and flags | 3c3d48977 | UCP/API/DEVICE: Add signaling API |
| Added API for working with offsets and channel id in device operations | 1537aa946 | UCP/API: Work with offsets and channel id |
| Added method to write to local counter in device operations | 5d40ef70a | UCP/DEVICE: add method to write to local counter |
| Added local and remote address fields to memory list element in device API | 3bfebd3a4 | UCP/DEVICE: Add local and remote address fields to memory list element |
| Added device lane selection and allocated handle population | 481340f03 | UCP/DEVICE: Add lane selection and populate allocated handle |
| Added support for Direct NIC (DPU) data path with CUDA | e7f59b68f | UCP/IB/CUDA: Direct NIC data path support |
| Added rkey packing support for Direct NIC | e816f536d | UCP/CORE: Rkey pack featuring Direct NIC |
| Added sender flush mechanism when memory sys_dev differs from remote lane sys_dev | 6c5b307bd | UCP/RKEY: Sender flush if memory sys_dev is not remote lane sys_dev |
| Added option to use single network device per protocol | c73ee1a08 | UCP/PROTO: Added option to use single network device |
| Added MIN_RMA_CHUNK_SIZE configuration parameter | 0f379b8b9 | UCP: Added MIN_RMA_CHUNK_SIZE |
| Decreased default value for MIN_RMA_CHUNK_SIZE from 16k to 8k | 4b81bef86 | UCP: Decrease default value for MIN_RMA_CHUNK_SIZE from 16k to 8k |
| Improved protocol lane selection with find_lanes callback to minimize overhead | 82eaa01e7 | UCP/PROTO: Find lanes callback to minimize overhead |
| Improved send-zcopy latency factor for fast-completion cases | ef3c1109d | UCP/PROTO: increase the latency factor of send-zcopy only in cases of fast-completion |
| Improved multi-ppn performance estimation | 179a0e728 | UCP: Fix multi ppn perf estimation |
| Removed deprecated ucp_mem functions | 9ccbdf0a1 | UCP/MM: Remove deprecated function |
| Deprecated ucp_request_alloc API | e93727f0f | UCP/API: Deprecate ucp_request_alloc |

UCT

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added new device API for GPU communication (rc_gda transport) | ed7b07b36 + 9ded802a0 | UCT/API: Introduce device API + UCT/GDA: Rename transport to rc_gda |
| Added GDAKI transport with endpoint export to GPU | 8649b0449 + 34317a76c | UCT/GDAKI: Add endpoint + Export EP to GPU |
| Added DEVX QP/CQ support on foreign memory | 5f3c75da0 | UCT/MLX5: Add DEVX QP/CQ on foreign memory support |
| Added device API implementation for CUDA_IPC transport | 035793caa | UCT/CUDA_IPC: implement device api |
| Added device put multi, put partial, and atomic operations for CUDA_IPC | ccfd7487b | UCT/CUDA_IPC: implement device put multi, put partial, atomic |
| Added peer failure error handling capability for GDAKI | f6b87cb94 | UCT/GDAKI: Add peer failure error handling cap |
| Added check for nvidia_peermem driver when using GDA transport | 48e9e475d | UCT/GDA: Check that nvidia peermem driver is loaded |
| Enabled Direct NIC by default for IB transport | d215d23de | UCT/IB: Enable Direct NIC by default |
| Added XDR performance recognition | 19615ab98 | UCT: Add XDR perf recognition |
| Added support for mapping DMA_BUF handle via PCIe for Direct NIC | 2da859c64 | UCT/CUDA/CUDA_COPY: Fixed mapping of DMA_BUF handle |
| Improved GDR_COPY performance with fast-path cache lookup | 21d6cd664 | UCT/GDR_COPY: Use fast-path cache lookup |

RDMA CORE (IB, ROCE, etc.)

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added ConnectX-9 device support | ee882f615 | UCT/IB: add CX9 to spec list |
| Split dp_ordering flag for DV/DevX transports | 5959ce74b | UCT/IB: Split dp_ordering flag for DV/DevX |
| Added VRF tables support for RoCE reachability check | fe9b01261 | UCT/IB: Support VRF tables for RoCE reachability check |
| Added EFA-specific GPUDirect support detection | 4c707c0b0 | UCT/IB: Check EFA-specific GPUDirect support |

TCP

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added routing table check during reachability verification | 6bae8c5e8 | UCT/TCP: check the routing table during reachability check in TCP |

UCS

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Introduced lightweight rwlock data structure | 8bbe776d3 | UCS: Introduce lightweight rwlock |
| Added built-in atomics for rcache rwlock | 4a8fa51cf | UCS/RCACHE: Use built-in atomics for rwlock |
| Improved VFS symlink paths and duplicate object handling | c1fa6ec15 | UCS/VFS: Improve symlink paths and handling of duplicate objects |

CUDA

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added wrappers for NVML functions | 0e1d78825 | UCT/CUDA: Added wrappers for nvml functions |
| Added hook for cuLibraryGetGlobal | 2fb4921cf | UCM/CUDA: Added hook for cuLibraryGetGlobal |
| Improved CUDA call logging | 2bae0a55f | UCT/CUDA: CUDA call logging enhancment |
| Improved source/destination memory type detection for lane performance estimation | 2b014468d | UCP/PROTO/CUDA: Detect src/dst memtype when estimating lane perf |
| Removed unsafe usage of cuCtxGetId | 8d4f43ebf | UCT/CUDA: Removed unsafe usage of cuCtxGetId |
| Added support for cuCtxCreate_v4 for newer CUDA versions | 99430174d | TEST: use cuCtxCreate_v4 by default for newer CUDA versions |
| Improved context management for CUDA_IPC operations | 168414505 | UCT/CUDA/CUDA_IPC: Set context associated with local buffer |

UCM

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Changed module info print to debug level by default | 7363001e4 | UCM: Print module info to debug by default |

Tools

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added GDAKI kernel option to perftest | df3d8b949 | PERF: Perftest GDAKI kernel option |
| Added UCP cuda device tests to perftest | 176cec5e7 | UCP/PERF: UCP cuda device real tests |
| Added MPI+CUDA example | 4d06b0e75 | TEST/MPI: Added MPI+CUDA example |
| Differentiated wakeup feature and extra info options in perftest | c76fa1521 | TOOL/PERFTEST: Differentiate wakeup feature and extra info options |

Build

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added ability to build CUDA device code for supported architectures | 502a12b94 + 63be7441e | BUILD: Add ability to build CUDA code + build device code for supported CUDA arch |
| Added ucx.spec into tarball for Universal Build System support | de69ee57f | BUILD: Add ucx.spec into the tarball to support Universal Build System |
| Added CUDA 13 support | 29831d319 | AZP/RELEASE: Add CUDA 13 support |
| Added GDA build failure when gpunetio not found | 401c1278b | BUILD/GDA: Fail if building with gpunetio but it cannot be found |

Packaging

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Moved driver level dependencies under Recommends section in Debian packages | a49b1b3ac | DEBIAN: Moved driver level dependencies under Recommends section |
| Added Provides field for upstream packages in Debian | 8b312fffa | DEBIAN: Add Provides field for upstream packages |
| Migrated JUCX publish from OSSRH to Central Portal | d77431a87 | AZP/RELEASE: Migrate JUCX publish from OSSRH to Central Portal |
| Added ib-mlx5-gda separate package | 9ded802a0 | UCT/GDA: Rename transport to rc_gda and packages to ib-mlx5-gda |

CI/Testing

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added Rocky OS support to release pipeline | 189c08a1b | AZP/RELEASE: Add Rocky OS support |
| Added RHEL 10 containers to build matrices | c8d815949 | AZP: Add RHEL 10 containers to build matrices |
| Added Debian 13 to CI build stage | 934a42b1d | AZP: Add Debian13 to CI Build stage |
| Added ARM build testing | 22a08b751 | AZP: Add build test on ARM |
| Switched to MOFED 25.07 | 37be3d687 | AZP: Switch to MOFED 25.07 |
| Switched GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image | 575512691 | AZP: Switch GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image |
| Added support for nvidia_peermem module in testing | cfec29bec | TEST/CI: Support nvidia_peermem module |
| Disabled Valgrind in CI Tests stage | ae53ac724 | CONTRIB: Disable Valgrind in CI Tests stage |
| Disabled tag matching offload tests | 2dd6c6553 | TEST/GTEST: Disable tag matching offload tests |

GO Bindings

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Made go bindings thread safe | ba5cc7e06 | BINDINGS/GO: Bug fix - make gobindings thread safe |

Documentation

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added note about reachability check mode in README | def7672e6 | README: add note about reachability check mode |
| Mentioned nvlink as supported transport | e80c579ba | DOC: Mention nvlink as supported tl |
| Documented return status for device APIs | 5fe5dafc9 | DEVICE: Document return status for device APIs |

AWS EFA

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Added RMA WRITE operations support | 7fe7e4482 | UCT/IB/EFA: Add RMA WRITE operations |
| Added flush and fence operations for SRD | f2f43b4d0 | UCT/EFA/SRD: Add flush and fence, enable for UCT |
| Enabled EFA SRD support in tests | a0ecb0165 | GTEST/UCP: Enable EFA SRD support |

Bugfixes

UCP

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed fallback to blocking registration for network device only | b14d7f1ef | UCP/CORE: Fallback to blocking registration for network device only |
| Fixed flush_state validity check before using it | 8a7afc1fd | UCP/FLUSH: Check flush_state validity before using it |
| Fixed single net dev filtering for single proto | e75960cf0 | UCP: Remove single net dev filtering for single proto |
| Fixed rkey size estimation for rendezvous | af42831b8 | UCP/RNDV: Fix rkey size estimation |
| Fixed memory invalidation without RNDV | e52c71541 + 5d7962a44 | UCP/WIREUP: Don't request invalidate without RNDV |
| Fixed gather_pending_requests to execute only when reconfig occurs | c5654c44b | UCP/WIREUP: Moved gather_pending_requests to be executed only when reconfig occurs |

UCT

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed CUDA_IPC protocol selection for cuda_ipc | 7e03b7820 | DEVICE/CUDA_IPC: Fix proto selection for cuda_ipc |
| Fixed GDA compilation issues | 2c3b56765 | UCT/GDA: Fix compilation |
| Fixed GDAKI wqe_idx overflow | 9ed46592e | UCT/GDAKI: Fix wqe_idx overflow |
| Fixed MM FIFO room calculation for tail > head case | c4b647fb2 | UCT/MM: Fix the FIFO room calculation for tail > head |
| Fixed CUDA_IPC indices handling in put partial | 9455002de | UCT/CUDA_IPC/TEST: change indices handling in put partial |
| Removed DOCA runtime dependency from GDAKI | 655feb5ba | UCT/GDAKI: Remove DOCA runtime dependency |
| Fixed GDA log spam by reducing DOCA log level | 913cca674 | UCT/GDA/MLX5: Reduce DOCA log spam by setting ERROR level |
| Fixed UAR support check when querying resources for GDA/MLX5 | f60a22c2e | UCT/GDA/MLX5: Check UAR is supported when querying resources |
| Fixed crash in GGA transport when EXPORTED_MKEY flag is missing | a5297d6fb | GGA: Remove XGVMI assertion |

CUDA

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed stack overflow bug when calling cuPointerGetAttribute | beab36c50 | UCT/CUDA_COPY: Fix stack overflow bug when calling cuPointerGetAttribute |
| Fixed mapping of DMA_BUF handle for Direct NIC | 2da859c64 | UCT/CUDA/CUDA_COPY: Fixed mapping of DMA_BUF handle |
| Returned object to mpool in case of failure in CUDA_COPY | ab3d32fd3 | UCT/CUDA/CUDA_COPY: Return object to the mpool in case of failure |
| Reduced log level of rkey unpacking failures | 90b7edcd5 | UCT/CUDA/CUDA_IPC: Reduced log level of rkey unpacking failures |
| Handled cuMemRelease error status properly | 53c314243 | UCT/CUDA/CUDA_IPC: Handle cuMemRelease error status |
| Fixed context setting for local buffer in CUDA_IPC | 168414505 | UCT/CUDA/CUDA_IPC: Set context associated with local buffer |
| Fixed host unregister error message (changed to diagnostic) | 7114352c5 | UCT/CUDA: Do not print host unregister error (use diag) |
| Fixed CUDA_IPC header installation | e163bf03a | UCT/CUDA_IPC: install missing headers |

RDMA CORE (IB, ROCE, etc.)

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed RoCE network device name reading | 1c743cdcf | UCT/IB/BASE: Fix roce ndev name read |
| Fixed Direct NIC related issues | b12909197 | UCT/IB: Minor fixes related to Direct NIC |
| Reverted RC EP address size adaptation without flush_rkey | 362bbc491 | UCT/IB/RC: Revert adapt EP address size without flush_rkey |

UCS

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed ARCH header inclusion when building with nvcc (arm_neon.h) | 49172d8f9 | UCS/ARCH: Don't include arm_neon.h when building with nvcc |
| Fixed VFS symlink path handling | c1fa6ec15 | UCS/VFS: Improve symlink paths and handling of duplicate objects |
| Fixed netlink message receiving to continue until 'done' flag is set | 6d96bec1a | UCS/SYS/NETLINK: Receive netlink messages continuously until 'done' flag is set |

Build

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed NVCC search with explicit --with-cuda | 79a25dcb5 | BUILD/CUDA: Fix NVCC search when --with-cuda passed explicitly |
| Fixed ZE transport build failures | b4d395bd3 | UCT/ZE: fix ZE transport build failures |
| Fixed ucs_arch_get_cpu_flag compilation | 4f1039d87 | UCS/ARCH: Fix ucs_arch_get_cpu_flag compilation |
| Fixed CUDA device code build for supported architectures | 63be7441e | BUILD/CUDA: build device code for supported CUDA arch |

Testing

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed test_jenkins CI issues | ee04ccdc3 | CI: Fix test_jenkins |
| Decreased rwlock test duration | cf360d50e | TEST/RWLOCK: Decrease test duration |
| Fixed error counting in gtest | ac3773ac1 | TEST/GTEST: Fix error counting in test |
| Enabled retries for test_arch.memcpy | b9b953567 | TEST/ARCH: Enable retries for test_arch.memcpy |
| Fixed test_cuda_nvml condition relaxation | 35459fe57 | GTEST/UCT: Relaxed test_cuda_nvml.device_get_fabric_info condition |
| Skipped build when generating packages | a22a3e1e5 | TEST/APPS: Skip build when generating packages |
| Fixed CUDA device restoration in tests | 935ad58f1 | TEST/CUDA: Restore original cuda device |
| Improved error detection in UCP device tests | f3366d629 | TEST/UCP/DEVICE: Improve error detection |
| Fixed global topo state cleanup during gtest | 61e00bb46 | UCS/TOPO/TEST: Don't clean the global topo state during gtest |

Tools

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed perftest CUDA kernel issues | d3ff21100 | TOOLS/PERF: Perftest cuda kernel fixes |

GO Bindings

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed go bindings compilation with CUDA | c9b7f6e53 | BINDINGS/GO: Fixed go bindings compilation with CUDA |

IB/EFA

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed error message when FLID is not available | 3c7b2dd7c | UCP/WIREUP/IB: Fix error message when FLID is not available |

Packaging

| NEWS Line | Commit | Details |
| --- | --- | --- |
| Fixed RPM SPEC debug_package macro execution on SLES16 | de444fabe | RPM/SPEC: Do not execute %debug_package macro on SLES16 |

coderabbitai bot commented Nov 19, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

