NEWS: Add v1.20.0 description to main branch #11014

roiedanino · 2025-11-19T12:54:39Z

What?

Add v1.20.0 new features and fixes to NEWS

Features

UCP

NEWS Line	Commit	Details
Added new GPU device API for direct GPU-to-GPU communication	ed7b07b36	UCT/API: Introduce device API
Added host API for GPU device management	db2c287b3	UCP/API: Add GPU device and host API
Added device signaling API with cooperation levels and flags	3c3d48977	UCP/API/DEVICE: Add signaling API
Added API for working with offsets and channel id in device operations	1537aa946	UCP/API: Work with offsets and channel id
Added method to write to local counter in device operations	5d40ef70a	UCP/DEVICE: add method to write to local counter
Added local and remote address fields to memory list element in device API	3bfebd3a4	UCP/DEVICE: Add local and remote address fields to memory list element
Added device lane selection and allocated handle population	481340f03	UCP/DEVICE: Add lane selection and populate allocated handle
Added support for Direct NIC (DPU) data path with CUDA	e7f59b68f	UCP/IB/CUDA: Direct NIC data path support
Added rkey packing support for Direct NIC	e816f536d	UCP/CORE: Rkey pack featuring Direct NIC
Added sender flush mechanism when memory sys_dev differs from remote lane sys_dev	6c5b307bd	UCP/RKEY: Sender flush if memory sys_dev is not remote lane sys_dev
Added option to use single network device per protocol	c73ee1a08	UCP/PROTO: Added option to use single network device
Added MIN_RMA_CHUNK_SIZE configuration parameter	0f379b8b9	UCP: Added MIN_RMA_CHUNK_SIZE
Decreased default value for MIN_RMA_CHUNK_SIZE from 16k to 8k	4b81bef86	UCP: Decrease default value for MIN_RMA_CHUNK_SIZE from 16k to 8k
Improved protocol lane selection with find_lanes callback to minimize overhead	82eaa01e7	UCP/PROTO: Find lanes callback to minimize overhead
Increased send-zcopy latency factor only in fast-completion cases	ef3c1109d	UCP/PROTO: increase the latency factor of send-zcopy only in cases of fast-completion
Improved multi-ppn performance estimation	179a0e728	UCP: Fix multi ppn perf estimation
Removed deprecated ucp_mem functions	9ccbdf0a1	UCP/MM: Remove deprecated function
Deprecated ucp_request_alloc API	e93727f0f	UCP/API: Deprecate ucp_request_alloc

UCT

NEWS Line	Commit	Details
Added new device API for GPU communication (rc_gda transport)	ed7b07b36 + 9ded802a0	UCT/API: Introduce device API + UCT/GDA: Rename transport to rc_gda
Added GDAKI transport with endpoint export to GPU	8649b0449 + 34317a76c	UCT/GDAKI: Add endpoint + Export EP to GPU
Added DEVX QP/CQ support on foreign memory	5f3c75da0	UCT/MLX5: Add DEVX QP/CQ on foreign memory support
Added device API implementation for CUDA_IPC transport	035793caa	UCT/CUDA_IPC: implement device api
Added device put multi, put partial, and atomic operations for CUDA_IPC	ccfd7487b	UCT/CUDA_IPC: implement device put multi, put partial, atomic
Added peer failure error handling capability for GDAKI	f6b87cb94	UCT/GDAKI: Add peer failure error handling cap
Added check for nvidia_peermem driver when using GDA transport	48e9e475d	UCT/GDA: Check that nvidia peermem driver is loaded
Enabled Direct NIC by default for IB transport	d215d23de	UCT/IB: Enable Direct NIC by default
Added XDR performance recognition	19615ab98	UCT: Add XDR perf recognition
Added support for mapping DMA_BUF handle via PCIe for Direct NIC	2da859c64	UCT/CUDA/CUDA_COPY: Fixed mapping of DMA_BUF handle
Improved GDR_COPY performance with fast-path cache lookup	21d6cd664	UCT/GDR_COPY: Use fast-path cache lookup

RDMA CORE (IB, ROCE, etc.)

NEWS Line	Commit	Details
Added ConnectX-9 device support	ee882f615	UCT/IB: add CX9 to spec list
Split dp_ordering flag for DV/DevX transports	5959ce74b	UCT/IB: Split dp_ordering flag for DV/DevX
Added VRF tables support for RoCE reachability check	fe9b01261	UCT/IB: Support VRF tables for RoCE reachability check
Added EFA-specific GPUDirect support detection	4c707c0b0	UCT/IB: Check EFA-specific GPUDirect support

TCP

NEWS Line	Commit	Details
Added routing table check during reachability verification	6bae8c5e8	UCT/TCP: check the routing table during reachability check in TCP

UCS

NEWS Line	Commit	Details
Introduced lightweight rwlock data structure	8bbe776d3	UCS: Introduce lightweight rwlock
Added built-in atomics for rcache rwlock	4a8fa51cf	UCS/RCACHE: Use built-in atomics for rwlock
Improved VFS symlink paths and duplicate object handling	c1fa6ec15	UCS/VFS: Improve symlink paths and handling of duplicate objects

CUDA

NEWS Line	Commit	Details
Added wrappers for NVML functions	0e1d78825	UCT/CUDA: Added wrappers for nvml functions
Added hook for cuLibraryGetGlobal	2fb4921cf	UCM/CUDA: Added hook for cuLibraryGetGlobal
Improved CUDA call logging	2bae0a55f	UCT/CUDA: CUDA call logging enhancment
Improved source/destination memory type detection for lane performance estimation	2b014468d	UCP/PROTO/CUDA: Detect src/dst memtype when estimating lane perf
Removed unsafe usage of cuCtxGetId	8d4f43ebf	UCT/CUDA: Removed unsafe usage of cuCtxGetId
Switched tests to use cuCtxCreate_v4 by default for newer CUDA versions	99430174d	TEST: use cuCtxCreate_v4 by default for newer CUDA versions
Improved context management for CUDA_IPC operations	168414505	UCT/CUDA/CUDA_IPC: Set context associated with local buffer

UCM

NEWS Line	Commit	Details
Changed module info printing to debug level by default	7363001e4	UCM: Print module info to debug by default

Tools

NEWS Line	Commit	Details
Added GDAKI kernel option to perftest	df3d8b949	PERF: Perftest GDAKI kernel option
Added UCP cuda device tests to perftest	176cec5e7	UCP/PERF: UCP cuda device real tests
Added MPI+CUDA example	4d06b0e75	TEST/MPI: Added MPI+CUDA example
Differentiated wakeup feature and extra info options in perftest	c76fa1521	TOOL/PERFTEST: Differentiate wakeup feature and extra info options

Build

NEWS Line	Commit	Details
Added ability to build CUDA device code for supported architectures	502a12b94 + 63be7441e	BUILD: Add ability to build CUDA code + build device code for supported CUDA arch
Added ucx.spec into tarball for Universal Build System support	de69ee57f	BUILD: Add ucx.spec into the tarball to support Universal Build System
Added CUDA 13 support	29831d319	AZP/RELEASE: Add CUDA 13 support
Made the GDA build fail when building with gpunetio but it cannot be found	401c1278b	BUILD/GDA: Fail if building with gpunetio but it cannot be found

Packaging

NEWS Line	Commit	Details
Moved driver level dependencies under Recommends section in Debian packages	a49b1b3ac	DEBIAN: Moved driver level dependencies under Recommends section
Added Provides field for upstream packages in Debian	8b312fffa	DEBIAN: Add Provides field for upstream packages
Migrated JUCX publish from OSSRH to Central Portal	d77431a87	AZP/RELEASE: Migrate JUCX publish from OSSRH to Central Portal
Added separate ib-mlx5-gda package	9ded802a0	UCT/GDA: Rename transport to rc_gda and packages to ib-mlx5-gda

CI/Testing

NEWS Line	Commit	Details
Added Rocky OS support to release pipeline	189c08a1b	AZP/RELEASE: Add Rocky OS support
Added RHEL 10 containers to build matrices	c8d815949	AZP: Add RHEL 10 containers to build matrices
Added Debian 13 to CI build stage	934a42b1d	AZP: Add Debian13 to CI Build stage
Added ARM build testing	22a08b751	AZP: Add build test on ARM
Switched to MOFED 25.07	37be3d687	AZP: Switch to MOFED 25.07
Switched GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image	575512691	AZP: Switch GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image
Added support for nvidia_peermem module in testing	cfec29bec	TEST/CI: Support nvidia_peermem module
Disabled Valgrind in CI Tests stage	ae53ac724	CONTRIB: Disable Valgrind in CI Tests stage
Disabled tag matching offload tests	2dd6c6553	TEST/GTEST: Disable tag matching offload tests

GO Bindings

NEWS Line	Commit	Details
Made Go bindings thread-safe	ba5cc7e06	BINDINGS/GO: Bug fix - make gobindings thread safe

Documentation

NEWS Line	Commit	Details
Added note about reachability check mode in README	def7672e6	README: add note about reachability check mode
Documented NVLink as a supported transport	e80c579ba	DOC: Mention nvlink as supported tl
Documented return status for device APIs	5fe5dafc9	DEVICE: Document return status for device APIs

AWS EFA

NEWS Line	Commit	Details
Added RMA WRITE operations support	7fe7e4482	UCT/IB/EFA: Add RMA WRITE operations
Added flush and fence operations for SRD	f2f43b4d0	UCT/EFA/SRD: Add flush and fence, enable for UCT
Enabled EFA SRD support in tests	a0ecb0165	GTEST/UCP: Enable EFA SRD support

Bugfixes

UCP

NEWS Line	Commit	Details
Fixed fallback to blocking registration for network device only	b14d7f1ef	UCP/CORE: Fallback to blocking registration for network device only
Fixed flush_state validity check before using it	8a7afc1fd	UCP/FLUSH: Check flush_state validity before using it
Fixed single net dev filtering for single proto	e75960cf0	UCP: Remove single net dev filtering for single proto
Fixed rkey size estimation for rendezvous	af42831b8	UCP/RNDV: Fix rkey size estimation
Fixed invalidation being requested when RNDV is not used	e52c71541 + 5d7962a44	UCP/WIREUP: Don't request invalidate without RNDV
Fixed gather_pending_requests to execute only when reconfig occurs	c5654c44b	UCP/WIREUP: Moved gather_pending_requests to be executed only when reconfig occurs

UCT

NEWS Line	Commit	Details
Fixed protocol selection for CUDA_IPC device operations	7e03b7820	DEVICE/CUDA_IPC: Fix proto selection for cuda_ipc
Fixed GDA compilation issues	2c3b56765	UCT/GDA: Fix compilation
Fixed GDAKI wqe_idx overflow	9ed46592e	UCT/GDAKI: Fix wqe_idx overflow
Fixed MM FIFO room calculation for tail > head case	c4b647fb2	UCT/MM: Fix the FIFO room calculation for tail > head
Fixed CUDA_IPC indices handling in put partial	9455002de	UCT/CUDA_IPC/TEST: change indices handling in put partial
Removed DOCA runtime dependency from GDAKI	655feb5ba	UCT/GDAKI: Remove DOCA runtime dependency
Fixed GDA log spam by reducing DOCA log level	913cca674	UCT/GDA/MLX5: Reduce DOCA log spam by setting ERROR level
Fixed UAR support check when querying resources for GDA/MLX5	f60a22c2e	UCT/GDA/MLX5: Check UAR is supported when querying resources
Fixed crash in GGA transport when EXPORTED_MKEY flag is missing	a5297d6fb	GGA: Remove XGVMI assertion

CUDA

NEWS Line	Commit	Details
Fixed stack overflow bug when calling cuPointerGetAttribute	beab36c50	UCT/CUDA_COPY: Fix stack overflow bug when calling cuPointerGetAttribute
Fixed mapping of DMA_BUF handle for Direct NIC	2da859c64	UCT/CUDA/CUDA_COPY: Fixed mapping of DMA_BUF handle
Fixed CUDA_COPY to return the object to the mpool on failure	ab3d32fd3	UCT/CUDA/CUDA_COPY: Return object to the mpool in case of failure
Reduced log level of rkey unpacking failures	90b7edcd5	UCT/CUDA/CUDA_IPC: Reduced log level of rkey unpacking failures
Fixed handling of cuMemRelease error status in CUDA_IPC	53c314243	UCT/CUDA/CUDA_IPC: Handle cuMemRelease error status
Fixed context setting for local buffer in CUDA_IPC	168414505	UCT/CUDA/CUDA_IPC: Set context associated with local buffer
Changed host unregister error message to diagnostic level	7114352c5	UCT/CUDA: Do not print host unregister error (use diag)
Fixed CUDA_IPC header installation	e163bf03a	UCT/CUDA_IPC: install missing headers

RDMA CORE (IB, ROCE, etc.)

NEWS Line	Commit	Details
Fixed RoCE network device name reading	1c743cdcf	UCT/IB/BASE: Fix roce ndev name read
Fixed Direct NIC related issues	b12909197	UCT/IB: Minor fixes related to Direct NIC
Reverted RC EP address size adaptation without flush_rkey	362bbc491	UCT/IB/RC: Revert adapt EP address size without flush_rkey

UCS

NEWS Line	Commit	Details
Fixed build with nvcc by not including arm_neon.h	49172d8f9	UCS/ARCH: Don't include arm_neon.h when building with nvcc
Fixed VFS symlink path handling	c1fa6ec15	UCS/VFS: Improve symlink paths and handling of duplicate objects
Fixed netlink message receiving to continue until 'done' flag is set	6d96bec1a	UCS/SYS/NETLINK: Receive netlink messages continuously until 'done' flag is set

Build

NEWS Line	Commit	Details
Fixed NVCC search when --with-cuda is passed explicitly	79a25dcb5	BUILD/CUDA: Fix NVCC search when --with-cuda passed explicitly
Fixed ZE transport build failures	b4d395bd3	UCT/ZE: fix ZE transport build failures
Fixed ucs_arch_get_cpu_flag compilation	4f1039d87	UCS/ARCH: Fix ucs_arch_get_cpu_flag compilation
Fixed CUDA device code build for supported architectures	63be7441e	BUILD/CUDA: build device code for supported CUDA arch

Testing

NEWS Line	Commit	Details
Fixed test_jenkins CI issues	ee04ccdc3	CI: Fix test_jenkins
Decreased rwlock test duration	cf360d50e	TEST/RWLOCK: Decrease test duration
Fixed error counting in gtest	ac3773ac1	TEST/GTEST: Fix error counting in test
Enabled retries for test_arch.memcpy	b9b953567	TEST/ARCH: Enable retries for test_arch.memcpy
Relaxed test_cuda_nvml.device_get_fabric_info condition	35459fe57	GTEST/UCT: Relaxed test_cuda_nvml.device_get_fabric_info condition
Skipped build when generating packages	a22a3e1e5	TEST/APPS: Skip build when generating packages
Fixed CUDA device restoration in tests	935ad58f1	TEST/CUDA: Restore original cuda device
Improved error detection in UCP device tests	f3366d629	TEST/UCP/DEVICE: Improve error detection
Fixed gtest to preserve the global topo state instead of cleaning it	61e00bb46	UCS/TOPO/TEST: Don't clean the global topo state during gtest

Tools

NEWS Line	Commit	Details
Fixed perftest CUDA kernel issues	d3ff21100	TOOLS/PERF: Perftest cuda kernel fixes

GO Bindings

NEWS Line	Commit	Details
Fixed go bindings compilation with CUDA	c9b7f6e53	BINDINGS/GO: Fixed go bindings compilation with CUDA

IB/EFA

NEWS Line	Commit	Details
Fixed error message when FLID is not available	3c7b2dd7c	UCP/WIREUP/IB: Fix error message when FLID is not available

Packaging

NEWS Line	Commit	Details
Fixed RPM SPEC debug_package macro execution on SLES16	de444fabe	RPM/SPEC: Do not execute %debug_package macro on SLES16

Signed-off-by: Roie Danino <[email protected]>

coderabbitai · 2025-11-19T12:54:45Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

NEWS: Add v1.20.0 description to main branch

e59a332

Signed-off-by: Roie Danino <[email protected]>

roiedanino requested a review from gleon99 November 19, 2025 12:54

roiedanino self-assigned this Nov 19, 2025

roiedanino added the Ready for Review label Nov 19, 2025

roiedanino requested review from ofirfarjun7 and yosefe November 19, 2025 12:54
