Migrates concurrent root LP to OpenMP by nguidotti · Pull Request #1291 · NVIDIA/cuopt

nguidotti · 2026-05-24T10:12:35Z

This PR is a cleaned-up version of #1252, so only the OpenMP-related changes are present.

Original description:

run_concurrent in cpp/src/pdlp/solve.cu now dispatches the barrier and dual simplex workers as #pragma omp task inside a #pragma omp taskgroup instead of raw std::thread. PDLP still runs synchronously on the dispatching thread.

MIP path (omp_in_parallel()): reuses the upstream solve_mip OMP team. Barrier and dual simplex now consume slots from the configured num_cpu_threads budget instead of spawning extra OS threads outside it.
Stand-alone LP path: stands up a local #pragma omp parallel + single with the right number of workers.
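
The two dispatch paths described above can be sketched roughly as follows. This is a minimal illustration, not the code in cpp/src/pdlp/solve.cu: `concurrent_result`, `dispatch_workers`, and `run_concurrent_sketch` are hypothetical names, the solver bodies are replaced by flag writes, and both workers are launched on both paths for simplicity. The serial fallback for `omp_in_parallel()` is only there so the sketch builds without OpenMP enabled.

```cpp
#include <cassert>
#if defined(_OPENMP)
#include <omp.h>
#else
// Serial fallback so the sketch still builds when OpenMP is disabled.
static inline int omp_in_parallel() { return 0; }
#endif

// Hypothetical record of which workers ran; the real code returns solutions.
struct concurrent_result {
  bool pdlp_ran         = false;
  bool barrier_ran      = false;
  bool dual_simplex_ran = false;
};

// Spawns the two CPU workers as tasks and runs PDLP on the calling thread.
// The taskgroup does not exit until both child tasks have finished.
static void dispatch_workers(concurrent_result& r)
{
#pragma omp taskgroup
  {
#pragma omp task shared(r)
    {
      r.barrier_ran = true;  // stand-in for the barrier solve
    }
#pragma omp task shared(r)
    {
      r.dual_simplex_ran = true;  // stand-in for the dual simplex solve
    }
    // PDLP stays synchronous on the dispatching thread.
    r.pdlp_ran = true;
  }
}

static concurrent_result run_concurrent_sketch(int num_workers)
{
  assert(num_workers >= 1);
  concurrent_result r;
  if (omp_in_parallel()) {
    // Already inside a team (the MIP case): reuse it, spawn no new threads.
    dispatch_workers(r);
  } else {
    // Stand-alone case: create a local team and dispatch from one thread.
#pragma omp parallel num_threads(num_workers)
    {
#pragma omp single
      dispatch_workers(r);
    }
  }
  return r;
}
```

Calling `run_concurrent_sketch(3)` from serial code takes the stand-alone branch; calling it from inside an existing parallel region takes the first branch and draws workers from that team, which is the property that keeps the solve inside the configured thread budget.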

This PR also removes the obfuscated std::future logic around the barrier that was present in the other PR.

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

…apture

# Conflicts: # python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py

The cuDSS warmup and eager barrier raft::handle_t construction on the main thread were added to keep cuBLAS / cuSPARSE / cuSolverDn / cuDSS first-use synchronous init from invalidating PDLP's CUDA graph capture. manual_cuda_graph_t (from cuda_graph_side_capture) now recovers from cudaErrorStreamCaptureInvalidated by re-running the captured work eagerly, so the preflight is no longer needed. Move barrier handle construction back inside the barrier OMP task body, matching the pre-warmup pattern. Destruction stays at parent scope (post-taskgroup join) so cublasDestroy → cudaDeviceSynchronize doesn't fire during a capture.
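
The lifetime arrangement in this commit message (construct the handle inside the task, destroy it at parent scope after the join) might look like the sketch below. `fake_handle` is a hypothetical stand-in for `raft::handle_t` that only logs its construction and destruction, and `run_barrier_lifetime_sketch` is an illustrative name; no CUDA calls are involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for raft::handle_t: records lifetime events in a log.
struct fake_handle {
  std::vector<std::string>& log;
  explicit fake_handle(std::vector<std::string>& l) : log(l) { log.push_back("construct"); }
  ~fake_handle() { log.push_back("destroy"); }
};

static std::vector<std::string> run_barrier_lifetime_sketch()
{
  std::vector<std::string> log;
  {
    // The owner lives at parent scope, so the destructor cannot run
    // while any task in the taskgroup is still in flight.
    std::unique_ptr<fake_handle> barrier_handle;

#pragma omp taskgroup
    {
      // shared(...) is explicit: a task would otherwise try to copy
      // these locals (firstprivate), which unique_ptr does not allow.
#pragma omp task shared(barrier_handle, log)
      {
        // Construction happens inside the task body.
        barrier_handle = std::make_unique<fake_handle>(log);
        log.push_back("barrier solve");
      }
      // The dispatching thread would run PDLP here.
    }

    // The taskgroup has joined; the task has finished with the handle.
    assert(barrier_handle != nullptr);
    log.push_back("taskgroup joined");
  }  // barrier_handle is destroyed here, after the join.
  return log;
}
```

The order of the logged events is construct, barrier solve, taskgroup joined, destroy, so the destructor (which in the real code reaches cublasDestroy) only runs once the concurrent work is over.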

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>

coderabbitai · 2026-05-24T10:18:43Z

📝 Walkthrough

This PR refactors the concurrent LP solver to use OpenMP taskgroups instead of std::thread spawning. A new compile-time constant gates barrier execution when running under MIP with insufficient thread availability. The barrier and dual simplex now dispatch as OpenMP tasks, with the barrier constructing its own CUDA context inside the task and results conditionally processed based on execution eligibility.
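
As a rough illustration of the gating rule summarized here, the predicate could be written as below. `kBarrierRequiredThreads` and `barrier_enabled` are hypothetical names; the value 3 is taken from the macro `CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT` reported in this review, and how the real code obtains the available thread count is not shown.

```cpp
#include <cassert>

// Hypothetical constant mirroring the macro value (3) reported in the review:
// one slot each for PDLP, barrier, and the dual simplex launched by B&B.
constexpr int kBarrierRequiredThreads = 3;

// The barrier runs when the solve is not inside MIP, or when the shared
// team has at least the required number of threads.
static bool barrier_enabled(bool inside_mip, int available_threads)
{
  assert(available_threads >= 1);
  return !inside_mip || available_threads >= kBarrierRequiredThreads;
}
```

Under this rule a stand-alone LP solve always enables the barrier, while a MIP solve with two available threads skips it.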

Changes

Concurrent LP solver refactoring

| Layer / File(s) | Summary |
| --- | --- |
| Thread-requirement constant for barrier gating (`cpp/src/mip_heuristics/mip_constants.hpp`) | Defines `CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT` macro set to `3` to control whether the concurrent barrier is skipped when fewer threads are available in the MIP solver. |
| OpenMP header and utility includes (`cpp/src/pdlp/solve.cu`) | Adds OpenMP support through `omp_helpers.hpp` and `<omp.h>` includes, and removes the prior `<thread>` dependency for concurrency. |
| Thread-aware barrier gating logic (`cpp/src/pdlp/solve.cu`) | Computes available OpenMP threads and gates barrier execution: the barrier runs concurrently only when either the solve is not inside MIP or the available thread count meets the required threshold. Updates logging to reflect barrier eligibility. |
| OpenMP taskgroup refactoring of run_concurrent (`cpp/src/pdlp/solve.cu`) | Replaces std::thread spawning with OpenMP `taskgroup` and tasks for barrier and dual simplex execution. Constructs barrier's `raft::handle_t` inside the task body, skips dual simplex task in MIP contexts, captures exceptions via `std::exception_ptr`, and requests concurrent halt on task failure. |
| Conditional barrier result handling (`cpp/src/pdlp/solve.cu`) | Changes barrier-solution conversion to be conditional on `enable_barrier`: when the barrier task was skipped, `sol_barrier` is initialized with a sentinel `ConcurrentLimit` termination status instead of converting barrier outputs. |
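
The exception handling mentioned in the table (capture via `std::exception_ptr`, then request a halt) could be sketched as follows. Exceptions must not escape an OpenMP task, so the task body catches them and hands them to the parent. `task_outcome` and `run_worker_task` are hypothetical names, and the local `halt` flag only stands in for the role of `global_concurrent_halt`; this is not the code from solve.cu.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical summary of what the parent sees after the join.
struct task_outcome {
  std::exception_ptr error;  // null when the worker finished normally
  int halt_requested;        // 1 when the worker failed and asked others to stop
};

// Runs `worker` as an OpenMP task, keeping any exception inside the task.
template <typename Worker>
static task_outcome run_worker_task(Worker worker)
{
  std::atomic<int> halt{0};  // stand-in for a shared halt flag
  std::exception_ptr error;

#pragma omp taskgroup
  {
#pragma omp task shared(halt, error, worker)
    {
      try {
        worker();
      } catch (...) {
        // Save the exception for the parent and tell other workers to stop.
        error = std::current_exception();
        halt.store(1);
      }
    }
  }

  // After the join: an error is recorded exactly when a halt was requested.
  assert(static_cast<bool>(error) == (halt.load() == 1));
  return {error, halt.load()};
}
```

Once the taskgroup has joined, the dispatching thread can inspect the returned `error` and, if it is non-null, call `std::rethrow_exception` on it so the failure surfaces on the caller's thread rather than inside the task.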

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

non-breaking, improvement

Suggested reviewers

tmckayus
rgsl888prabhu
hlinsen
chris-maes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Migrates concurrent root LP to OpenMP' directly and clearly summarizes the main change: refactoring concurrent LP solving to use OpenMP instead of std::thread. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the OpenMP migration in run_concurrent, thread management changes, and checklist items covering contributing guidelines and documentation updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/pdlp/solve.cu (1)
40-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit <atomic> include for std::atomic usage in cpp/src/pdlp/solve.cu

std::atomic<int> global_concurrent_halt{0}; is used at line 331, but this TU has no #include <atomic>, so it relies on transitive includes.
Suggested fix

 #include <algorithm>
+#include <atomic>
 #include <cmath>
 #include <exception>
 #include <set>

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/solve.cu` around lines 40 - 61, This TU uses std::atomic<int>
via the symbol global_concurrent_halt but does not include <atomic>, relying on
transitive includes; add an explicit `#include` <atomic> near the other headers in
solve.cu so the declaration std::atomic<int> global_concurrent_halt{0}; and any
other atomic usages are correctly defined and portable.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f8b0fe07-8fd9-4fc7-8c44-b2d9bf6a263f

📥 Commits

Reviewing files that changed from the base of the PR and between f3dd013 and 299d3d3.

📒 Files selected for processing (2)

cpp/src/mip_heuristics/mip_constants.hpp
cpp/src/pdlp/solve.cu

coderabbitai · 2026-05-24T10:18:46Z

+// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
+// (1 PDLP + 1 dual simplex + 1 barrier). Stand-alone LP always runs all three.
+#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 3


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Lower this MIP barrier gate to match the new task topology.

In MIP mode run_concurrent() no longer launches dual simplex (cpp/src/pdlp/solve.cu, Lines 1615-1626), so this path only needs PDLP + barrier. Keeping the gate at 3 disables the barrier unnecessarily whenever exactly two threads are available.

Suggested fix

-// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
-// (1 PDLP + 1 dual simplex + 1 barrier). Stand-alone LP always runs all three.
-#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 3
+// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
+// (1 PDLP + 1 barrier). Stand-alone LP always runs all three.
+#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 2

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/mip_constants.hpp` around lines 25 - 27, The compile-time gate CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT is too high for the new task topology in run_concurrent() (where dual simplex is no longer launched); change the constant definition in mip_constants.hpp so it requires 2 threads (PDLP + barrier) instead of 3, and add a short comment referencing run_concurrent() / cpp/src/pdlp/solve.cu to explain the rationale.

The dual simplex is launched in B&B, not here, so the thread count is correct.

@nguidotti, thanks for the clarification! That makes sense — since B&B dispatches the dual simplex as its own OpenMP task that still occupies a slot in the shared thread pool, the full budget of 3 (PDLP + barrier + dual simplex from B&B) is indeed required. My analysis was wrong to treat the B&B-launched dual simplex as outside the thread count. The gate and the comment are correct as written.

🧠 Learnings used

Learnt from: aliceb-nv Repo: NVIDIA/cuopt PR: 798 File: cpp/src/utilities/timing_utils.hpp:25-29 Timestamp: 2026-01-26T14:18:09.111Z Learning: In NVIDIA/cuopt, headers under cpp/src should not rely on MSVC-specific features. When reviewing cpp/src/**/*.hpp, verify there is no MSVC-only code paths (e.g., `#ifdef` _MSC_VER, MSVC inline assembly syntax) and prefer portable GCC/Clang-compatible implementations for Linux. If platform-specific code is needed, guard it with portable, clearly documented checks and ensure it compiles with GCC/Clang on Linux. Maintain portability by avoiding MSVC-only fallbacks unless explicitly required by a supported build configuration.

Learnt from: aliceb-nv Repo: NVIDIA/cuopt PR: 986 File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8 Timestamp: 2026-03-23T11:33:23.998Z Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.

Learnt from: bdice Repo: NVIDIA/cuopt PR: 1035 File: cpp/tests/utilities/base_fixture.hpp:29-47 Timestamp: 2026-04-19T16:49:17.616Z Learning: In NVIDIA/cuopt (and the upstream rapidsai/rmm) after the RMM → CCCL migration, the rmm::mr resource adaptors (e.g., rmm::mr::pool_memory_resource and rmm::mr::binning_memory_resource) are now owning: they take/hold their upstream resources by value instead of non-owning references. Therefore, direct construction of adaptor chains from temporaries/local values (e.g., pool_memory_resource(make_async(), size)) is safe and should NOT be flagged as potential dangling/lifetime bugs. Also, rmm::mr::make_owning_wrapper is no longer needed/available for this owning design, so do not suggest it in this codepath.

aliceb-nv

Thanks Nicolas!

aliceb-nv · 2026-05-25T09:02:10Z

/merge

akifcorduk and others added 18 commits May 19, 2026 17:26

fix some issues

eb2cb94

Merge branch 'main' of github.com:NVIDIA/cuopt into test_main_fix

ecef670

reenable incumbent callback tests

fc92119

move concurrent solve to omp and obey thread count

a74e830

reduce comments

d5b7140

add cuda error recovery for capture

8ad61dd

test CI

0a5149b

fix logger

06352db

Merge branch 'main' of github.com:NVIDIA/cuopt into cuda_graph_side_c…

36e74d2

…apture

restore the api and use api suitable for <12.3

4d2fb18

more comments

dba39c8

Merge branch 'cuda_graph_side_capture' into omp_migrate_concurrent_root

a307d7b

# Conflicts: # python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py

convert std::atomic to omp_atomic_

e7a462b

fix ping pong graph major, non-major logic

56b6e84

Merge branch 'cuda_graph_side_capture' into omp_migrate_concurrent_root

5882409

migrate concurrent LP mode to OpenMP

71a2881

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>

recover original comment

299d3d3

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>

nguidotti requested a review from a team as a code owner May 24, 2026 10:12

nguidotti requested review from chris-maes and yuwenchen95 May 24, 2026 10:12

nguidotti mentioned this pull request May 24, 2026

Migrate concurrent root LP to OMP #1252

Closed

nguidotti requested review from rg20 and removed request for yuwenchen95 May 24, 2026 10:15

nguidotti self-assigned this May 24, 2026

nguidotti added the non-breaking, improvement, mip, and P0 labels May 24, 2026

nguidotti added this to the 26.06 milestone May 24, 2026

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

aliceb-nv approved these changes May 25, 2026

View reviewed changes

rapids-bot Bot merged commit 29b4dc1 into NVIDIA:release/26.06 May 25, 2026
102 checks passed

nguidotti deleted the omp-migration-lp branch May 25, 2026 12:15

3 participants
