Migrates concurrent root LP to OpenMP #1291

Merged
rapids-bot[bot] merged 18 commits into NVIDIA:release/26.06 from nguidotti:omp-migration-lp on May 25, 2026

@nguidotti
Contributor

This PR cleans up #1252 so that only the OpenMP-related changes are present.

Original description:

run_concurrent in cpp/src/pdlp/solve.cu now dispatches the barrier and dual simplex workers as #pragma omp task inside a #pragma omp taskgroup instead of raw std::thread. PDLP still runs synchronously on the dispatching thread.

  • MIP path (omp_in_parallel()): reuses the upstream solve_mip OMP team. Barrier and dual simplex now consume slots from the configured num_cpu_threads budget instead of spawning extra OS threads outside it.
  • Stand-alone LP path: stands up a local #pragma omp parallel + single with the right number of workers.
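
For readers unfamiliar with the pattern, the two paths above can be sketched roughly as follows. This is a minimal illustration, not the code in cpp/src/pdlp/solve.cu: `worker_fn`, `inside_parallel_region`, `dispatch_workers`, and `run_concurrent_sketch` are hypothetical names, and the team size of 3 on the stand-alone path is an assumption.

```cpp
#include <cassert>
#include <functional>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the solver workers; the real ones live in cuOpt.
using worker_fn = std::function<void()>;

// True when called from inside an active parallel region (the "MIP path").
inline bool inside_parallel_region()
{
#ifdef _OPENMP
  return omp_in_parallel() != 0;
#else
  return false;  // without OpenMP everything runs serially
#endif
}

// Spawns the two helper workers as tasks, runs the primary worker on the
// dispatching thread, and joins both tasks at the end of the taskgroup.
inline void dispatch_workers(const worker_fn& primary,
                             const worker_fn& barrier,
                             const worker_fn& dual_simplex)
{
  assert(primary && barrier && dual_simplex);
#pragma omp taskgroup
  {
#pragma omp task default(shared)
    barrier();
#pragma omp task default(shared)
    dual_simplex();
    primary();  // runs synchronously on the dispatching thread
  }  // the taskgroup waits here for both tasks
}

inline void run_concurrent_sketch(const worker_fn& primary,
                                  const worker_fn& barrier,
                                  const worker_fn& dual_simplex)
{
  if (inside_parallel_region()) {
    // MIP path: reuse the enclosing team, no new threads are created.
    dispatch_workers(primary, barrier, dual_simplex);
  } else {
    // Stand-alone path: stand up a local team (size 3 is an assumption).
#pragma omp parallel num_threads(3)
    {
#pragma omp single
      dispatch_workers(primary, barrier, dual_simplex);
    }
  }
}
```

Because the tasks are joined before `dispatch_workers` returns, sharing the callables by reference inside the tasks is safe in this sketch.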

This PR also removes the convoluted std::future logic around the barrier from the other PR.

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

akifcorduk and others added 18 commits May 19, 2026 17:26
# Conflicts:
#	python/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.py
The cuDSS warmup and eager barrier raft::handle_t construction on the
main thread were added to keep cuBLAS / cuSPARSE / cuSolverDn / cuDSS
first-use synchronous init from invalidating PDLP's CUDA graph capture.

manual_cuda_graph_t (from cuda_graph_side_capture) now recovers from
cudaErrorStreamCaptureInvalidated by re-running the captured work
eagerly, so the preflight is no longer needed. Move barrier handle
construction back inside the barrier OMP task body, matching the
pre-warmup pattern. Destruction stays at parent scope (post-taskgroup
join) so cublasDestroy → cudaDeviceSynchronize doesn't fire during a
capture.
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
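
The handle lifetime described in the commit message above (construction inside the task body, destruction at parent scope after the taskgroup join) can be illustrated with a small sketch. `fake_handle` is a hypothetical stand-in for raft::handle_t, and the log only exists to make the ordering visible; none of this is the PR's actual code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an expensive library handle such as raft::handle_t.
struct fake_handle {
  std::vector<std::string>* log;
  explicit fake_handle(std::vector<std::string>* l) : log(l) { log->push_back("construct"); }
  ~fake_handle() { log->push_back("destroy"); }
};

inline std::vector<std::string> run_with_task_owned_handle()
{
  std::vector<std::string> log;
  {
    // Declared at parent scope but left empty: nothing is constructed yet.
    std::unique_ptr<fake_handle> handle;
#pragma omp taskgroup
    {
#pragma omp task default(shared)
      {
        // Construction happens inside the task body.
        handle = std::make_unique<fake_handle>(&log);
        log.push_back("barrier work");
      }
      // The dispatching thread would run its own work here.
    }  // taskgroup join
    log.push_back("joined");
  }  // handle is destroyed here, strictly after the join
  return log;
}
```

Calling `run_with_task_owned_handle()` yields the sequence construct, barrier work, joined, destroy, which is the ordering the commit message asks for.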
@nguidotti nguidotti requested a review from a team as a code owner May 24, 2026 10:12
@nguidotti nguidotti requested review from rg20 and removed request for yuwenchen95 May 24, 2026 10:15
@nguidotti nguidotti self-assigned this May 24, 2026
@nguidotti nguidotti added the non-breaking (Introduces a non-breaking change), improvement (Improves an existing functionality), mip, and P0 labels May 24, 2026
@nguidotti nguidotti added this to the 26.06 milestone May 24, 2026
@coderabbitai

coderabbitai Bot commented May 24, 2026

Walkthrough

This PR refactors the concurrent LP solver to use OpenMP taskgroups instead of std::thread spawning. A new compile-time constant gates barrier execution when running under MIP with insufficient thread availability. The barrier and dual simplex now dispatch as OpenMP tasks, with the barrier constructing its own CUDA context inside the task and results conditionally processed based on execution eligibility.

Changes

Concurrent LP solver refactoring

| Layer / File(s) | Summary |
|---|---|
| Thread-requirement constant for barrier gating / cpp/src/mip_heuristics/mip_constants.hpp | Defines CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT macro set to 3 to control whether the concurrent barrier is skipped when fewer threads are available in the MIP solver. |
| OpenMP header and utility includes / cpp/src/pdlp/solve.cu | Adds OpenMP support through omp_helpers.hpp and <omp.h> includes, and removes the prior <thread> dependency for concurrency. |
| Thread-aware barrier gating logic / cpp/src/pdlp/solve.cu | Computes available OpenMP threads and gates barrier execution: the barrier runs concurrently only when either the solve is not inside MIP or the available thread count meets the required threshold. Updates logging to reflect barrier eligibility. |
| OpenMP taskgroup refactoring of run_concurrent / cpp/src/pdlp/solve.cu | Replaces std::thread spawning with OpenMP taskgroup and tasks for barrier and dual simplex execution. Constructs barrier's raft::handle_t inside the task body, skips dual simplex task in MIP contexts, captures exceptions via std::exception_ptr, and requests concurrent halt on task failure. |
| Conditional barrier result handling / cpp/src/pdlp/solve.cu | Changes barrier-solution conversion to be conditional on enable_barrier: when the barrier task was skipped, sol_barrier is initialized with a sentinel ConcurrentLimit termination status instead of converting barrier outputs. |
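
The exception handling summarized above can be sketched as follows. An exception must not escape an OpenMP task region, so each task body catches everything, stores it in a std::exception_ptr, and raises a halt flag. The names `run_guarded` and `run_two_tasks` are hypothetical, and rethrowing after the join is an assumption about how the stored exception might be consumed, not a description of the PR.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Runs `body`, capturing any exception instead of letting it leave the task.
inline void run_guarded(const std::function<void()>& body,
                        std::exception_ptr& error,
                        std::atomic<int>& halt)
{
  try {
    body();
  } catch (...) {
    error = std::current_exception();  // keep the exception for later
    halt.store(1);                     // ask the other workers to stop
  }
}

// Returns true when both tasks finished without requesting a halt.
inline bool run_two_tasks(const std::function<void()>& first,
                          const std::function<void()>& second,
                          std::atomic<int>& halt)
{
  std::exception_ptr first_error;
  std::exception_ptr second_error;
#pragma omp taskgroup
  {
#pragma omp task default(shared)
    run_guarded(first, first_error, halt);
#pragma omp task default(shared)
    run_guarded(second, second_error, halt);
  }  // both tasks have completed past this point
  // Assumption: surface the first stored exception on the dispatching thread.
  if (first_error) { std::rethrow_exception(first_error); }
  if (second_error) { std::rethrow_exception(second_error); }
  return halt.load() == 0;
}
```

Each task writes only its own std::exception_ptr, so the only shared mutable state is the atomic halt flag.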

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

non-breaking, improvement

Suggested reviewers

  • tmckayus
  • rgsl888prabhu
  • hlinsen
  • chris-maes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Migrates concurrent root LP to OpenMP' directly and clearly summarizes the main change: refactoring concurrent LP solving to use OpenMP instead of std::thread. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the OpenMP migration in run_concurrent, thread management changes, and checklist items covering contributing guidelines and documentation updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/pdlp/solve.cu (1)

40-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit <atomic> include for std::atomic usage in cpp/src/pdlp/solve.cu

std::atomic<int> global_concurrent_halt{0}; is used at line 331, but this TU has no #include <atomic>, so it relies on transitive includes.

Suggested fix
 #include <algorithm>
+#include <atomic>
 #include <cmath>
 #include <exception>
 #include <set>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/solve.cu` around lines 40 - 61, This TU uses std::atomic<int>
via the symbol global_concurrent_halt but does not include <atomic>, relying on
transitive includes; add an explicit #include <atomic> near the other headers in
solve.cu so the declaration std::atomic<int> global_concurrent_halt{0}; and any
other atomic usages are correctly defined and portable.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f8b0fe07-8fd9-4fc7-8c44-b2d9bf6a263f

📥 Commits

Reviewing files that changed from the base of the PR and between f3dd013 and 299d3d3.

📒 Files selected for processing (2)
  • cpp/src/mip_heuristics/mip_constants.hpp
  • cpp/src/pdlp/solve.cu

Comment on lines +25 to +27
// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
// (1 PDLP + 1 dual simplex + 1 barrier). Stand-alone LP always runs all three.
#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 3

@coderabbitai coderabbitai Bot May 24, 2026


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Lower this MIP barrier gate to match the new task topology.

In MIP mode run_concurrent() no longer launches dual simplex (cpp/src/pdlp/solve.cu, Lines 1615-1626), so this path only needs PDLP + barrier. Keeping the gate at 3 disables the barrier unnecessarily whenever exactly two threads are available.

Suggested fix
-// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
-// (1 PDLP + 1 dual simplex + 1 barrier). Stand-alone LP always runs all three.
-#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 3
+// MIP-only gate: skip the concurrent barrier when fewer threads are available than this
+// (1 PDLP + 1 barrier). Stand-alone LP always runs all three.
+#define CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT 2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mip_heuristics/mip_constants.hpp` around lines 25 - 27, The
compile-time gate CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT is too high
for the new task topology in run_concurrent() (where dual simplex is no longer
launched); change the constant definition in mip_constants.hpp so it requires 2
threads (PDLP + barrier) instead of 3, and add a short comment referencing
run_concurrent() / cpp/src/pdlp/solve.cu to explain the rationale.

Contributor Author


The dual simplex is launched in B&B, not here, so the thread count is correct.


@nguidotti, thanks for the clarification! That makes sense — since B&B dispatches the dual simplex as its own OpenMP task that still occupies a slot in the shared thread pool, the full budget of 3 (PDLP + barrier + dual simplex from B&B) is indeed required. My analysis was wrong to treat the B&B-launched dual simplex as outside the thread count. The gate and the comment are correct as written.
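
The gate discussed in this thread reduces to a one-line predicate. The sketch below follows the walkthrough's description (the barrier runs when the solve is not inside MIP, or when enough threads are available); `required_thread_count` mirrors the macro value 3 from the PR, while `barrier_enabled` and its parameters are hypothetical names.

```cpp
#include <cassert>

// Mirrors CUOPT_CONCURRENT_LP_BARRIER_REQUIRED_THREAD_COUNT (value taken from the PR).
constexpr int required_thread_count = 3;  // 1 PDLP + 1 dual simplex + 1 barrier

// Hypothetical helper: decides whether the barrier task should be dispatched.
constexpr bool barrier_enabled(bool inside_mip, int available_threads)
{
  // Stand-alone LP always runs the barrier; MIP needs the full budget.
  return !inside_mip || available_threads >= required_thread_count;
}

// With exactly two threads inside MIP the barrier is skipped.
static_assert(!barrier_enabled(true, 2), "two threads are not enough inside MIP");
static_assert(barrier_enabled(true, 3), "three threads meet the threshold");
```

With two threads inside MIP the predicate is false, which matches the author's point that the dual simplex launched by B&B still needs its own slot.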


🧠 Learnings used
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 798
File: cpp/src/utilities/timing_utils.hpp:25-29
Timestamp: 2026-01-26T14:18:09.111Z
Learning: In NVIDIA/cuopt, headers under cpp/src should not rely on MSVC-specific features. When reviewing cpp/src/**/*.hpp, verify there are no MSVC-only code paths (e.g., #ifdef _MSC_VER, MSVC inline assembly syntax) and prefer portable GCC/Clang-compatible implementations for Linux. If platform-specific code is needed, guard it with portable, clearly documented checks and ensure it compiles with GCC/Clang on Linux. Maintain portability by avoiding MSVC-only fallbacks unless explicitly required by a supported build configuration.

Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 986
File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8
Timestamp: 2026-03-23T11:33:23.998Z
Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.

Learnt from: bdice
Repo: NVIDIA/cuopt PR: 1035
File: cpp/tests/utilities/base_fixture.hpp:29-47
Timestamp: 2026-04-19T16:49:17.616Z
Learning: In NVIDIA/cuopt (and the upstream rapidsai/rmm) after the RMM → CCCL migration, the rmm::mr resource adaptors (e.g., rmm::mr::pool_memory_resource and rmm::mr::binning_memory_resource) are now owning: they take/hold their upstream resources by value instead of non-owning references. Therefore, direct construction of adaptor chains from temporaries/local values (e.g., pool_memory_resource(make_async(), size)) is safe and should NOT be flagged as potential dangling/lifetime bugs. Also, rmm::mr::make_owning_wrapper is no longer needed/available for this owning design, so do not suggest it in this codepath.

Contributor

@aliceb-nv aliceb-nv left a comment


Thanks Nicolas!

@aliceb-nv
Contributor

/merge

@rapids-bot rapids-bot Bot merged commit 29b4dc1 into NVIDIA:release/26.06 May 25, 2026
102 checks passed
@nguidotti nguidotti deleted the omp-migration-lp branch May 25, 2026 12:15
3 participants