feat: Allow configuring leader election namespace #1333

Open
ratschance wants to merge 2 commits into NVIDIA:main from ratschance:allow-overriding-lease-namespace
Conversation

@ratschance

Add CLI flag to allow configuring the LeaderElectionNamespace on the controller-runtime's manager. This namespace is where the operator places its Lease object, and when unset, defaults to the namespace that the operator is running in. This new CLI flag defaults to an empty string to preserve that original behavior.

Signed-off-by: Conrad Ratschan <cratschan@coreweave.com>
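The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the flag name `leader-election-namespace`, the `leader-elect` flag, the `managerOptions` type, and the `parseOptions` helper are all hypothetical, and the standard `flag` package is used only to keep the example self-contained. The one name taken from the description is `LeaderElectionNamespace`, the field on controller-runtime's manager options that the new flag would populate.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// managerOptions is a local, hypothetical stand-in for the leader-election
// fields of controller-runtime's manager options. It exists only so this
// sketch compiles without external dependencies.
type managerOptions struct {
	LeaderElection          bool
	LeaderElectionNamespace string
}

// parseOptions parses the (hypothetical) CLI flags into managerOptions.
// The namespace flag defaults to "", which leaves the choice of namespace
// to the manager and so preserves the pre-existing behavior.
func parseOptions(args []string) (managerOptions, error) {
	var opts managerOptions
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&opts.LeaderElection, "leader-elect", false,
		"enable leader election (hypothetical flag name)")
	fs.StringVar(&opts.LeaderElectionNamespace, "leader-election-namespace", "",
		"namespace for the leader election Lease; empty keeps the default (hypothetical flag name)")
	if err := fs.Parse(args); err != nil {
		return managerOptions{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// In a real operator these values would be copied into the manager
	// options before the manager is constructed.
	fmt.Printf("%+v\n", opts)
}
```

Defaulting the flag to an empty string means an existing deployment that never passes it sees no change, which is the backward-compatibility property the description calls out.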
@copy-pr-bot

copy-pr-bot bot commented Mar 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@elezar elezar removed their request for review October 15, 2025 14:19
@github-actions
Contributor

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2026
@rahulait rahulait removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2026
@ratschance
Author

👋 anything I can do (or am missing) to help gain traction on this one?

@rajathagasthya
Contributor

@ratschance Do you anticipate creating a lease in a namespace other than the operator's namespace? It's not configurable right now because we want the Lease object to be in the same namespace. Since leader election is also not configurable via helm chart, I'm not entirely sure this change is necessary.

@ratschance
Author

@rajathagasthya Yes, we currently run the operator in a separate management cluster and give it a kubeconfig for the cluster it operates on. This leads to failures if we do not create a namespace in the tenant cluster matching the management cluster namespace that is hosting the gpu operator pod.

We are not using the Helm chart this project provides because of the nature of this configuration.

@rajathagasthya
Contributor

@ratschance Thanks for sharing context on your setup. I'm curious — in your split-cluster deployment, how are you handling the OPERATOR_NAMESPACE env var? The operator uses it as the namespace for all operand DaemonSets. Even with this PR for leader election namespace, you'd still need a matching namespace on the tenant cluster since OPERATOR_NAMESPACE is hardcoded as the target namespace for all operand resource creation. How are you working around that?

@ratschance
Author

@rajathagasthya For that, we have it set to cw-nvidia-gpu-operator, and it happily creates all the resources in that namespace in the tenant's cluster, as we desire. That namespace is where we would like the lease to go as well; however, right now it's forcing us to create a second tenant-<...> namespace in the tenant's cluster so that it can write the lease.
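To make the mismatch concrete, here is a rough sketch of how the lease namespace ends up being chosen. It assumes, to the best of my understanding, that when no override is given the namespace is read from the pod's mounted service account namespace file, which reflects the management cluster's namespace rather than anything on the tenant cluster. The function `resolveLeaseNamespace` is a hypothetical helper written for this illustration, not code from controller-runtime or this PR.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inClusterNamespacePath is the conventional location of the namespace file
// mounted into a pod alongside its service account token.
const inClusterNamespacePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// resolveLeaseNamespace (hypothetical helper) returns the override when one
// is set. Otherwise it falls back to the namespace recorded in namespaceFile,
// i.e. the namespace the operator pod itself runs in.
func resolveLeaseNamespace(override, namespaceFile string) (string, error) {
	if override != "" {
		return override, nil
	}
	data, err := os.ReadFile(namespaceFile)
	if err != nil {
		return "", fmt.Errorf("no override set and in-cluster namespace unavailable: %w", err)
	}
	ns := strings.TrimSpace(string(data))
	if ns == "" {
		return "", fmt.Errorf("namespace file %s is empty", namespaceFile)
	}
	return ns, nil
}

func main() {
	// With no override, the result is the management-cluster namespace,
	// which may not exist on the tenant cluster the kubeconfig points at.
	ns, err := resolveLeaseNamespace("", inClusterNamespacePath)
	fmt.Println("default:", ns, err)

	// With an override, the Lease can live next to the operands.
	ns, err = resolveLeaseNamespace("cw-nvidia-gpu-operator", inClusterNamespacePath)
	fmt.Println("override:", ns, err)
}
```

In the split-cluster setup described here, the fallback path yields the hosting pod's namespace, which is why a second namespace currently has to be created on the tenant cluster just to hold the Lease.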

@rajathagasthya
Contributor

@ratschance Got it. This change makes sense to me; we'll review it. In the meantime, could you rebase? Thanks!

@rajathagasthya rajathagasthya added enhancement Improvements to existing features, performance, or usability (not bug fixes or new features). and removed lifecycle/frozen labels Apr 2, 2026
@rajathagasthya rajathagasthya self-assigned this Apr 2, 2026
Labels

enhancement Improvements to existing features, performance, or usability (not bug fixes or new features).

5 participants