✨ WIP: Cluster provider and cluster-aware controllers #3019

Open
wants to merge 8 commits into
base: main
from

Conversation

embik
Member

@embik embik commented Nov 22, 2024

This is a continuation of #2726 and #2207, a prototype for multi-cluster support in controller-runtime. A design proposal is at #2746, which will need updating depending on the feedback on this PR. I've reorganized code changes into different commits than the previous attempts, but tried to keep authorship of changes faithful to the commits this work is based on.

The main change from the previous PR (#2726) is the update to typed generics support and the adjustment to a bring-your-own (BYO) request type. This has two implications: (a) you will need to bring your own typed EventHandler, and (b) a wrapper function is needed to inject information (mostly the cluster name) when establishing a handler.

Apart from that, the core design of this is the cluster provider that can be plugged into a manager. This unlocks different ways to discover the Kubernetes "fleet" that is being reconciled.
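
To make the bring-your-own request type and the wrapper function described above more concrete, here is a minimal, self-contained sketch. It does not use controller-runtime; `ClusterRequest`, `Object`, `Handler`, and `ForCluster` are hypothetical names chosen for illustration and are not taken from this PR.

```go
package main

import "fmt"

// ClusterRequest is a hypothetical bring-your-own request type: an object
// key plus the name of the cluster the object lives in.
type ClusterRequest struct {
	ClusterName string
	Namespace   string
	Name        string
}

// Object is a stand-in for the object carried by an event; only its key
// matters for this sketch.
type Object struct {
	Namespace string
	Name      string
}

// Handler maps an observed object to the request that should be enqueued.
// The request type is a type parameter, so each controller can pick its own.
type Handler[request comparable] func(obj Object) request

// ForCluster is the wrapper function: given a cluster name, it returns a
// handler that stamps that name onto every request it produces.
func ForCluster(clusterName string) Handler[ClusterRequest] {
	return func(obj Object) ClusterRequest {
		return ClusterRequest{
			ClusterName: clusterName,
			Namespace:   obj.Namespace,
			Name:        obj.Name,
		}
	}
}

func main() {
	h := ForCluster("cluster-a")
	fmt.Printf("%+v\n", h(Object{Namespace: "default", Name: "web"}))
}
```

The point of the wrapper is that the handler itself never has to know which cluster it is attached to; the name is captured once, when the handler is established for a given cluster.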

@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 22, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 22, 2024
@embik embik force-pushed the embik-typed-cluster-support branch from 9584ab6 to 9b3f243 Compare November 22, 2024 12:44
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Nov 22, 2024
Comment on lines +40 to +43
// Name returns the name of the cluster. It identifies the cluster in the
// manager if that is attached to a cluster provider. The value is usually
// empty for the default cluster of a manager.
Name() string

We're internally working on a mechanism for dealing with multiple clusters, too. It's been helpful for the cluster identifier to be more flexible than a simple string. We've made it comparable.

Something like this:

type Cluster[ID comparable] interface {
  Identifier() ID
  ...
}

I understand this suggestion would cause a much larger change requiring type parameters in many places. Future work could be done to migrate.

Mostly, I wanted to raise this as something to consider and that calling the method Identifier would provide room to migrate in the future.
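
As an illustration of what the `comparable` constraint buys, the sketch below uses the identifier directly as a map key. Only `Cluster[ID comparable]` and `Identifier()` come from the suggestion above; `Registry`, `regionID`, and `fakeCluster` are hypothetical helpers.

```go
package main

import "fmt"

// Cluster mirrors the shape suggested above; all other methods are omitted.
type Cluster[ID comparable] interface {
	Identifier() ID
}

// Registry is a hypothetical lookup table. Because ID is comparable, it can
// be used as a map key without being flattened into a string first.
type Registry[ID comparable] struct {
	clusters map[ID]Cluster[ID]
}

func NewRegistry[ID comparable]() *Registry[ID] {
	return &Registry[ID]{clusters: make(map[ID]Cluster[ID])}
}

// Add stores the cluster under its own identifier.
func (r *Registry[ID]) Add(c Cluster[ID]) {
	r.clusters[c.Identifier()] = c
}

// Get looks a cluster up by identifier and reports whether it was found.
func (r *Registry[ID]) Get(id ID) (Cluster[ID], bool) {
	c, ok := r.clusters[id]
	return c, ok
}

// regionID is a hypothetical structured identifier; a struct of strings is
// comparable.
type regionID struct {
	Provider string
	Region   string
}

type fakeCluster struct{ id regionID }

func (f fakeCluster) Identifier() regionID { return f.id }

func main() {
	r := NewRegistry[regionID]()
	r.Add(fakeCluster{id: regionID{Provider: "aws", Region: "eu-west-1"}})

	_, ok := r.Get(regionID{Provider: "aws", Region: "eu-west-1"})
	fmt.Println(ok)
}
```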

Member Author

I'm definitely open to that; I think renaming it to Identifier() should be easy enough! Maybe some maintainers can leave a note on whether they consider this a future endeavour and whether this change would be worthwhile preparation for it.

We were there before, had a logical cluster abstraction, and then went back to string as it just simplifies everything.

@sprsquish Am curious why a string as identifier and some lookup table on the cluster provider side does not work?

Also, adding Name() string to the interface has no impact on the consumers of the Cluster type. But adding a type parameter forces them to pass e.g. string or whatever everywhere including the manager. I don't think that's a good idea. IMO, we should not do it.

@sttts

Am curious why a string as identifier and some lookup table on the cluster provider side does not work?

Strings work fine. We like to work with more structure in our identifiers. For instance, when working with multiple clusters across cloud providers, it's nice to have a structure like this:

type ClusterID struct {
  Provider string
  Region string
  Account string
  Name string
}

Of course, it can be marshaled into a string; it's just nice not to have to.

I understand the need for a simpler approach and appreciate that this has been discussed already. I think there's a way to migrate over time that follows what's been happening with handlers: introduce TypedCluster[ID comparable] plus an alias for the common use case, type Cluster = TypedCluster[string] (Manager would need the same treatment).

The only change to the current proposal would be to use the more generic Identifier() rather than Name(). That leaves the door open for future discussions regarding generic identifiers without having to do all the work up front.
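
A minimal sketch of that migration path, assuming nothing beyond the names mentioned in this comment (`TypedCluster`, `Cluster`, `Identifier`, `ClusterID`); `namedCluster`, `cloudCluster`, and `describe` are hypothetical:

```go
package main

import "fmt"

// TypedCluster is the generic form; all methods other than Identifier are
// omitted from this sketch.
type TypedCluster[ID comparable] interface {
	Identifier() ID
}

// Cluster is an alias for the common string-identified case, so code written
// against Cluster needs no type arguments.
type Cluster = TypedCluster[string]

// ClusterID is the structured identifier from the comment above.
type ClusterID struct {
	Provider string
	Region   string
	Account  string
	Name     string
}

// namedCluster is a hypothetical string-identified implementation.
type namedCluster struct{ name string }

func (c namedCluster) Identifier() string { return c.name }

// cloudCluster is a hypothetical implementation with a structured identifier.
type cloudCluster struct{ id ClusterID }

func (c cloudCluster) Identifier() ClusterID { return c.id }

// describe only knows about the alias and is unaffected by the type parameter.
func describe(c Cluster) string {
	return "cluster " + c.Identifier()
}

func main() {
	fmt.Println(describe(namedCluster{name: "dev"}))

	var rich TypedCluster[ClusterID] = cloudCluster{id: ClusterID{
		Provider: "aws", Region: "eu-west-1", Account: "1234", Name: "prod",
	}}
	fmt.Println(rich.Identifier().Region)
}
```

Note that the alias only hides the type parameter for string users; anything that must accept both forms (such as a manager) would still need to become generic, which is the cost raised earlier in this thread.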

If I understand your use-case correctly, this pattern should help you elegantly: https://goplay.space/#wRWyxv-GUrj

Or in other words: the type only matters in code that is ClusterID aware. Turning the string back into the rich format is one type conversion away.
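
The linked playground snippet is not reproduced here. The following is only one possible reading of "one type conversion away": a `ClusterID` whose underlying type is `string`, with accessors for the structured parts. The `/` separator, the field order, and all function names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ClusterID has string as its underlying type, so it converts to and from
// the plain cluster name with an ordinary type conversion.
type ClusterID string

// NewClusterID joins the parts with "/". Separator and order are assumptions
// made for this sketch.
func NewClusterID(provider, region, account, name string) ClusterID {
	return ClusterID(strings.Join([]string{provider, region, account, name}, "/"))
}

// part returns the i-th "/"-separated field, or "" if it is missing.
func (id ClusterID) part(i int) string {
	parts := strings.SplitN(string(id), "/", 4)
	if i < len(parts) {
		return parts[i]
	}
	return ""
}

func (id ClusterID) Provider() string { return id.part(0) }
func (id ClusterID) Region() string   { return id.part(1) }
func (id ClusterID) Account() string  { return id.part(2) }
func (id ClusterID) Name() string     { return id.part(3) }

func main() {
	// A string-based API only ever sees the plain string.
	name := string(NewClusterID("aws", "eu-west-1", "1234", "prod"))

	// ClusterID-aware code converts it back when it needs the structure.
	id := ClusterID(name)
	fmt.Println(id.Provider(), id.Region(), id.Account(), id.Name())
}
```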

@embik embik force-pushed the embik-typed-cluster-support branch from 9b3f243 to f87db1f Compare December 4, 2024 10:58
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2024
@embik embik force-pushed the embik-typed-cluster-support branch from f87db1f to 95a2c06 Compare December 9, 2024 11:20
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2024
@embik embik force-pushed the embik-typed-cluster-support branch from 95a2c06 to 5c285f6 Compare December 9, 2024 15:56
type Provider interface {
// Get returns a cluster for the given identifying cluster name. Get
// returns an existing cluster if it has been created before.
Get(ctx context.Context, clusterName string) (Cluster, error)

Is the default cluster a thing here already? I.e., can I always pass an empty string?

Should we specify what is returned for an unknown cluster?

Member Author

The default cluster is not a thing here; that is handled in Manager.GetCluster. The cluster provider doesn't have a concept of a default cluster.

Should we specify what is returned for an unknown cluster?

Probably. I can add a sentence to clarify this.

Member Author

Added a note on what should happen if the cluster is unknown.

Comment on lines 45 to 48
// Provider defines methods to retrieve clusters by name. The provider is
// responsible for discovering and managing the lifecycle of each cluster.
//

"defines methods" is redundant for an interface.

Suggested change
- // Provider defines methods to retrieve clusters by name. The provider is
- // responsible for discovering and managing the lifecycle of each cluster.
- //
+ // Provider allows to retrieve clusters by name. The provider is
+ // responsible for discovering and managing the lifecycle of each cluster.
+ //

It's kind of unclear what managing means in a getter interface.

Maybe be concrete by saying:

Suggested change
- // Provider defines methods to retrieve clusters by name. The provider is
- // responsible for discovering and managing the lifecycle of each cluster.
- //
+ // Provider allows to retrieve clusters by name, e.g. to reconcilers.
+ // The provider is responsible for discovering and managing the lifecycle
+ // of each cluster, calling `Engage` and `Disengage` on the manager
+ // it is run against, whenever a new cluster is discovered or a cluster
+ // is unregistered.

Member Author

+1, I've added that.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: embik
Once this PR has been reviewed and has the lgtm label, please assign joelanford for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2025
@embik embik force-pushed the embik-typed-cluster-support branch from 83360c4 to 1dcba5e Compare January 8, 2025 08:18
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2025
@embik embik force-pushed the embik-typed-cluster-support branch 2 times, most recently from b604946 to a362992 Compare January 8, 2025 09:39
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2025
embik and others added 6 commits January 20, 2025 08:54
On-behalf-of: SAP [email protected]
Co-authored-by: Vince Prignano <[email protected]>
Co-authored-by: Dr. Stefan Schimanski <[email protected]>
Signed-off-by: Marvin Beckers <[email protected]>
On-behalf-of: SAP [email protected]
Co-authored-by: Vince Prignano <[email protected]>
Co-authored-by: Dr. Stefan Schimanski <[email protected]>
Co-authored-by: Iván Álvarez <[email protected]>
Signed-off-by: Marvin Beckers <[email protected]>
embik and others added 2 commits January 20, 2025 08:54
On-behalf-of: SAP [email protected]
Co-authored-by: Vince Prignano <[email protected]>
Co-authored-by: Dr. Stefan Schimanski <[email protected]>
Signed-off-by: Marvin Beckers <[email protected]>
@embik embik force-pushed the embik-typed-cluster-support branch from a362992 to e93979e Compare January 20, 2025 07:54
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 20, 2025
@k8s-ci-robot
Contributor

@embik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-controller-runtime-apidiff | e93979e | link | false | /test pull-controller-runtime-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants