Enable health checks on all Kubernetes clusters by default #60169
base: master
Conversation
The PR description walks through what happens when creating a new cluster, but what about a cluster that is upgraded with an existing default `health_check_config`: will it be amended with `kubernetes_labels`?
Force-pushed from c6b8fdb to daa8aee.
lib/auth/init.go
Outdated
func createPresetHealthCheckConfig(ctx context.Context, svc services.HealthCheckConfig) error {
    // To support developing health checks for multiple resources over time,
    // while avoiding migration of the backend database,
    // and enabling ease of health check adoption:
    // - Create a health check preset for each resource (db, kube, etc)
    page, _, err := svc.ListHealthCheckConfigs(ctx, 0, "")
    if err != nil {
        return trace.Wrap(err, "failed listing available health check configs")
    }
    if len(page) == 0 {
        // No health check configs exist.
        // Create all preset configs.
        presetDB := services.NewPresetHealthCheckConfigDB()
        _, err = svc.CreateHealthCheckConfig(ctx, presetDB)
        if err != nil && !trace.IsAlreadyExists(err) {
            return trace.Wrap(err,
                "failed creating preset health_check_config %s",
                presetDB.GetMetadata().GetName(),
            )
        }
        presetKube := services.NewPresetHealthCheckConfigKube()
        _, err = svc.CreateHealthCheckConfig(ctx, presetKube)
        if err != nil && !trace.IsAlreadyExists(err) {
            return trace.Wrap(err,
                "failed creating preset health_check_config %s",
                presetKube.GetMetadata().GetName(),
            )
        }
        return nil
    }
    // Health check configs exist.
    // Create per-resource presets.
    // Skip creating a DB preset; historically, it's the first, and already exists.
    // Look for an existing kube preset.
    for _, cfg := range page {
        if cfg.GetMetadata().GetName() == teleport.PresetDefaultHealthCheckConfigKubeName {
            return nil
        }
    }
    // Create a kube preset.
    presetKube := services.NewPresetHealthCheckConfigKube()
    _, err = svc.CreateHealthCheckConfig(ctx, presetKube)
    if err != nil && !trace.IsAlreadyExists(err) {
        return trace.Wrap(err,
            "failed creating preset health_check_config %s",
            presetKube.GetMetadata().GetName(),
        )
    }
    return nil
}
@fspmarshall would this be a good use case for AtomicWrite that asserts the existing key does not exist?
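For reference, a rough sketch of how that could look, assuming the `ConditionalAction` / `NotExists` / `Put` / `ErrConditionFailed` shape of Teleport's `backend.AtomicWrite` API; the helper name and the item plumbing here are illustrative assumptions, not the PR's code:

func createPresetsAtomically(ctx context.Context, bk backend.Backend, items ...backend.Item) error {
    // Hypothetical sketch: every put is conditioned on its key not existing,
    // and all writes land (or fail) together in a single atomic operation.
    condacts := make([]backend.ConditionalAction, 0, len(items))
    for _, item := range items {
        condacts = append(condacts, backend.ConditionalAction{
            Key:       item.Key,
            Condition: backend.NotExists(),
            Action:    backend.Put(item),
        })
    }
    if _, err := bk.AtomicWrite(ctx, condacts); err != nil {
        if errors.Is(err, backend.ErrConditionFailed) {
            // One of the presets already exists, e.g. created by a concurrent
            // auth instance, so the whole batch failed.
            return nil
        }
        return trace.Wrap(err)
    }
    return nil
}

One caveat with a combined batch: if any preset already exists, the condition fails for the entire write, so the caller would still need per-preset writes to create the remaining presets. That may be why a per-preset create-if-not-exists loop is simpler here.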
constants.go
Outdated
// PresetDefaultHealthCheckConfigDBName is the name of a preset
// health_check_config that enables health checks for all
// database resources.
PresetDefaultHealthCheckConfigDBName = "default_db"
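For context, a rough sketch of what the kube preset constructor could build, mirroring the wildcard label convention of a DB preset; the proto field names (`Match`, `KubernetesLabels`) and the `headerv1`/`labelv1` types are assumptions, not taken from the PR:

func NewPresetHealthCheckConfigKube() *healthcheckconfigv1.HealthCheckConfig {
    // Hypothetical sketch: a preset that matches every Kubernetes cluster
    // via a wildcard label selector, enabling health checks by default.
    return &healthcheckconfigv1.HealthCheckConfig{
        Kind:    types.KindHealthCheckConfig,
        Version: types.V1,
        Metadata: &headerv1.Metadata{
            Name: teleport.PresetDefaultHealthCheckConfigKubeName,
        },
        Spec: &healthcheckconfigv1.HealthCheckConfigSpec{
            Match: &healthcheckconfigv1.Matcher{
                KubernetesLabels: []*labelv1.Label{{
                    Name:   types.Wildcard,
                    Values: []string{types.Wildcard},
                }},
            },
        },
    }
}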
What are the implications of changing this name? If I already have a "default" health check config in my cluster and upgrade to a new version will I have a "default", "default_db" and "default_kube" health check? What if I've already modified the "default" config to meet my liking and now the "default_db" check adds all dbs again?
For simplicity and overall fewer headaches it might be worthwhile to revert this and leave "default" as the default db config.
It's a bit of a headache either way.
The idea is:
- Pre-v18.4: `default` enables health checks on all databases; `default_kube` enables health checks on all Kubernetes clusters
- v18.4 and later: `default_db` enables health checks on all databases; `default_kube` enables health checks on all Kubernetes clusters

Then update the docs to explain `default` and `default_db` in different versions.
> If I already have a "default" health check config in my cluster and upgrade to a new version will I have a "default", "default_db" and "default_kube" health check? What if I've already modified the "default" config to meet my liking and now the "default_db" check adds all dbs again?
Upgrading would yield `default` and `default_kube` only. Any changes to `default` would remain as is. New installations would yield `default_db` and `default_kube`.
Changing `default` to `default_db` is a headache. Also, explaining that `default` is just for databases is a bit of a headache. From a customer point of view, `default` may seem like it's the default for all resources. In the docs we would need to explain that it's for databases, and `default_kube` is for Kubernetes.
However you prefer. With that explanation, what are your thoughts on `default` and `default_db`?
I still think it will be less of a headache and less work to use `default` and `default_kube`. If it proves to be problematic and a point of contention in the future we can reconsider, but for now I would start simple and iterate.
Updated to `default`.
There might be a third way. I'm not sure I really like it, but it works surprisingly well for installers:
- There is no default resource in the backend.
- When people get `default` and it returns `trace.NotFoundErr`, the auth returns the default value; this value depends on the Teleport version.
- When people create the `default` installer, it lives in the backend and is returned by the auth, overriding the version's default.

This lets users say "I opt out of changes, here's my own config" by creating the resource, and if they don't care, we can make things evolve without user interaction.
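A minimal sketch of that pattern, with hypothetical names (`s.store`, `versionDefaultHealthCheckConfig`) standing in for the real storage layer and the compiled-in default:

func (s *Service) GetHealthCheckConfig(ctx context.Context, name string) (*healthcheckconfigv1.HealthCheckConfig, error) {
    // "Virtual default" sketch: the backend stores no default resource, and a
    // NotFound read of "default" is answered with a version-dependent value.
    cfg, err := s.store.GetHealthCheckConfig(ctx, name)
    switch {
    case err == nil:
        // A user-created resource exists and overrides the built-in default.
        return cfg, nil
    case trace.IsNotFound(err) && name == "default":
        // No override in the backend: serve this Teleport version's default.
        return versionDefaultHealthCheckConfig(), nil
    default:
        return nil, trace.Wrap(err)
    }
}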
@hugoShaka I like that approach, but it would in essence only work for new clusters, right? Existing clusters would already have a default with only database labels and we're back at the same place: there is no way for existing clusters to automatically have Kubernetes health checks enabled when they upgrade from 18.N to 18.3.0.
Yes, only new clusters would get the new kube healthcheck on by default. I suppose that we could make a migration-like behaviour and delete the resource on auth startup if it is strictly equal to the preset we used to create (no modifications were made).
That would be quite confusing if new auth and old auth instances coexisted and kept deleting and recreating the resource when restarted 😅.
In the hopes of reaching consensus and a path forward: the goal is to enable Kubernetes health checks by default, for new and existing clusters, without interfering with any human-made changes to existing `default` health checks, especially those which may already be managed by IaC.
- Adding a new `default_kube` config achieves that, at the expense of requiring a default per health-checked resource (i.e. `default_app`, `default_desktop`, etc).
- We could follow the preset role mechanism and stipulate that the default is owned by Teleport and shouldn't be modified. This will potentially affect any existing clusters that have altered the `default` labels.

It seems like the only objection to `default_kube` centers around creating multiple configs. However, it is also the only option presented so far that meets all of the above criteria. Unless there is another option that makes sense, I'm inclined to say we should proceed with that route.
@r0mant since you have some context on health checks, and specifically the default health check, what are your thoughts on this?
Having separate `default_<protocol>` health check resources seems fine to me. It's a bit unfortunate that the database one is just called `default`, but IMO it's not the end of the world and we can live with this inconsistency or figure out a migration if we want.
I don't think we can really prevent users from modifying them; they need to be able to disable default health checks or tweak their parameters, after all. So they're not even presets, strictly speaking.
The only thing is we'd need to document that if you delete a `default_x` health check, it will be recreated by the cluster, so if you want to change its behavior, you should edit the resource, not delete it.
Force-pushed from daa8aee to a1610eb.
Left a few suggestions.
lib/auth/init.go
Outdated
// createPresetHealthCheckConfig creates a default preset health check config
// resource that enables health checks on all resources.
func createPresetHealthCheckConfig(ctx context.Context, svc services.HealthCheckConfig) error {
nit: this doc comment needs updating.
Nice catch, comments are updated.
Lines 1500 to 1502 in f849c30

// createPresetHealthCheckConfigs creates a preset health check config
// for each resource using the healthcheck package.
func createPresetHealthCheckConfigs(ctx context.Context, svc services.HealthCheckConfig) error {
lib/auth/init.go
Outdated
If we want to build per-resource presets, can we consider building them in a generic way, like:
var presetsCreators = []func() *healthcheckconfigv1.HealthCheckConfig{
    services.NewPresetHealthCheckConfigDB,
    services.NewPresetHealthCheckConfigKube,
    // Easy for future extension
}

func createPresetHealthCheckConfig(ctx context.Context, svc services.HealthCheckConfig) error {
    existing := make(map[string]bool)
    allHealthCheckConfigs, err := stream.Collect(clientutils.Resources(ctx, svc.ListHealthCheckConfigs))
    if err != nil {
        return trace.Wrap(err, "failed listing available health check configs")
    }
    for _, cfg := range allHealthCheckConfigs {
        existing[cfg.GetMetadata().GetName()] = true
    }
    for _, createFn := range presetsCreators {
        preset := createFn()
        if !existing[preset.GetMetadata().GetName()] {
            if err := createPresetIfNotExists(ctx, svc, preset); err != nil {
                return err
            }
        }
    }
    return nil
}
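The suggestion leaves `createPresetIfNotExists` undefined; presumably it is a thin wrapper over the create call that tolerates creation races, something like:

func createPresetIfNotExists(ctx context.Context, svc services.HealthCheckConfig, preset *healthcheckconfigv1.HealthCheckConfig) error {
    // Treat AlreadyExists as success: another auth instance may have created
    // the preset between our list and this create.
    if _, err := svc.CreateHealthCheckConfig(ctx, preset); err != nil && !trace.IsAlreadyExists(err) {
        return trace.Wrap(err, "failed creating preset health_check_config %s",
            preset.GetMetadata().GetName())
    }
    return nil
}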
Very nice generalized form for now and in the future.
Implemented with slight adjustments.
Lines 1502 to 1528 in f849c30
func createPresetHealthCheckConfigs(ctx context.Context, svc services.HealthCheckConfig) error {
    // The choice to create a preset per-resource is motivated by:
    // - Supporting existing Teleport clusters already using health checks with some resources
    // - Avoiding migration of the backend database, which avoids downtime and headaches
    // - Easing the adoption of health checks for new resources as they are developed over time
    exists := make(map[string]bool)
    cfgs, err := stream.Collect(clientutils.Resources(ctx, svc.ListHealthCheckConfigs))
    if err != nil {
        return trace.Wrap(err, "unable to list health check configs")
    }
    for _, cfg := range cfgs {
        exists[cfg.GetMetadata().GetName()] = true
    }
    var errs []error
    for name, newPreset := range newHealthPresets {
        if !exists[name] {
            if _, err = svc.CreateHealthCheckConfig(ctx, newPreset()); err != nil && !trace.IsAlreadyExists(err) {
                errs = append(errs, err)
            }
        }
    }
    if len(errs) > 0 {
        return trace.NewAggregate(errs...)
    }
    return nil
}
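The `newHealthPresets` registry consumed above isn't part of this excerpt; presumably it maps each preset name to its constructor, roughly as below (the DB-side constant name is an assumption, given the preset name was kept as `default`):

var newHealthPresets = map[string]func() *healthcheckconfigv1.HealthCheckConfig{
    // Assumed shape of the preset registry iterated by
    // createPresetHealthCheckConfigs; constructor names come from this thread.
    teleport.PresetDefaultHealthCheckConfigName:     services.NewPresetHealthCheckConfigDB,
    teleport.PresetDefaultHealthCheckConfigKubeName: services.NewPresetHealthCheckConfigKube,
}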
Force-pushed from a1610eb to f849c30.
A per-resource health check config approach is implemented to ease the adoption of new health checks while avoiding migration of the backend database. Changes:
- Added `default_kube` health check config which enables health checks on all Kubernetes clusters
- Revised initialization and insert logic for health check configs

Part of #58413
Force-pushed from f849c30 to a0c446f.
Kubernetes health checks are enabled by default.
A per-resource health check config approach is implemented for enabling ease of adoption of new health checks, while avoiding migration of the backend database.
In this PR:
- Added a `default_kube` health check config which enables health checks on all Kubernetes clusters

Manual testing:
Manual testing was done once with a new installation, and once with an upgrade path from a previously existing `default` health check config.

Testing with a new installation:
- Removed `/var/lib/teleport/backend/sqlite.db`.
- Ran `teleport` with the PR changes.
- Teleport terminal output shows Kubernetes health checks functioning.
- The `kv` DB table shows a separate health check config for databases and Kubernetes.

Testing with a previous installation:
- Removed `/var/lib/teleport/backend/sqlite.db`.
- Ran `teleport` without the PR changes.
- Teleport terminal output shows Kubernetes health checks are disabled.
- The `kv` DB table shows a single health check config for databases.
- Ran `teleport` with the PR changes.
- Teleport terminal output shows Kubernetes health checks functioning.
- The `kv` DB table shows a separate health check config for databases and Kubernetes clusters.

Relates to: