
allow for custom cluster domain for agent to api connection #736


Open

mikn wants to merge 1 commit into develop

Conversation

@mikn commented Mar 10, 2025

Issue number: #735

Description of changes:
Introduces a KUBERNETES_SERVICE_CLUSTER_DOMAIN env var that lets you set the cluster's service domain, which the agent uses to talk to the brupop apiserver.
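
For illustration, a minimal sketch of how such an override could be read, assuming the conventional cluster.local default; the constant and function names here are hypothetical, while the real change sits around apiserver/src/client/webclient.rs (see the logs later in this thread):

  use std::env;

  // Hypothetical names for illustration only.
  const CLUSTER_DOMAIN_ENV_VAR: &str = "KUBERNETES_SERVICE_CLUSTER_DOMAIN";
  const DEFAULT_CLUSTER_DOMAIN: &str = "cluster.local";

  /// Returns the service cluster domain, falling back to the conventional
  /// default when the env var is unset or empty.
  fn service_cluster_domain() -> String {
      env::var(CLUSTER_DOMAIN_ENV_VAR)
          .ok()
          .filter(|domain| !domain.is_empty())
          .unwrap_or_else(|| DEFAULT_CLUSTER_DOMAIN.to_string())
  }

  fn main() {
      // e.g. "svc.cluster.local" by default, or "svc.local.molnett.test"
      // when KUBERNETES_SERVICE_CLUSTER_DOMAIN=local.molnett.test is set.
      let suffix = format!("svc.{}", service_cluster_domain());
      println!("Using cluster domain suffix: {}", suffix);
  }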

Testing done:
None yet.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@mikn (Author) commented Mar 12, 2025

We tested this patch in our production environment and it solved our problem. We did not fork the helm chart; we usually render the charts and edit them by hand instead.

If this is something you'd be willing to accept upstream, I can add it to the helm chart as well.

@yeazelm (Contributor) commented Mar 16, 2025

Thanks for the PR @mikn! Supporting this setting is something we are open to. I think adding a bit more testing and the corresponding helm chart changes would get us to a good place for merging. Can you confirm with some logs/output that a basic cluster works with the default, and then provide an example of the custom cluster domain working? You can redact the custom name from the logs; I'd just like the PR to document that we know the change works.

@yeazelm (Contributor) commented Apr 28, 2025

Hey @mikn, I just wanted to check back in on if you got any testing done for this that you can share?

@mikn (Author) commented Apr 30, 2025

> Hey @mikn, I just wanted to check back in on if you got any testing done for this that you can share?

Hi @yeazelm! Sorry, no. We have been extremely resource constrained for a while, but we have a long weekend coming up (and have finally caught up on work), so I should have some time to catch up on the outstanding Bottlerocket issues.

@mikn (Author) commented May 11, 2025

Here are the test logs, as requested.
Without the env variable set:

> kubectl logs -n bottlerocket-system brupop-agent-vh287
  2025-05-11T16:49:07.134490Z  INFO apiserver::client::webclient: Created K8s API Server client using service port, service_port: 443
    at apiserver/src/client/webclient.rs:79

  2025-05-11T16:49:07.134510Z  INFO apiserver::client::webclient: Using cluster domain suffix, cluster_domain_suffix: svc.cluster.local
    at apiserver/src/client/webclient.rs:84

  2025-05-11T16:49:07.833086Z  INFO agent::agentclient: Brs has been created., brs_name: "ct-78560717.local.molnett.host"
    at agent/src/agentclient.rs:248
    in agent::agentclient::create_metadata_shadow
    in agent::agentclient::create_shadow_if_not_exist
    in agent::agentclient::run

  2025-05-11T16:49:08.233133Z  INFO agent::agentclient: Checking for Bottlerocket updates.
    at agent/src/agentclient.rs:213
    in agent::agentclient::shadow_status_with_refreshed_system_matadata with shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
    in agent::agentclient::initialize_metadata_shadow
    in agent::agentclient::initialize_shadow_if_not_initialized
    in agent::agentclient::run

  2025-05-11T16:49:19.344135Z  INFO agent::agentclient: Brs status has been updated., brs_name: "ct-78560717.local.molnett.host", brs_status: BottlerocketShadowStatus { current_version: "1.2.1", target_version: "1.2.1", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }
    at agent/src/agentclient.rs:267
    in agent::agentclient::update_metadata_shadow
    in agent::agentclient::initialize_metadata_shadow
    in agent::agentclient::initialize_shadow_if_not_initialized
    in agent::agentclient::run

With the custom domain set:

> bazel run //platform/cells/local:kubectl -- logs -n bottlerocket-system brupop-agent-ptnnn
INFO: Analyzed target //platform/cells/local:kubectl (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //platform/cells/local:kubectl up-to-date:
  .bazel/bin/platform/cells/local/kubectl.sh
INFO: Elapsed time: 0.074s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: .bazel/bin/platform/cells/local/kubectl.sh logs -n bottlerocket-system brupop-agent-ptnnn
  2025-05-11T16:53:37.760895Z  INFO apiserver::client::webclient: Created K8s API Server client using service port, service_port: 443
    at apiserver/src/client/webclient.rs:79

  2025-05-11T16:53:37.760917Z  INFO apiserver::client::webclient: Using cluster domain suffix, cluster_domain_suffix: svc.local.molnett.test
    at apiserver/src/client/webclient.rs:84

  2025-05-11T16:53:38.559117Z  INFO agent::agentclient: Checking for Bottlerocket updates.
    at agent/src/agentclient.rs:213
    in agent::agentclient::shadow_status_with_refreshed_system_matadata with shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
    in agent::agentclient::update_status_in_shadow with bottlerocket_shadow: BottlerocketShadow { ... }
    in agent::agentclient::run

I removed the object dump of the BottlerocketShadow object for brevity and altered the custom domain for secrecy. The two outputs differ mostly because the BRS and the cluster I produced the custom-domain proof on have existed for a while now.
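
To make the two runs easier to compare, here is a small hypothetical sketch of the apiserver hostnames those suffixes would produce. The bottlerocket-system namespace comes from the logs above; the brupop-apiserver service name is an assumption for illustration only.

  /// Hypothetical illustration: builds the in-cluster DNS name the agent
  /// would resolve for the apiserver, given a service, namespace, and
  /// cluster domain.
  fn apiserver_host(service: &str, namespace: &str, cluster_domain: &str) -> String {
      format!("{}.{}.svc.{}", service, namespace, cluster_domain)
  }

  fn main() {
      // Default suffix from the first log: svc.cluster.local
      assert_eq!(
          apiserver_host("brupop-apiserver", "bottlerocket-system", "cluster.local"),
          "brupop-apiserver.bottlerocket-system.svc.cluster.local"
      );
      // Custom suffix from the second log: svc.local.molnett.test
      assert_eq!(
          apiserver_host("brupop-apiserver", "bottlerocket-system", "local.molnett.test"),
          "brupop-apiserver.bottlerocket-system.svc.local.molnett.test"
      );
      println!("hostnames line up with the logged cluster domain suffixes");
  }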

@mikn (Author) commented May 28, 2025

Hi again @yeazelm, do you think it is possible to move this forward? 🙏

@yeazelm (Contributor) commented May 28, 2025

Looks good to me! Thanks for adding in the testing!
