@olamilekan000
Contributor

Summary

This change adds custom kcp-operator metrics to Prometheus.
[Image: Screenshot 2025-10-24 at 06 48 57]

What Type of PR Is This?

Related Issue(s)

Fixes #25

Release Notes


Added custom kcp-operator metrics to Prometheus.

@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the DCO. labels Oct 24, 2025
@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xrstf for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2025
@olamilekan000 olamilekan000 requested review from embik and mjudeikis and removed request for mjudeikis October 24, 2025 05:49
@olamilekan000 olamilekan000 force-pushed the add-custom-kcp-operator-metrics branch from 62caaa4 to a282340 Compare October 24, 2025 05:54
Contributor

@mjudeikis mjudeikis left a comment

@embik @xrstf your take here would be appreciated


func (mc *MetricsCollector) updateRootShardCounts(ctx context.Context) {
	var rootShards operatorv1alpha1.RootShardList
	if err := mc.client.List(ctx, &rootShards); err != nil {
Contributor

Can we use cache (all below too) here?

Contributor Author

We might not get up-to-date metric values if they're cached. Besides, I think the controller-runtime client already reads from a cache by default.

var (
	RootShardCount = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "kcp_operator_rootshard_count",
Contributor

Any reason why this isn't just the following? (I think this is open to discussion, @embik.)

kcp_operator_object_count{type="rootshard"}

Should the status metrics for rootshard, shard, frontproxy, and cache server be handled the same way?

Contributor

This would save us a bit of cardinality:
currently:
4 metrics * 2 dimensions = 8
suggested:
1 metric * 3 dimensions = 3

Kubeconfig and certificates would stay separate.
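
To make the suggestion above concrete, here is a minimal sketch of what a single count metric with a `type` label could look like when rendered in Prometheus text format. It uses only the Go standard library rather than the Prometheus client, and the metric name `kcp_operator_object_count` comes from the comment above; the helper `renderObjectCounts` and the `cacheserver` type value are hypothetical names chosen for illustration, not part of this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderObjectCounts formats per-type object counts as samples of one
// metric, kcp_operator_object_count, distinguished only by a "type" label.
// Types are sorted so the output is deterministic.
// This helper is hypothetical and only illustrates the proposed layout.
func renderObjectCounts(counts map[string]int) string {
	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, t)
	}
	sort.Strings(types)

	var b strings.Builder
	for _, t := range types {
		fmt.Fprintf(&b, "kcp_operator_object_count{type=%q} %d\n", t, counts[t])
	}
	return b.String()
}

func main() {
	// The type values below are assumptions; the PR may name them differently.
	fmt.Print(renderObjectCounts(map[string]int{
		"rootshard":   1,
		"shard":       2,
		"frontproxy":  1,
		"cacheserver": 0,
	}))
}
```

With this layout, adding a new object kind means adding a label value rather than registering another metric.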

Contributor

I am entirely unconvinced that a _count metric is even useful. Users could simply count() other metrics.

Contributor Author

So, this can be removed?

Contributor

If we have other metrics with one instance per object, then I'd say yes.
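
As a rough illustration of the point made in this thread, the sketch below derives per-type object counts from per-object series instead of exporting a dedicated `_count` metric. It is plain Go with no Prometheus dependency; the `sample` type and `countObjects` function are hypothetical. On the query side, something like `count by (resource_type) (kcp_operator_condition_status{condition_type="Available"})` should give a similar result, assuming every object reports an `Available` condition.

```go
package main

import "fmt"

// sample is a hypothetical, simplified view of one per-object series:
// only the labels needed to identify the object are kept.
type sample struct {
	ResourceType string
	Namespace    string
	Name         string
}

// countObjects returns the number of distinct objects per resource type.
// Several series may describe the same object (one per condition type),
// so duplicates are skipped before counting.
func countObjects(samples []sample) map[string]int {
	seen := make(map[sample]struct{})
	counts := make(map[string]int)
	for _, s := range samples {
		if _, dup := seen[s]; dup {
			continue // same object, reported under another condition
		}
		seen[s] = struct{}{}
		counts[s.ResourceType]++
	}
	return counts
}

func main() {
	samples := []sample{
		{ResourceType: "frontproxy", Namespace: "default", Name: "frontproxy-sample"},
		{ResourceType: "frontproxy", Namespace: "default", Name: "frontproxy-sample"},
		{ResourceType: "rootshard", Namespace: "default", Name: "shard-sample"},
	}
	fmt.Println(countObjects(samples))
}
```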


	ConditionStatus = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "kcp_operator_condition_status",
Contributor

Not sure what this metric is about. It needs better docs.

Contributor Author

It shows the condition status of installed objects.
For example, you can have something like this:

kcp_operator_condition_status{condition_type="Available",namespace="default",resource_name="frontproxy-sample",resource_type="frontproxy"} 1
kcp_operator_condition_status{condition_type="Available",namespace="default",resource_name="shard-sample",resource_type="rootshard"} 0
kcp_operator_condition_status{condition_type="RootShard",namespace="default",resource_name="frontproxy-sample",resource_type="frontproxy"} 1
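
For readers unfamiliar with this kind of metric, the following sketch shows one way a status condition could be turned into sample lines like the ones above. It is standard-library Go only and not the PR's implementation: the `condition` type, `gaugeValue`, and `renderCondition` are hypothetical, and mapping everything other than `"True"` (including `"Unknown"`) to 0 is an assumption.

```go
package main

import "fmt"

// condition is a hypothetical, trimmed-down stand-in for a Kubernetes
// status condition: a type plus a "True"/"False"/"Unknown" status.
type condition struct {
	Type   string
	Status string
}

// gaugeValue maps a condition status to a gauge value: 1 for "True",
// 0 for anything else. Treating "Unknown" as 0 is an assumption.
func gaugeValue(status string) int {
	if status == "True" {
		return 1
	}
	return 0
}

// renderCondition formats one sample line using the label names shown
// in the example output in this thread.
func renderCondition(resourceType, namespace, name string, c condition) string {
	return fmt.Sprintf(
		"kcp_operator_condition_status{condition_type=%q,namespace=%q,resource_name=%q,resource_type=%q} %d",
		c.Type, namespace, name, resourceType, gaugeValue(c.Status),
	)
}

func main() {
	fmt.Println(renderCondition("frontproxy", "default", "frontproxy-sample",
		condition{Type: "Available", Status: "True"}))
	fmt.Println(renderCondition("rootshard", "default", "shard-sample",
		condition{Type: "Available", Status: "False"}))
}
```

Each object then contributes one series per condition type, which is what makes it possible to count objects from this metric alone.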

Contributor

I would be fine with one metric for the conditions.

Contributor Author

> I would be fine with one metric for the conditions.

This isn't entirely clear to me. Does it mean you're okay with the current implementation?

Contributor

Yep.

@olamilekan000 olamilekan000 force-pushed the add-custom-kcp-operator-metrics branch 6 times, most recently from 9886ccb to 8c170fc Compare October 28, 2025 21:55
Signed-off-by: olalekan odukoya <[email protected]>
@olamilekan000 olamilekan000 force-pushed the add-custom-kcp-operator-metrics branch from 8c170fc to d5ce156 Compare October 29, 2025 09:37
@olamilekan000
Contributor Author

/retest

Labels

dco-signoff: yes Indicates the PR's author has signed the DCO. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

feature: provide metrics

4 participants