server: improve clock offset monitoring #114321

The `Clock Offset` graph in DB Console displays the [mean](https://github.com/cockroachdb/cockroach/blob/2ffa52615afb128697175762761800f8974c9eda/pkg/rpc/clock_offset.go#L342-L351) offset from one node to the other nodes. The offsets are signed, so it is possible to tell whether a node's clock is mostly behind or mostly ahead of the other nodes' clocks. Example:

<img width="1066" alt="Screenshot 2023-10-02 at 17 05 38" src="https://github.com/cockroachlabs/support/assets/3757441/17ee9fb0-280b-4be2-bc45-6efdadd55d36">

The **mean** offset is not necessarily the best metric for analysis, for the following reasons:
- positive and negative offsets cancel each other out
- a single skewed node shifts every node's offset graph, which makes it harder to identify the outlier

We should have more comprehensive metrics.

1. For example, in addition to the mean offset, we could report a histogram, or at least the min, max, and median (p50) offsets.

2. Also, the number of nodes participating in the computation can [change](https://github.com/cockroachdb/cockroach/blob/2ffa52615afb128697175762761800f8974c9eda/pkg/rpc/clock_offset.go#L339) dynamically. We could plot this number as well.

3. A node can [terminate](https://github.com/cockroachdb/cockroach/blob/82d57212ee1c4e256171545d7ae02bbb5117b8b9/pkg/rpc/peer.go#L424-L435) itself if its clock is an outlier relative to 50% or more of the other nodes. We should add metrics that indicate this event is approaching, so that alerting can catch the situation before the node terminates itself. This is why something like a median (p50) offset graph is a better indicator than the mean. Another indicator could be the number or percentage of nodes whose offset is above the threshold.

Jira issue: CRDB-33459
