storeliveness: persist latency metric

When a store's disk is stalled or slow, store liveness can hang on a disk write right before sending a new heartbeat. This is by design: we don't want a store with an unhealthy disk to heartbeat. The implication is that a hung store liveness goroutine (in the `SupportManager`) delays both sending and receiving new messages. This is one of the most common causes of lease churn with leader leases.

We should add a latency metric to measure the time spent in `writeProto` (potentially broken down by caller):

https://github.com/cockroachdb/cockroach/blob/c83c57d354741ac36740894f7387c014eb6c09fe/pkg/kv/kvserver/storeliveness/persist.go#L91-L95


Jira issue: CRDB-55415

	func writeProto(
		ctx context.Context, rw storage.ReadWriter, key roachpb.Key, msg protoutil.Message,
	) error {
		return storage.MVCCPutProto(ctx, rw, key, hlc.Timestamp{}, msg, storage.MVCCWriteOptions{})
	}

storeliveness: persist latency metric #155381

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

storeliveness: persist latency metric #155381

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions