AndreasHolt

What changed?

  • introduce store.ShardMetrics (smoothed load + timestamps) and persist it under store/<namespace>/shards/<shardID>/metrics. SmoothedLoad will keep an EWMA of shard load, LastUpdateTime will be used to dynamically update the alpha value used in the EWMA, and LastMoveTime will be used to support cooldown logic that limits shard churn
  • extend the etcd store so AssignShard/AssignShards write metrics alongside ownership, refresh LastMoveTime when reusing existing metrics, and apply per-shard metric updates after the main transaction to stay within etcd’s 128-op transaction limit
  • extend GetState to read the new metric keys and expose them in NamespaceState, allowing the leader to use them for future rebalancing decisions

Why?
reported_shards is keyed by executor. That works for reporting the latest heartbeat, but it breaks down the moment a shard moves. Then the new owner can’t see the old owner’s smoothed load or timestamps, and the leader has to collect executor-specific parts just to reason about shard state. By giving each shard its own metrics key:

  • the data survives ownership changes. New executors and the leader can pick up where the previous owner left off
  • the leader can read NamespaceState and compute balancing or throttling decisions without looking for per-executor heartbeats
  • we can store both an EWMA (so short spikes hopefully won’t cause thrashing) and timestamps: last_update_time is used for the decay value (alpha) when applying the next sample, and last_move_time is what we’ll use for cooldowns before moving a shard again.
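
To make the EWMA and cooldown idea concrete, here is a minimal sketch. The field names come from this PR, but the field types, the time-based alpha formula (alpha = 1 - exp(-dt/tau)), and the tauSeconds and cooldownSeconds parameters are illustrative assumptions, not the code in this change.

```go
package main

import (
	"fmt"
	"math"
)

// ShardMetrics mirrors the fields named in this PR. The types are assumptions:
// load as float64, timestamps as Unix seconds.
type ShardMetrics struct {
	SmoothedLoad   float64
	LastUpdateTime int64
	LastMoveTime   int64
}

// applySample folds one load sample into the EWMA. The weight alpha grows with
// the time elapsed since the last update: alpha = 1 - exp(-dt/tau).
// Both the formula and tauSeconds are hypothetical choices for this sketch.
func applySample(m ShardMetrics, sample float64, now int64, tauSeconds float64) ShardMetrics {
	dt := float64(now - m.LastUpdateTime)
	if dt < 0 {
		dt = 0 // guard against clock skew between writers
	}
	alpha := 1.0 // with no valid time constant, trust the new sample fully
	if tauSeconds > 0 {
		alpha = 1 - math.Exp(-dt/tauSeconds)
	}
	m.SmoothedLoad = alpha*sample + (1-alpha)*m.SmoothedLoad
	m.LastUpdateTime = now
	return m
}

// canMove reports whether the cooldown since the last move has elapsed.
func canMove(m ShardMetrics, now, cooldownSeconds int64) bool {
	return now-m.LastMoveTime >= cooldownSeconds
}

// approxEqual compares floats with an absolute tolerance.
func approxEqual(a, b, tol float64) bool { return math.Abs(a-b) <= tol }

func main() {
	m := ShardMetrics{SmoothedLoad: 10, LastUpdateTime: 100, LastMoveTime: 100}
	m = applySample(m, 50, 101, 30) // short gap, so the spike is heavily damped
	fmt.Printf("smoothed load after spike: %.2f\n", m.SmoothedLoad)
	fmt.Println("can move at t=130 with 60s cooldown:", canMove(m, 130, 60))
}
```

Because alpha depends on the elapsed time, a sample that arrives right after the previous one barely shifts the average, while a sample after a long silence mostly replaces it.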

A follow-up pull request will wire heartbeats to update the metrics each time.

How did you test it?
Integration tests w/ etcd (added new test cases to ./service/sharddistributor/store/etcd/etcdstore_test.go)
go test ./service/sharddistributor/store/etcd/executorstore
Also tested it by logging values while running the ephemeral service (which simulates executors and shards)

Potential risks
Added pressure to etcd and extra read operations when preparing metric updates

Release notes
Shard distributor now persists shard metrics in etcd (smoothed load and timestamps) for future load balancing logic.

Documentation Changes

@AndreasHolt force-pushed the lb-shard-metrics-etcd branch from d393051 to 6360f8a on October 20, 2025 12:05
}

func BuildShardKey(prefix string, namespace, shardID, keyType string) (string, error) {
if keyType != ShardAssignedKey && keyType != ShardMetricsKey {
Contributor

where/when is this used?

return parts[0], parts[1], nil
}

func BuildShardPrefix(prefix string, namespace string) string {
Contributor

@eleonoradgr eleonoradgr Oct 21, 2025

We need to have tests for BuildShardPrefix, BuildShardKey and ParseShardKey :)
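
A rough sketch of what a table-driven round-trip check could look like. The lowercase helpers below are hypothetical stand-ins, not the etcdkeys functions from this PR: the key layout <prefix>/<namespace>/shards/<shardID>/<keyType> is inferred from the PR description, and the "assigned" key type value is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	shardAssignedKey = "assigned" // assumed value, for illustration only
	shardMetricsKey  = "metrics"  // matches the .../metrics suffix in the PR description
)

func validKeyType(keyType string) bool {
	return keyType == shardAssignedKey || keyType == shardMetricsKey
}

// buildShardPrefix returns the common prefix of all shard keys in a namespace.
func buildShardPrefix(prefix, namespace string) string {
	return fmt.Sprintf("%s/%s/shards/", prefix, namespace)
}

// buildShardKey rejects unknown key types and appends shardID and keyType.
func buildShardKey(prefix, namespace, shardID, keyType string) (string, error) {
	if !validKeyType(keyType) {
		return "", fmt.Errorf("invalid shard key type %q", keyType)
	}
	return buildShardPrefix(prefix, namespace) + shardID + "/" + keyType, nil
}

// parseShardKey is the inverse of buildShardKey for a given prefix and namespace.
func parseShardKey(prefix, namespace, key string) (string, string, error) {
	shardPrefix := buildShardPrefix(prefix, namespace)
	if !strings.HasPrefix(key, shardPrefix) {
		return "", "", fmt.Errorf("key %q does not have prefix %q", key, shardPrefix)
	}
	parts := strings.Split(strings.TrimPrefix(key, shardPrefix), "/")
	if len(parts) != 2 || parts[0] == "" || !validKeyType(parts[1]) {
		return "", "", fmt.Errorf("unexpected shard key format %q", key)
	}
	return parts[0], parts[1], nil
}

// checkRoundTrip runs the table of cases and returns the first failure, if any.
func checkRoundTrip() error {
	cases := []struct {
		shardID, keyType string
		wantErr          bool
	}{
		{"shard-1", shardMetricsKey, false},
		{"shard-1", shardAssignedKey, false},
		{"shard-1", "bogus", true},
	}
	for _, c := range cases {
		key, err := buildShardKey("store", "ns", c.shardID, c.keyType)
		if (err != nil) != c.wantErr {
			return fmt.Errorf("build(%q, %q): unexpected error state: %v", c.shardID, c.keyType, err)
		}
		if err != nil {
			continue // expected failure, nothing to parse
		}
		gotID, gotType, err := parseShardKey("store", "ns", key)
		if err != nil || gotID != c.shardID || gotType != c.keyType {
			return fmt.Errorf("parse(%q) = (%q, %q, %v)", key, gotID, gotType, err)
		}
	}
	return nil
}

func main() {
	if err := checkRoundTrip(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all shard key cases passed")
}
```

In the real test file the same table would normally live in a func TestXxx(t *testing.T) with t.Run per case; the error-returning helper is used here only to keep the sketch self-contained.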

shardID, shardKeyType, err := etcdkeys.ParseShardKey(s.prefix, namespace, string(kv.Key))
if err != nil {
continue
}
Contributor

If we don't want to abort metric emission, we should still log an error so that we have evidence that something is not working as expected.

// Compute shard moves to update last_move_time metrics when ownership changes.
// Read current assignments for the namespace and compare with the new state.
// Concurrent changes will be caught by the revision comparisons later.
currentAssignments := make(map[string]string) // shardID -> executorID
Contributor

@eleonoradgr eleonoradgr Oct 21, 2025

Instead of building the whole executors-to-shard mapping, can we rely on the shardCache *shardcache.ShardToExecutorCache cache?

}
}
now := time.Now().Unix()
// Collect metric updates now so we can apply them after committing the main transaction.
Contributor

Can we move all the code for metric generation to a separate function? It will make the overall code more readable.
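
As one possible shape for such a helper, the sketch below isolates the "which shards changed owner" step described in the code comment above. The function name movedShards is hypothetical, and treating a newly assigned shard as moved is an assumption of this sketch rather than something the PR states.

```go
package main

import (
	"fmt"
	"sort"
)

// movedShards compares two shardID -> executorID maps and returns the IDs of
// shards whose owner differs in next. Shards absent from current are also
// reported (an assumption of this sketch). The result is sorted so callers
// and tests see a stable order.
func movedShards(current, next map[string]string) []string {
	var moved []string
	for shardID, newOwner := range next {
		oldOwner, existed := current[shardID]
		if !existed || oldOwner != newOwner {
			moved = append(moved, shardID)
		}
	}
	sort.Strings(moved)
	return moved
}

func main() {
	current := map[string]string{"s1": "exec-a", "s2": "exec-b"}
	next := map[string]string{"s1": "exec-a", "s2": "exec-c", "s3": "exec-a"}
	fmt.Println(movedShards(current, next))
}
```

With the comparison pulled out, the assignment path only has to loop over the returned IDs to prepare LastMoveTime updates, and the comparison itself can be unit-tested without etcd.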

metrics store.ShardMetrics
modRevision int64
desiredLastMove int64 // intended LastMoveTime for this update
defaultLastUpdate int64
Contributor

what is the defaultLastUpdate? can we just call this LastUpdate?

} else {
update.metrics = store.ShardMetrics{
SmoothedLoad: 0,
LastUpdateTime: update.defaultLastUpdate,
Contributor

Looking at the way defaultLastUpdate is used, I think we can simplify the code and just remove it. Wouldn't it be equivalent to use the desired last move time here?

for i := range updates {
update := &updates[i]

for {
Contributor

This is very difficult to read. I am not sure we will understand what it does in a few weeks. Can we remove this? :)

newAssignments[shardID] = executorID
}
}
now := time.Now().Unix()
Contributor

In general we use clock.TimeSource to handle time; it makes testing easier. I would suggest extending this with the same approach. You can check executorImpl for an example.

// shardMetricsUpdate tracks the etcd key, revision, and metrics used to update a shard
// after the main transaction in AssignShards for exec state.
// Retains metrics to safely merge concurrent updates before retrying.
type shardMetricsUpdate struct {
Member

I would call this statistics, metrics have a pretty standard meaning.
