
feat(monitoring): expose share rejection metrics on Prometheus surface #475

Open
gimballock wants to merge 2 commits into stratum-mining:main from fossatmara:feat/share-rejection-prometheus-metrics

Conversation

@gimballock

Summary

The JSON API already reports per-channel shares_rejected (HashMap<String, u32> keyed by error_code) and shares_submitted, but none of this data reaches the Prometheus /metrics endpoint. Operators using Prometheus for alerting and dashboards have no way to track share rejection rates or reasons over time.

Motivation

  • Time-series alerting — rejection rate (rejected / submitted) is the most direct signal that something is wrong (miner misconfiguration, difficulty drift, network latency). Without Prometheus exposure, rate-based alerting is not possible.
  • Reason breakdown — the existing HashMap<String, u32> already distinguishes reasons (stale, duplicate-share, etc.). Surfacing this per-reason enables operators to distinguish transient stale-share spikes (normal after a new block) from sustained protocol errors.
  • API vs Prometheus gap — the JSON API serves this data as a point-in-time snapshot. Trend analysis, rate() queries, and recording rules require Prometheus exposure.

Current State

Server-side Prometheus (what exists):

  • sv2_server_shares_accepted_total{channel_id, user_identity}
  • No shares_submitted or shares_rejected gauges

Server-side JSON API (what exists but is not in Prometheus):

  • shares_submitted: u32
  • shares_rejected: HashMap<String, u32>

Client-side:

  • sv2_client_shares_accepted_total exists in Prometheus
  • No rejection data in Prometheus or in the client monitoring types, although the upstream stratum-core client-side ShareAccounting already tracks rejected_shares: u32

Changes

Server metrics

  1. sv2_server_shares_submitted_total{channel_id, user_identity} — gauge from shares_submitted, enables rejection-rate denominator
  2. sv2_server_shares_rejected_total{channel_id, user_identity, error_code} — gauge from iterating shares_rejected map entries (see the sketch after this list)
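
For illustration only, a minimal sketch (not the PR's actual code) of how the two server gauges could be registered with the prometheus crate and populated from the per-channel snapshot data; the ChannelSnapshot struct, function names, and field types are assumptions:

```rust
use std::collections::HashMap;

use prometheus::{IntGaugeVec, Opts, Registry};

// Assumed shape of the per-channel data already exposed by the JSON API;
// struct and field names are illustrative, not the crate's actual types.
struct ChannelSnapshot {
    channel_id: u32,
    user_identity: String,
    shares_submitted: u32,
    shares_rejected: HashMap<String, u32>, // keyed by error_code
}

fn register_share_gauges(registry: &Registry) -> (IntGaugeVec, IntGaugeVec) {
    let submitted = IntGaugeVec::new(
        Opts::new("sv2_server_shares_submitted_total", "Shares submitted per channel"),
        &["channel_id", "user_identity"],
    )
    .expect("valid metric definition");
    let rejected = IntGaugeVec::new(
        Opts::new("sv2_server_shares_rejected_total", "Shares rejected per channel and error code"),
        &["channel_id", "user_identity", "error_code"],
    )
    .expect("valid metric definition");
    registry.register(Box::new(submitted.clone())).expect("register submitted");
    registry.register(Box::new(rejected.clone())).expect("register rejected");
    (submitted, rejected)
}

fn update_share_gauges(submitted: &IntGaugeVec, rejected: &IntGaugeVec, snap: &ChannelSnapshot) {
    let channel = snap.channel_id.to_string();
    submitted
        .with_label_values(&[channel.as_str(), snap.user_identity.as_str()])
        .set(i64::from(snap.shares_submitted));
    // One time series per (channel, user, error_code) entry in the existing map.
    for (error_code, count) in &snap.shares_rejected {
        rejected
            .with_label_values(&[channel.as_str(), snap.user_identity.as_str(), error_code.as_str()])
            .set(i64::from(*count));
    }
}
```

Since the values come from a periodically refreshed snapshot, the sketch sets gauges to the absolute totals on each pass rather than incrementing counters, which mirrors how the PR describes these metrics.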

Client metrics

  1. sv2_client_shares_rejected_total{client_id, channel_id, user_identity} — gauge from upstream ShareAccounting::get_rejected_shares() (scalar u32, no per-reason breakdown available at this layer; sketched below)
  2. Added shares_rejected: u32 field to ExtendedChannelInfo / StandardChannelInfo in client monitoring types
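
The client-side update would follow the same set-from-scalar pattern; a minimal sketch, in which the function and parameter names are assumptions:

```rust
use prometheus::IntGaugeVec;

// `rejected` is the sv2_client_shares_rejected_total vec; the surrounding
// function and its parameters are illustrative only.
fn update_client_rejections(
    rejected: &IntGaugeVec,
    client_id: &str,
    channel_id: &str,
    user_identity: &str,
    rejected_shares: u32, // e.g. from ShareAccounting::get_rejected_shares()
) {
    rejected
        .with_label_values(&[client_id, channel_id, user_identity])
        .set(i64::from(rejected_shares));
}
```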

Stale label cleanup

  1. Tracks (channel_id, user_identity, error_code) triples in PreviousLabelSets and removes stale combinations on refresh, the same pattern already used for the existing share/hashrate labels (see the sketch below)
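
A minimal sketch of that cleanup pass, assuming PreviousLabelSets reduces to a set of previously seen label triples; names other than the prometheus-crate types and remove_label_values are assumptions:

```rust
use std::collections::HashSet;

use prometheus::IntGaugeVec;

// (channel_id, user_identity, error_code) as stringified label values.
type RejectionLabels = (String, String, String);

fn sweep_stale_rejection_labels(
    rejected: &IntGaugeVec,
    previous: &mut HashSet<RejectionLabels>,
    current: HashSet<RejectionLabels>,
) {
    // Any triple seen on the last refresh but absent now is a stale series.
    for (channel_id, user_identity, error_code) in previous.difference(&current) {
        // remove_label_values errs if the series is already gone; ignore or log.
        let _ = rejected.remove_label_values(&[
            channel_id.as_str(),
            user_identity.as_str(),
            error_code.as_str(),
        ]);
    }
    *previous = current;
}
```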

Cardinality

The error_code label is nominally unbounded (the SV2 spec allows arbitrary strings), but in practice the server ShareValidationError enum defines ~9 well-known variants. Upstream stratum-mining/stratum#2142 is working toward typed error_code constants which will further bound this.

Client-side limitation

The server-side ShareAccounting in stratum-core does not currently track rejected shares (stratum-mining/stratum#2119). Client monitoring shares_rejected defaults to 0 until that is addressed upstream. The Prometheus gauge and API field are in place for when the data becomes available.

Example PromQL

# Rejection rate
rate(sv2_server_shares_rejected_total[5m]) / rate(sv2_server_shares_submitted_total[5m])

# Breakdown by reason
sum by (error_code) (rate(sv2_server_shares_rejected_total[5m]))

# Stale share spike
rate(sv2_server_shares_rejected_total{error_code="stale"}[1m])

Testing

All existing tests pass (80 in stratum-apps, 6 in pool, miner-apps all green). Existing test helpers updated for new fields.

Related

Eric Price added 2 commits May 2, 2026 20:54
…hotCache::refresh (stratum-mining#337)

Move all Prometheus gauge updates (set + stale-label removal) out of the
/metrics HTTP handler and into SnapshotCache::refresh(), which runs as a
periodic background task. This eliminates the GaugeVec reset gap where
label series momentarily disappeared on every scrape.

Changes:
- SnapshotCache now owns PrometheusMetrics and PreviousLabelSets
- refresh() updates snapshot data AND Prometheus gauges atomically
- /metrics handler reduced to: set uptime gauge, gather, encode
- ServerState simplified (no more PreviousLabelSets or Mutex)
- Tests updated to wire metrics through cache via with_metrics()
- Integration tests: replace fixed-sleep assertions with
  poll_until_metric_gte (100ms poll, 5s deadline) for CI resilience
- Clone impl preserves previous_labels for correct stale-label detection
- debug-level tracing on stale label removal errors
- debug_assert on with_metrics double-attachment

Closes stratum-mining#337
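
Not this commit's actual code, but a minimal sketch of the shape it describes, assuming a tokio runtime: refresh() runs as a periodic background task and owns all gauge mutation, while the /metrics handler only gathers and encodes (the uptime-gauge set is omitted here). Everything except the SnapshotCache and refresh() names is an assumption:

```rust
use std::sync::Arc;
use std::time::Duration;

use prometheus::{Encoder, Registry, TextEncoder};

struct SnapshotCache {
    registry: Registry,
    // ...the PrometheusMetrics gauges and PreviousLabelSets would live here...
}

impl SnapshotCache {
    // Placeholder for the real refresh: pull fresh channel data, set every
    // gauge, and sweep stale label sets. All gauge mutation happens here.
    async fn refresh(&self) {}

    // Periodic background task, instead of doing this work on every scrape.
    async fn run(self: Arc<Self>) {
        let mut tick = tokio::time::interval(Duration::from_secs(5));
        loop {
            tick.tick().await;
            self.refresh().await;
        }
    }
}

// The /metrics handler body shrinks to roughly this: gather and encode.
fn render_metrics(cache: &SnapshotCache) -> Vec<u8> {
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&cache.registry.gather(), &mut buf)
        .expect("text-format encoding");
    buf
}
```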
Add Prometheus gauges for share submission and rejection data that was
previously only available through the JSON monitoring API:

Server metrics:
- sv2_server_shares_submitted_total{channel_id, user_identity}
- sv2_server_shares_rejected_total{channel_id, user_identity, error_code}

Client metrics:
- sv2_client_shares_rejected_total{client_id, channel_id, user_identity}

These enable time-series alerting on rejection rates and per-reason
breakdown (stale, duplicate-share, etc.) via rate() queries and
recording rules.

Implementation:
- Register new GaugeVecs in PrometheusMetrics
- Populate from existing shares_rejected HashMap (server) and
  rejected_shares u32 (client) in SnapshotCache::update_metrics
- Track server rejection label triples in PreviousLabelSets for
  stale series cleanup
- Add shares_rejected field to client ExtendedChannelInfo and
  StandardChannelInfo (defaults to 0 until stratum#2119 adds
  rejection tracking to server-side ShareAccounting)
@gimballock force-pushed the feat/share-rejection-prometheus-metrics branch from d847f42 to aa30316 on May 3, 2026 00:58