add OpenTelemetry metrics instrumentation#3110
add OpenTelemetry metrics instrumentation#3110PavelPashov wants to merge 49 commits intoredis:masterfrom
Conversation
a11117e to
5fb1791
Compare
1be4565 to
03db1a2
Compare
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- license-compliance-checker
- secret-detection-trufflehog
- software-component-analysis-js
- static-code-analysis-semgrep-pro
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
1106f8a to
242e98d
Compare
|
This is an exciting feature, I hope it makes it to main! |
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
364ff86 to
30b0e8c
Compare
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
1 similar comment
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
0e91b34 to
e2cc888
Compare
af9f0fb to
d0f42d3
Compare
packages/client/lib/client/cache.ts
Outdated
| ); | ||
| // Estimate bytes saved by avoiding network round-trip | ||
| // Note: JSON.stringify approximation; actual RESP wire size may differ (especially for Buffers) | ||
| const bytesEstimate = JSON.stringify(cacheEntry.value).length; |
There was a problem hiding this comment.
What do you think about wrapping it with Buffer.byteLength(JSON.stringify(cacheEntry.value), 'utf8');? If data to be stored isn't just ASCII it makes sense to measure UTF-8 bytes count
There was a problem hiding this comment.
That's a good catch!
| } from "./types"; | ||
| import { noopFunction } from "./utils"; | ||
|
|
||
| export class NoopCommandMetrics implements IOTelCommandMetrics { |
There was a problem hiding this comment.
Why do you need this object? Just to ensure that no commands will be actually send to a collector?
There was a problem hiding this comment.
Because we have multiple metric groups and don’t know what the user will opt into, I kept each group in its own class and defaulted them to noop. When the user initializes metrics, we swap in the real implementations based on config, so all checks happen once during OpenTelemetry.init(...). This keeps runtime code simple (no repeated conditional checks) and ensures disabled metrics have near-zero overhead
There was a problem hiding this comment.
So this is basically Null object pattern where in case if Otel is disabled empty methods are called?
| TRANSFORM_LEGACY_REPLY?: boolean; | ||
| transformReply: TransformReply | Record<RespVersions, TransformReply>; | ||
| unstableResp3?: boolean; | ||
| onSuccess?: (args: ReadonlyArray<RedisArgument>, reply: unknown, clientId: string) => void; |
There was a problem hiding this comment.
@nkaradzhov we might want to rename onSuccess to something more metrics/otel related
| if (command.onSuccess) { | ||
| command.onSuccess(parser.redisArgs, finalReply, this._self._clientId); |
There was a problem hiding this comment.
@nkaradzhov there might be a better way to record specific command related metrics that require the server response
| // Build the appropriate function based on options | ||
| if (options.hasIncludeCommands || options.hasExcludeCommands) { | ||
| // Version with filtering | ||
| this.createRecordOperationDuration = this.#createWithFiltering.bind(this); | ||
| } else { | ||
| this.createRecordOperationDuration = | ||
| this.#createWithoutFiltering.bind(this); | ||
| } | ||
| } |
There was a problem hiding this comment.
@nkaradzhov to make sure we don't do unnecessary checks after initiating the metrics singleton if the user hasn't provided any included or excluded commands
… in command and CSC metrics
| this.clear(false); | ||
| // Record invalidations as server-initiated evictions | ||
| if (oldSize > 0) { | ||
| OTelMetrics.instance.clientSideCacheMetrics.recordCacheEviction(CSC_EVICTION_REASON.INVALIDATION, oldSize); |
There was a problem hiding this comment.
Cache eviction metrics missing clientId for attribute resolution
Low Severity
All recordCacheEviction calls omit clientId, so resolveClientAttributes returns undefined and the resulting eviction metrics lack server.address, server.port, and db.client.connection.pool.name attributes. In contrast, recordCacheRequest and recordNetworkBytesSaved in the same class do pass client._clientId, making CSC eviction metrics inconsistent with other CSC metrics and impossible to correlate by server.
Additional Locations (2)
bf97457 to
adbe066
Compare
| // because CSC stores transformed replies and does not retain raw byte size. | ||
| // Implement this at a lower protocol/parsing layer where response bytes are known. | ||
| } | ||
| } |
There was a problem hiding this comment.
Network bytes saved metric is silently a no-op
Medium Severity
OTelClientSideCacheMetrics.recordNetworkBytesSaved has an empty body with a TODO comment, yet cache.ts calls it on every cache hit expecting the redis.client.csc.network_saved counter to be incremented. The metric instrument is registered and documented but will always report zero, silently providing incorrect observability data to users who enable client-side caching metrics.
Additional Locations (1)
adbe066 to
bc14732
Compare


Description
TODO
Checklist
npm testpass with this change (including linting)?Note
Medium Risk
Touches core client lifecycle (queue/socket construction, command execution paths, pool/cluster wiring) to emit metrics and maintain per-client identity/registry; while mostly no-op unless
OpenTelemetry.initis called, mistakes could still impact performance or shutdown/cleanup behavior.Overview
Adds first-class OpenTelemetry metrics support via a new
OpenTelemetry.init({ metrics: ... })entrypoint (exported from@redis/client) plus a singletonOTelMetricsimplementation andClientRegistryused by observable gauges.Instruments the client stack to emit metrics for command durations (including MULTI/PIPELINE), connection lifecycle/closure, pool wait time and pending requests, enterprise maintenance notifications/handoff, client-side cache hits/misses/evictions, pub/sub in+out message counts, and stream lag (via command
onSuccesshooks). Includes new client identity IDs (standalone/cluster/pool roles) with registration/unregistration on close/destroy/quit, plus updated docs (docs/otel-metrics.md), anotel-metrics.jsexample, and extensive new test coverage.Written by Cursor Bugbot for commit bc14732. This will update automatically on new commits. Configure here.