Fix OpenTelemetry gauge values for server/service up metrics#4787
Fix OpenTelemetry gauge values for server/service up metrics#4787SasinduDilshara wants to merge 1 commit intowso2:masterfrom
Conversation
| } | ||
|
|
||
| @Override | ||
| public void serviceUp(String serviceName, String serviceType) { |
There was a problem hiding this comment.
Log Improvement Suggestion No: 2
| public void serviceUp(String serviceName, String serviceType) { | |
| @Override | |
| public void serviceUp(String serviceName, String serviceType) { | |
| log.info("Service up: " + serviceName + " of type: " + serviceType); |
There was a problem hiding this comment.
AI Agent Log Improvement Checklist
- The log-related comments and suggestions in this review were generated by an AI tool to assist with identifying potential improvements. Purpose of reviewing the code for log improvements is to improve the troubleshooting capabilities of our products.
- Please make sure to manually review and validate all suggestions before applying any changes. Not every code suggestion would make sense or add value to our purpose. Therefore, you have the freedom to decide which of the suggestions are helpful.
✅ Before merging this pull request:
- Review all AI-generated comments for accuracy and relevance.
- Complete and verify the table below. We need your feedback to measure the accuracy of these suggestions and the value they add. If you are rejecting a certain code suggestion, please mention the reason briefly in the suggestion for us to capture it.
| Comment | Accepted (Y/N) | Reason |
|---|---|---|
| #### Log Improvement Suggestion No: 2 |
WalkthroughThe changes update OpenTelemetry gauge metric values from timestamp-based to constant numeric values (1 for "up" states, 0 for "down"), add TestNG and Mockito test dependencies, and introduce a comprehensive unit test class verifying gauge value assignments and initialization behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.java (1)
65-71: Consider atomicity of value+attributes pair updates.The
serverUpValueandserverUpAttrsare updated in separateset()calls (lines 278-279), which creates a small window where the observable callback could read a staleattrswith a newvalue(or vice versa). The same applies toserverVersionValue/serverVersionAttrs.In practice, this is unlikely to cause issues since:
- Server startup/shutdown happens infrequently
- The worst case is a single export cycle with mismatched data
However, for correctness, consider combining value and attributes into a single holder class stored in one
AtomicReference:💡 Optional: Atomic pair holder
private static class GaugeState { final Double value; final Attributes attrs; GaugeState(Double value, Attributes attrs) { this.value = value; this.attrs = attrs; } } private final AtomicReference<GaugeState> serverUpState = new AtomicReference<>(null);Then update atomically:
serverUpState.set(new GaugeState(epochSeconds, attributes));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.java` around lines 65 - 71, The serverUpValue/serverUpAttrs and serverVersionValue/serverVersionAttrs pairs must be updated atomically to avoid transient mismatches; introduce a small immutable holder class (e.g., GaugeState with final Double value and final Attributes attrs) and replace each pair with a single AtomicReference<GaugeState> (e.g., serverUpState and serverVersionState) and update them with a single set(new GaugeState(value, attrs)); update the observable callback code to read the pair via one get() call on the AtomicReference and remove the old separate AtomicReference fields to ensure atomic updates for both serverUp and serverVersion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.java`:
- Around line 65-71: The serverUpValue/serverUpAttrs and
serverVersionValue/serverVersionAttrs pairs must be updated atomically to avoid
transient mismatches; introduce a small immutable holder class (e.g., GaugeState
with final Double value and final Attributes attrs) and replace each pair with a
single AtomicReference<GaugeState> (e.g., serverUpState and serverVersionState)
and update them with a single set(new GaugeState(value, attrs)); update the
observable callback code to read the pair via one get() call on the
AtomicReference and remove the old separate AtomicReference fields to ensure
atomic updates for both serverUp and serverVersion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d80f9897-ddbd-48c6-a877-273e800a23f6
📒 Files selected for processing (3)
components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/MetricReporter.javacomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.javacomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/prometheus/reporter/PrometheusReporterV1.java
Previously, serverUp(), serverVersion(), and serviceUp() set gauge values to the Unix epoch timestamp (milliseconds/1000) instead of a binary up/down indicator. This caused Grafana dashboards to show incorrect uptime values (a large epoch number instead of 1) and prevented proper visualization of service availability. Fixed by setting gauge value to 1 (indicating up state) for all three metrics, consistent with the Prometheus convention for up/down gauges. Also adds unit tests for OpenTelemetryReporter to verify gauge and counter initialization, and adds testng/mockito test dependencies. Fixes wso2#4710 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
18d94f8 to
2c79c4d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/test/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporterTest.java`:
- Around line 159-166: The test testServerDownSetsGaugeToZero calls
reporter.serverDown with javaVersion and javaHome swapped; update the argument
order in reporter.serverDown(...) within testServerDownSetsGaugeToZero so it
passes javaHome then javaVersion (e.g., "/usr/lib/jvm/java-21" before "21") to
match the corrected serverDown contract, then run the assertion that
mockServerUpGauge.set(...) captures 0.0 as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b70738bf-bf17-43ae-b0f3-8d1270bb982c
📒 Files selected for processing (3)
components/mediation/data-publishers/org.wso2.micro.integrator.observability/pom.xmlcomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.javacomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/src/test/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporterTest.java
✅ Files skipped from review due to trivial changes (1)
- components/mediation/data-publishers/org.wso2.micro.integrator.observability/pom.xml
🚧 Files skipped from review as they are similar to previous changes (1)
- components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporter.java
| public void testServerDownSetsGaugeToZero() { | ||
| reporter.serverDown("localhost", "8290", "21", "/usr/lib/jvm/java-21"); | ||
|
|
||
| ArgumentCaptor<Double> valueCaptor = ArgumentCaptor.forClass(Double.class); | ||
| verify(mockServerUpGauge).set(valueCaptor.capture(), any(Attributes.class)); | ||
|
|
||
| assertEquals(valueCaptor.getValue(), 0.0, "serverDown() must set SERVER_UP gauge to 0.0"); | ||
| } |
There was a problem hiding this comment.
Fix serverDown argument order in the test to match the corrected contract.
Line 160 currently passes javaVersion before javaHome ("21", "/usr/lib/jvm/java-21"). That reintroduces the swapped-order scenario and can miss the exact regression this PR fixes.
Suggested patch
- reporter.serverDown("localhost", "8290", "21", "/usr/lib/jvm/java-21");
+ reporter.serverDown("localhost", "8290", "/usr/lib/jvm/java-21", "21");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public void testServerDownSetsGaugeToZero() { | |
| reporter.serverDown("localhost", "8290", "21", "/usr/lib/jvm/java-21"); | |
| ArgumentCaptor<Double> valueCaptor = ArgumentCaptor.forClass(Double.class); | |
| verify(mockServerUpGauge).set(valueCaptor.capture(), any(Attributes.class)); | |
| assertEquals(valueCaptor.getValue(), 0.0, "serverDown() must set SERVER_UP gauge to 0.0"); | |
| } | |
| public void testServerDownSetsGaugeToZero() { | |
| reporter.serverDown("localhost", "8290", "/usr/lib/jvm/java-21", "21"); | |
| ArgumentCaptor<Double> valueCaptor = ArgumentCaptor.forClass(Double.class); | |
| verify(mockServerUpGauge).set(valueCaptor.capture(), any(Attributes.class)); | |
| assertEquals(valueCaptor.getValue(), 0.0, "serverDown() must set SERVER_UP gauge to 0.0"); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/test/java/org/wso2/micro/integrator/observability/metric/handler/opentelemetry/reporter/OpenTelemetryReporterTest.java`
around lines 159 - 166, The test testServerDownSetsGaugeToZero calls
reporter.serverDown with javaVersion and javaHome swapped; update the argument
order in reporter.serverDown(...) within testServerDownSetsGaugeToZero so it
passes javaHome then javaVersion (e.g., "/usr/lib/jvm/java-21" before "21") to
match the corrected serverDown contract, then run the assertion that
mockServerUpGauge.set(...) captures 0.0 as before.
Claude Issue Analysis — [Issue #4710]: Need to update the grafana dashboards to be compatible with data published with open telemetryClassification
Reproducibility
Root Cause AnalysisThe issue encompasses multiple interconnected sub-problems in the OTel metrics pipeline: Sub-issue 1: Request Counts Delayed (~10 min for small counts)Root cause: The OTel SDK uses a
For sparse traffic (few requests in a 60s window), Grafana may require multiple export cycles before enough data points accumulate to display. Large request counts generate enough data per window to appear immediately. Sub-issue 2: Incorrect Values After HotdeploymentRoot cause: When services are hot-deployed, OTel counters maintain their cumulative values (OTel uses CUMULATIVE temporality by default). However, when the time period in Grafana is increased, the dashboard query using The
Sub-issue 3: Request Rates and Error Rates Not DisplayedRoot cause: The Grafana dashboards were originally designed for the Prometheus direct pull model (PrometheusReporter). With OTel OTLP push to collector then Prometheus scrape, metric names may be transformed by the OTel-Prometheus bridge (e.g., counter names ending in Sub-issue 4: Integration Node Metrics Dashboard Mostly EmptyRoot cause: The OTel SDK exports only the metrics registered in Sub-issue 5: Uptime Shows Incorrect ValuesRoot cause: The In double epochSeconds = System.currentTimeMillis() / 1000.0;
((DoubleGauge) gauge).set(epochSeconds, attributes); // Sets ~1.743e9, not 1
This matches what Sub-issue 6: Logs Not Displayed in Grafana DashboardsRoot cause: OTel logs require a separate Loki/log aggregation pipeline. The current MI OTel setup only configures Test Coverage Assessment
|
Claude Fix Verification ReportIssue: #4710 Reproduction Steps Executed
ResultThe bug (Sub-issue 5: uptime/service-status gauges publishing Unix epoch timestamp ~1.743e9 instead of The patched
All 12 unit tests pass, including direct assertions that:
EvidenceServer startup with patch appliedHTTP responses (10 requests to test API)OTel metric export attempt (collector not running, expected)(This confirms the OTel SDK is active and attempting exports — connection failure is expected with no OTel collector running.) Unit test results (12/12 pass)Key tests:
|
Summary
serverUp(),serverVersion(), andserviceUp()methods inOpenTelemetryReporterto set gauge value to1(up) instead of the Unix epoch timestampOpenTelemetryReportercovering metric initialization and gauge/counter valuestestngandmockito-coretest-scoped dependencies to the observability module pomFixes #4710
Issue Analysis
Issue #4710 Analysis
Classification
org.wso2.micro.integrator.observability—OpenTelemetryReporterRoot Cause
Sub-issue 5 (Uptime Shows Incorrect Values):
serverUp(),serverVersion(), andserviceUp()were setting gauge values toSystem.currentTimeMillis() / 1000.0(Unix epoch seconds, ~1.743×10⁹) instead of1(binary up indicator). This caused Grafana dashboards to display a large timestamp number rather than the expected "1 = up" state.Files changed:
components/mediation/data-publishers/org.wso2.micro.integrator.observability/src/main/java/.../OpenTelemetryReporter.javacomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/pom.xmlcomponents/mediation/data-publishers/org.wso2.micro.integrator.observability/src/test/.../OpenTelemetryReporterTest.java(new)Other Sub-Issues Noted (Not in scope of this PR)
increase()/rate()queriesTest Plan
OpenTelemetryReporterverifying gauge and counter initialization1forserverUp,serverVersion, andserviceUp