feat(grafana): add deployment-annotations tool for change correlation (#2689) #2702
Greptile SummaryThis PR adds a
Confidence Score: 5/5
Safe to merge; the new tool is read-only, reuses existing auth, and is well-tested.
The change is additive and read-only: no mutations, no schema migrations, no new credentials. The implementation faithfully mirrors established Grafana tool patterns, the time-window and credential-forwarding logic is tested exhaustively, and all four backend implementors are kept in sync. The only notable gap is a minor response-shape inconsistency between the fixture and HTTP paths.
app/tools/GrafanaAnnotationsTool/__init__.py: the backend and HTTP paths return slightly different response shapes; worth aligning before the response contract solidifies.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Agent
participant Tool as query_grafana_annotations
participant Client as GrafanaClientBase
participant Grafana as Grafana /api/annotations
Agent->>Tool: call(tags, time_range_minutes, from, to)
alt grafana_backend injected (test/fixture path)
Tool->>Tool: grafana_backend.query_annotations(tags, limit)
Tool-->>Agent: "{available, annotations, total, raw}"
else HTTP client path (production)
Tool->>Tool: resolve time window (ISO to epoch ms)
Tool->>Client: query_annotations(from_ts, to_ts, tags, limit)
Client->>Grafana: "GET /api/annotations?from&to&type=annotation&tags"
Grafana-->>Client: JSON array of raw annotation objects
Client->>Client: _map_annotation() to ISO timestamps
Client-->>Tool: list[dict]
Tool-->>Agent: "{available, annotations, total, tags_filter, from, to}"
end
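As a rough illustration of the HTTP path in the diagram above, the window resolution and request URL might be sketched as follows. The helper names (`iso_to_epoch_ms`, `resolve_window`, `build_annotations_url`) and the example host are hypothetical and are not taken from the PR's code; the sketch also assumes `limit` is sent as a query parameter and that multiple tags are passed as repeated `tags` parameters. It only builds the URL and makes no network call.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def iso_to_epoch_ms(value: str) -> int:
    """Parse an ISO-8601 string; timezone-naive input is treated as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return round(dt.timestamp() * 1000)


def resolve_window(time_range_minutes=60, from_iso=None, to_iso=None):
    """Return (from_ms, to_ms); the window ends at `to`, or now if unset."""
    if to_iso:
        to_ms = iso_to_epoch_ms(to_iso)
    else:
        to_ms = round(datetime.now(timezone.utc).timestamp() * 1000)
    if from_iso:
        from_ms = iso_to_epoch_ms(from_iso)
    else:
        from_ms = to_ms - time_range_minutes * 60_000
    return from_ms, to_ms


def build_annotations_url(base_url, from_ms, to_ms, tags=None, limit=100):
    """Build the GET /api/annotations URL (hypothetical helper, no I/O)."""
    params = [("from", from_ms), ("to", to_ms),
              ("type", "annotation"), ("limit", limit)]
    # Assumption: each tag becomes its own `tags` query parameter.
    params += [("tags", tag) for tag in (tags or [])]
    return f"{base_url.rstrip('/')}/api/annotations?{urlencode(params)}"


from_ms, to_ms = resolve_window(60, to_iso="2024-05-01T12:00:00Z")
url = build_annotations_url("https://grafana.example.com/", from_ms, to_ms,
                            tags=["deploy", "prod"])
```

Keeping the window in epoch milliseconds until the last moment matches the wire format shown in the diagram, so only the tool boundary has to deal with ISO strings.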
Reviews (2): Last reviewed commit: "feat(grafana): add deployment-annotation..."
…Tracer-Cloud#2689)

Add a read-only GrafanaAnnotationsTool plus a query_annotations() method on GrafanaClientBase so the agent can answer "did a deploy/config change precede this alert?" for deploys from any source (ArgoCD/Flux, Helm, Terraform, manual), not just GitHub pushes. Complements GitDeployTimelineTool and reuses the existing Grafana auth.

- service: query_annotations() mirrors query_alert_rules() (direct requests.get -> list[dict]); _map_annotation/_epoch_ms_to_iso map /api/annotations to ISO-8601 UTC
- tool: mirrors GrafanaAlertRulesTool, reuses GrafanaLogsTool helpers, supports the grafana_backend fixture path, forwards basic-auth credentials
- backends: add query_annotations() to the GrafanaBackend Protocol and all implementers
- docs: docs/grafana_annotations.mdx registered in docs.json
- tests: tests/tools/test_grafana_annotations_tool.py (schema, availability, extraction, backend path, UTC parsing, basic-auth, time-window override)
397246f to 40555c3
@greptile review
Fixes #2689
Describe the changes you have made in this PR -
OpenSRE could correlate deploys to incidents only for GitHub (`GitDeployTimelineTool`, built on the GitHub REST `since`/`until` window). Any deploy that does not originate from a GitHub push (ArgoCD/Flux syncs, `helm upgrade`, Jenkins/CircleCI jobs, Terraform applies, manual hotfixes) was invisible to the agent, a concrete RCA blind spot.

This adds a read-only Grafana annotations tool so the agent can answer "did a deploy/config change precede this alert?" for deploys from any source. Grafana annotations are the standard, source-agnostic "what changed and when" marker. No new config or dependencies: it reuses the existing Grafana integration auth.
- `query_annotations(from_ts, to_ts, tags=None, limit=100)` on `GrafanaClientBase`, mirroring `query_alert_rules()`: a direct `requests.get` returning `list[dict]` (not `_make_request()`, since `/api/annotations` returns a JSON array). Module-level `_map_annotation` / `_epoch_ms_to_iso` map each item to `time` / `time_end` / `text` / `tags` / `dashboard_uid` with ISO-8601 UTC timestamps.
- `app/tools/GrafanaAnnotationsTool/` (`query_grafana_annotations`), mirroring `GrafanaAlertRulesTool`: reuses the shared `GrafanaLogsTool` helpers, supports the `grafana_backend` fixture path, forwards basic-auth credentials, and accepts a `time_range_minutes` window (default 60, now-anchored) with optional ISO `from` / `to` override. `surfaces` defaults to investigation, consistent with the other Grafana tools.
- `query_annotations` added to the `GrafanaBackend` Protocol and all implementers (`FixtureGrafanaBackend`, `SelectiveGrafanaBackend`, `OpenSRECsvGrafanaBackend`) so the synthetic path is first-class.
- `docs/grafana_annotations.mdx`, registered in `docs/docs.json`.
- `tests/tools/test_grafana_annotations_tool.py` covers schema/availability/extraction, the backend/fixture path, ISO/UTC parsing, basic-auth credential forwarding, and the time-window override. New tool classified in `tests/tools/test_telemetry.py`.

Demo/Screenshot for feature changes and bug fixes -
`opensre investigate` run against a DB-auth-failure pipeline alert, with a `deploy events_fact v1.4.2 (db credentials rotated)` annotation present in Grafana. The agent autonomously calls `query_grafana_annotations`, finds the deploy, and reports it as the root cause (Cited Evidence lists "query grafana annotations").

Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
The problem is a change-correlation blind spot: most incidents follow a change, but OpenSRE could only see GitHub-origin deploys. Grafana annotations are the natural, source-agnostic signal teams already emit for deploys/config changes, so reading them lets the agent ask "what changed right before this?" for any deploy mechanism.
I deliberately mirrored the existing Grafana stack rather than inventing new patterns. The service method copies the `query_alert_rules()` shape (direct `requests.get` → `list[dict]`) instead of `_make_request()`, because `/api/annotations` returns a JSON array, not an object; using the dict helper would have broken on first call. The tool mirrors `GrafanaAlertRulesTool` and reuses the shared credential/availability helpers from `GrafanaLogsTool`, so it behaves identically to the other Grafana tools (including basic-auth and the synthetic `grafana_backend` fixture path).

Key pieces:
- `query_annotations()` performs the windowed, optionally tag-filtered fetch;
- `_map_annotation` / `_epoch_ms_to_iso` normalize Grafana's epoch-ms wire format to readable ISO-8601 UTC;
- `query_grafana_annotations` resolves the client (or uses an injected backend), defaults the window to `time_range_minutes` ending at `to` (now if unset), and returns `{source, available, annotations, total}`.

Edge cases covered by tests include timezone-naive timestamps (coerced to UTC), a `to`-only window, malformed timestamps, and basic-auth credential forwarding.

Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.