feat(grafana): add deployment-annotations tool for change correlation (#2689) #2702

Open
kespineira wants to merge 1 commit into Tracer-Cloud:main from kespineira:issue/2689-grafana-deployment-annotations-tool

@kespineira kespineira commented Jun 2, 2026

Contributor

Fixes #2689

Describe the changes you have made in this PR -

OpenSRE could correlate deploys to incidents only for GitHub (GitDeployTimelineTool, built on the GitHub REST API's since/until window). Any deploy that does not originate from a GitHub push (ArgoCD/Flux syncs, helm upgrade, Jenkins/CircleCI jobs, Terraform applies, manual hotfixes) was invisible to the agent, which is a concrete RCA blind spot.

This adds a read-only Grafana annotations tool so the agent can answer "did a deploy/config change precede this alert?" for deploys from any source. Grafana annotations are the standard, source-agnostic "what changed and when" marker. No new config or dependencies — it reuses the existing Grafana integration auth.

  • Service layer: query_annotations(from_ts, to_ts, tags=None, limit=100) on GrafanaClientBase, mirroring query_alert_rules(): a direct requests.get returning list[dict] (not _make_request(), since /api/annotations returns a JSON array). Module-level _map_annotation / _epoch_ms_to_iso map each item to time / time_end / text / tags / dashboard_uid with ISO-8601 UTC timestamps.
  • Tool layer: app/tools/GrafanaAnnotationsTool/ (query_grafana_annotations), mirroring GrafanaAlertRulesTool: reuses the shared GrafanaLogsTool helpers, supports the grafana_backend fixture path, forwards basic-auth credentials, and accepts a time_range_minutes window (default 60, now-anchored) with optional ISO from/to override. surfaces defaults to investigation, consistent with the other Grafana tools.
  • Fixture backends: query_annotations added to the GrafanaBackend Protocol and all implementers (FixtureGrafanaBackend, SelectiveGrafanaBackend, OpenSRECsvGrafanaBackend) so the synthetic path is first-class.
  • Docs: docs/grafana_annotations.mdx, registered in docs/docs.json.
  • Tests: tests/tools/test_grafana_annotations_tool.py covers schema/availability/extraction, the backend/fixture path, ISO/UTC parsing, basic-auth credential forwarding, and the time-window override. New tool classified in tests/tools/test_telemetry.py.
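
For readers who have not opened the diff, the mapping step in the service-layer bullet can be sketched roughly as follows. This is an illustration, not the code in this PR: the function names drop the leading underscore, the raw field names (time, timeEnd, text, tags, dashboardUID) are assumed from Grafana's annotation JSON, and returning None for unusable timestamps is an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def epoch_ms_to_iso(value: Any) -> Optional[str]:
    """Convert epoch milliseconds to an ISO-8601 UTC string (None if unusable)."""
    if value is None or isinstance(value, bool):
        return None
    try:
        seconds = float(value) / 1000.0
        return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
    except (TypeError, ValueError, OverflowError, OSError):
        # Assumption: malformed values are dropped rather than raised.
        return None


def map_annotation(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw annotation object to the normalized shape described above."""
    return {
        "time": epoch_ms_to_iso(raw.get("time")),
        "time_end": epoch_ms_to_iso(raw.get("timeEnd")),
        "text": raw.get("text", ""),
        "tags": list(raw.get("tags") or []),
        "dashboard_uid": raw.get("dashboardUID"),  # assumed wire field name
    }


if __name__ == "__main__":
    sample = {"time": 1_700_000_000_000, "text": "deploy events_fact v1.4.2", "tags": ["deploy"]}
    print(map_annotation(sample))
```

Keeping the mapping in module-level functions, as the bullet describes, lets tests exercise it without constructing a client.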

Demo/Screenshot for feature changes and bug fixes -

[Image: grafana_annotations_demo]

opensre investigate run against a DB-auth-failure pipeline alert, with a "deploy events_fact v1.4.2 (db credentials rotated)" annotation present in Grafana. The agent autonomously calls query_grafana_annotations, finds the deploy, and reports it as the root cause (Cited Evidence lists "query grafana annotations"):

The events_fact pipeline v1.4.2 was deployed with a database credential rotation that was not propagated to the pipeline's runtime configuration, so the load step authenticated with the now-revoked credential — extraction succeeded (128 rows) but the write failed with an authentication error.


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

The problem is a change-correlation blind spot: most incidents follow a change, but OpenSRE could only see GitHub-origin deploys. Grafana annotations are the natural, source-agnostic signal teams already emit for deploys/config changes, so reading them lets the agent ask "what changed right before this?" for any deploy mechanism.

I deliberately mirrored the existing Grafana stack rather than inventing new patterns. The service method copies the query_alert_rules() shape (direct requests.get -> list[dict]) instead of _make_request(), because /api/annotations returns a JSON array, not an object; using the dict helper would have broken on the first call. The tool mirrors GrafanaAlertRulesTool and reuses the shared credential/availability helpers from GrafanaLogsTool, so it behaves identically to the other Grafana tools (including basic-auth and the synthetic grafana_backend fixture path).
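
To make the array-versus-object point concrete, here is a small stdlib-only sketch of the two pieces around the HTTP call: building the query string and accepting only a list payload. The helper names are hypothetical and the parameter set (from, to, type, limit, repeated tags) is taken from the sequence diagram and Grafana's annotations API as I understand it; the real method issues the request with requests.get.

```python
from typing import Any, Dict, List, Optional, Sequence
from urllib.parse import urlencode


def build_annotations_query(
    from_ts: int,
    to_ts: int,
    tags: Optional[Sequence[str]] = None,
    limit: int = 100,
) -> str:
    """Build the /api/annotations path and query string for a time window."""
    params = [
        ("from", from_ts),
        ("to", to_ts),
        ("type", "annotation"),
        ("limit", limit),
    ]
    # Each tag is sent as its own repeated `tags` parameter.
    for tag in tags or []:
        params.append(("tags", tag))
    return "/api/annotations?" + urlencode(params)


def coerce_annotation_list(payload: Any) -> List[Dict[str, Any]]:
    """Accept only a JSON array of objects; anything else yields an empty list."""
    if not isinstance(payload, list):
        # A dict-oriented helper would land here on every successful call.
        return []
    return [item for item in payload if isinstance(item, dict)]


if __name__ == "__main__":
    print(build_annotations_query(1000, 2000, tags=["deploy", "prod"], limit=50))
    print(coerce_annotation_list([{"id": 1, "text": "deploy"}]))
```

Returning an empty list for a non-array payload is one possible policy; surfacing an error instead would be equally defensible.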

Key pieces: query_annotations() performs the windowed, optionally tag-filtered fetch; _map_annotation/_epoch_ms_to_iso normalize Grafana's epoch-ms wire format to readable ISO-8601 UTC; query_grafana_annotations resolves the client (or uses an injected backend), defaults the window to time_range_minutes ending at to (now if unset), and returns {source, available, annotations, total}. Edge cases covered by tests include timezone-naive timestamps (coerced to UTC), a to-only window, malformed timestamps, and basic-auth credential forwarding.
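
The window resolution described above can be sketched as below. This is a minimal illustration under stated assumptions, not the tool's code: the function names and the injectable now parameter are hypothetical, and falling back to the default window when a timestamp is malformed is an assumption about the intended behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def parse_iso_utc(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string; naive values are treated as UTC."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None  # assumption: malformed input is ignored, not raised
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def resolve_window(
    time_range_minutes: int = 60,
    from_iso: Optional[str] = None,
    to_iso: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Tuple[int, int]:
    """Return (from_ts, to_ts) in epoch ms; the window ends at `to`, or now if unset."""
    end = parse_iso_utc(to_iso) or now or datetime.now(timezone.utc)
    start = parse_iso_utc(from_iso) or end - timedelta(minutes=time_range_minutes)
    return to_epoch_ms(start), to_epoch_ms(end)


if __name__ == "__main__":
    # to-only window: 30 minutes ending at the given (naive, hence UTC) timestamp.
    print(resolve_window(30, to_iso="2026-06-02T12:00:00"))
```

Anchoring the default window at to rather than at the wall clock is what makes a to-only call return the period leading up to the incident.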


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

github-actions Bot commented Jun 2, 2026

Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@kespineira kespineira marked this pull request as ready for review June 2, 2026 16:06
greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This PR adds a query_grafana_annotations tool that lets the OpenSRE agent correlate incidents with deploys or config changes from any source (ArgoCD, Helm, Terraform, etc.), closing the blind spot that previously existed for non-GitHub-originated deploys. It closely mirrors the existing Grafana tool stack: GrafanaClientBase.query_annotations performs a windowed /api/annotations fetch, and the tool layer reuses the shared credential/availability helpers from GrafanaLogsTool.

  • Service layer (app/services/grafana/base.py): Adds query_annotations, _map_annotation, and _epoch_ms_to_iso; uses a direct requests.get (correctly bypassing _make_request) because the endpoint returns a JSON array, not an object.
  • Tool layer (app/tools/GrafanaAnnotationsTool/__init__.py): Implements time-window resolution with ISO override support, UTC-safe timestamp parsing, and credential forwarding; all four existing backends and the GrafanaBackend Protocol are updated with a stub.
  • Tests (tests/tools/test_grafana_annotations_tool.py): Covers wire-format mapping, basic-auth forwarding, to-only window anchoring, tag filtering, and the naive-UTC edge case.

Confidence Score: 5/5

Safe to merge; the new tool is read-only, reuses existing auth, and is well-tested.

The change is additive and read-only — no mutations, no schema migrations, no new credentials. The implementation faithfully mirrors established Grafana tool patterns, the time-window and credential-forwarding logic is tested exhaustively, and all four backend implementors are kept in sync. The only notable gap is a minor response-shape inconsistency between the fixture and HTTP paths.

app/tools/GrafanaAnnotationsTool/__init__.py: the backend and HTTP paths return slightly different response shapes; worth aligning before the response contract solidifies.
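
One way to align the two shapes is to route both paths through a single response builder. The sketch below is only a suggestion: the helper name and the "grafana_annotations" source value are hypothetical, and the key set is the union of the keys named in this review and in the PR description.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_annotations_response(
    annotations: Sequence[Dict[str, Any]],
    *,
    tags: Optional[Sequence[str]] = None,
    from_iso: Optional[str] = None,
    to_iso: Optional[str] = None,
    source: str = "grafana_annotations",  # hypothetical value
) -> Dict[str, Any]:
    """Build one response shape for both the fixture-backend and HTTP paths."""
    items: List[Dict[str, Any]] = list(annotations)
    return {
        "source": source,
        "available": True,
        "annotations": items,
        "total": len(items),
        # Always present, so callers never need to branch on the path taken.
        "tags_filter": list(tags or []),
        "from": from_iso,
        "to": to_iso,
    }


if __name__ == "__main__":
    backend_result = build_annotations_response([], tags=["deploy"])
    http_result = build_annotations_response(
        [{"text": "deploy events_fact v1.4.2"}],
        tags=["deploy"],
        from_iso="2026-06-02T15:00:00+00:00",
        to_iso="2026-06-02T16:00:00+00:00",
    )
    print(sorted(backend_result) == sorted(http_result))
```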

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/tools/GrafanaAnnotationsTool/__init__.py | New tool correctly mirrors GrafanaAlertRulesTool patterns; ISO timestamp parsing, credential forwarding, and time-window anchoring all look correct. Minor response shape inconsistency between the backend and HTTP paths (backend omits from, to, tags_filter). |
| app/services/grafana/base.py | Adds query_annotations, _map_annotation, and _epoch_ms_to_iso to the Grafana base client; direct requests.get (bypassing _make_request) is the correct approach for an array-returning endpoint. Error handling and auth header reuse are consistent with existing methods. |
| tests/tools/test_grafana_annotations_tool.py | Comprehensive test coverage: schema contract, availability, backend wire-format mapping, ISO/UTC parsing, basic-auth forwarding, time-window anchoring, tags forwarding, and the to-only window edge case all tested. No gaps observed. |
| tests/synthetic/mock_grafana_backend/backend.py | Protocol and FixtureGrafanaBackend updated consistently; query_annotations stubs return [] with a clear comment explaining why (no deploy annotations in RDS scenarios). |
| app/integrations/opensre/csv_grafana_backend.py | Adds a no-op query_annotations stub to OpenSRECsvGrafanaBackend; correct since CSV telemetry carries no deploy/config-change annotations. |

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Tool as query_grafana_annotations
    participant Client as GrafanaClientBase
    participant Grafana as Grafana /api/annotations

    Agent->>Tool: call(tags, time_range_minutes, from, to)
    alt grafana_backend injected (test/fixture path)
        Tool->>Tool: grafana_backend.query_annotations(tags, limit)
        Tool-->>Agent: "{available, annotations, total, raw}"
    else HTTP client path (production)
        Tool->>Tool: resolve time window (ISO to epoch ms)
        Tool->>Client: query_annotations(from_ts, to_ts, tags, limit)
        Client->>Grafana: "GET /api/annotations?from&to&type=annotation&tags"
        Grafana-->>Client: JSON array of raw annotation objects
        Client->>Client: _map_annotation() to ISO timestamps
        Client-->>Tool: list[dict]
        Tool-->>Agent: "{available, annotations, total, tags_filter, from, to}"
    end

Reviews (2): Last reviewed commit: "feat(grafana): add deployment-annotation..." | Re-trigger Greptile

Comment thread app/tools/GrafanaAnnotationsTool/__init__.py
Comment thread app/tools/GrafanaAnnotationsTool/__init__.py
…Tracer-Cloud#2689)

Add a read-only GrafanaAnnotationsTool plus a query_annotations() method on
GrafanaClientBase so the agent can answer "did a deploy/config change precede this
alert?" for deploys from any source (ArgoCD/Flux, Helm, Terraform, manual), not just
GitHub pushes. Complements GitDeployTimelineTool and reuses the existing Grafana auth.

- service: query_annotations() mirrors query_alert_rules() (direct requests.get ->
  list[dict]); _map_annotation/_epoch_ms_to_iso map /api/annotations to ISO-8601 UTC
- tool: mirrors GrafanaAlertRulesTool, reuses GrafanaLogsTool helpers, supports the
  grafana_backend fixture path, forwards basic-auth credentials
- backends: add query_annotations() to the GrafanaBackend Protocol and all implementers
- docs: docs/grafana_annotations.mdx registered in docs.json
- tests: tests/tools/test_grafana_annotations_tool.py (schema, availability, extraction,
  backend path, UTC parsing, basic-auth, time-window override)
@kespineira kespineira force-pushed the issue/2689-grafana-deployment-annotations-tool branch from 397246f to 40555c3 Compare June 2, 2026 16:19
@kespineira

Contributor Author

@greptile review

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add Grafana deployment-annotations tool for change correlation

1 participant