feat: add optional OpenLit integration for LLM observability#49
feat: add optional OpenLit integration for LLM observability#49sami-marreed wants to merge 1 commit intomainfrom
Conversation
📝 WalkthroughWalkthroughThis PR adds optional OpenLit observability integration to the project. It includes a complete OpenTelemetry deployment stack (via docker-compose) with Grafana, Prometheus, and Tempo for tracing, adds openlit as a runtime dependency, and integrates conditional OpenLit initialization in backend services alongside configuration settings to enable or disable the feature. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/cuga/backend/server/main.py (1)
68-75: Consider moving OpenLit initialization into the lifespan context manager.OpenLit is initialized at module import time, before the FastAPI application starts. This differs from how Langfuse handlers are created (during runtime). Module-level initialization can cause issues if:
- Environment variables (e.g.,
OTEL_EXPORTER_OTLP_ENDPOINT) aren't fully configured yet- OpenLit's
init()has side effects that should occur during proper application startupMoving initialization into the
lifespancontext manager would ensure proper startup sequencing and consistency with other observability setup.💡 Suggested refactor to move initialization into lifespan
Keep the import at module level:
try: import openlit as _openlit except ImportError: _openlit = NoneThen move initialization into the
lifespanfunction (around line 248):`@asynccontextmanager` async def lifespan(app: FastAPI): """Asynchronous context manager for application startup and shutdown.""" logger.info("Application is starting up...") # Initialize OpenLit observability if enabled if _openlit is not None and settings.observability.openlit: _openlit.init() logger.info("OpenLit observability initialized") # ... rest of lifespan code🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/server/main.py` around lines 68 - 75, Move the module-level OpenLit startup into the FastAPI lifespan context: keep the try/except import but assign it to a private symbol (e.g., _openlit) instead of calling init() at import time; then inside the asynccontextmanager lifespan function (lifespan) check if _openlit is not None and settings.observability.openlit and call _openlit.init() and logger.info("OpenLit observability initialized") there so initialization runs during application startup rather than at module import.deployment/docker-compose/openlit/otel-collector-config.yaml (1)
7-9: Addmemory_limiterbeforebatchfor collector stability.
batchalone is fragile under spikes; addmemory_limiterto reduce OOM risk and dropped exports during bursts.Suggested processor chain update
processors: + memory_limiter: + check_interval: 1s + limit_mib: 256 + spike_limit_mib: 64 batch: service: pipelines: traces: receivers: [otlp] - processors: [batch] + processors: [memory_limiter, batch] exporters: [otlp/tempo] metrics: receivers: [otlp] - processors: [batch] + processors: [memory_limiter, batch] exporters: [prometheus]Also applies to: 22-23, 26-27
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deployment/docker-compose/openlit/otel-collector-config.yaml` around lines 7 - 9, The processors section currently only contains the batch processor which is fragile under memory spikes; insert a memory_limiter processor configured before the batch processor (i.e., processors: -> memory_limiter: ... then batch:) so the collector throttles and limits memory usage to reduce OOMs and dropped exports; update every processors block that currently lists only batch (the other occurrences analogous to the one shown) to include memory_limiter before batch and tune its settings appropriately.deployment/docker-compose/openlit/docker-compose.yml (1)
3-3: Pin all service images to explicit versions (or digests).Using floating tags (
latest/ implicit latest) makes the stack non-reproducible and can introduce breaking changes unexpectedly.Also applies to: 27-27, 38-38
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deployment/docker-compose/openlit/docker-compose.yml` at line 3, The docker-compose services currently reference floating image tags (e.g., "otel/opentelemetry-collector-contrib" and the other two image entries noted) which makes deployments non-reproducible; update each service's image field to pin an explicit version tag or immutable digest (for example replace "otel/opentelemetry-collector-contrib" with a specific tag or `@sha256` digest), ensure all three image entries are updated consistently, and verify the chosen tags/digests correspond to tested releases before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deployment/docker-compose/openlit/docker-compose.yml`:
- Around line 43-46: The compose file currently enables anonymous Grafana admin
access via GF_AUTH_ANONYMOUS_ENABLED and GF_AUTH_ANONYMOUS_ORG_ROLE and exposes
Grafana on all interfaces with ports "3000:3000"; change the anonymous role from
Admin to Viewer (or disable GF_AUTH_ANONYMOUS_ENABLED entirely) and restrict the
port binding to localhost (e.g., 127.0.0.1:3000:3000) so Grafana is not
accessible as an unauthenticated admin from the network.
In `@pyproject.toml`:
- Line 45: Update the pinned openlit dependency in pyproject.toml from
"openlit>=1.34.30" to at least "openlit>=1.37.1" to pull in the security fix for
CVE-2026-27941; alternatively, move the openlit requirement into
[project.optional-dependencies] (matching keys like "e2b" and "memory") so it is
not installed by default, and keep the existing guarded import checks in your
code (the import locations where openlit features are conditionally used) to
handle the package being absent. Ensure the version bump or migration is
reflected in any install/test docs and dependency lock files used by the repo.
In `@src/cuga/backend/cuga_graph/utils/controller.py`:
- Around line 36-43: Remove the duplicate OpenLit initialization: in this module
leave the import fallback (the _openlit import and _openlit = None handling) for
potential direct use but delete the conditional block that calls _openlit.init()
and logger.info; instead ensure openlit.init() is invoked once during
application startup in main.py where settings.observability.openlit is checked.
Specifically, remove the code that calls openlit.init() and the
logger.info("OpenLit observability initialized") here while keeping the _openlit
symbol, and consolidate the initialization logic (checking
settings.observability.openlit and calling _openlit.init()) in the main startup
code that runs before any instrumented components like AgentRunner are imported.
---
Nitpick comments:
In `@deployment/docker-compose/openlit/docker-compose.yml`:
- Line 3: The docker-compose services currently reference floating image tags
(e.g., "otel/opentelemetry-collector-contrib" and the other two image entries
noted) which makes deployments non-reproducible; update each service's image
field to pin an explicit version tag or immutable digest (for example replace
"otel/opentelemetry-collector-contrib" with a specific tag or `@sha256` digest),
ensure all three image entries are updated consistently, and verify the chosen
tags/digests correspond to tested releases before committing.
In `@deployment/docker-compose/openlit/otel-collector-config.yaml`:
- Around line 7-9: The processors section currently only contains the batch
processor which is fragile under memory spikes; insert a memory_limiter
processor configured before the batch processor (i.e., processors: ->
memory_limiter: ... then batch:) so the collector throttles and limits memory
usage to reduce OOMs and dropped exports; update every processors block that
currently lists only batch (the other occurrences analogous to the one shown) to
include memory_limiter before batch and tune its settings appropriately.
In `@src/cuga/backend/server/main.py`:
- Around line 68-75: Move the module-level OpenLit startup into the FastAPI
lifespan context: keep the try/except import but assign it to a private symbol
(e.g., _openlit) instead of calling init() at import time; then inside the
asynccontextmanager lifespan function (lifespan) check if _openlit is not None
and settings.observability.openlit and call _openlit.init() and
logger.info("OpenLit observability initialized") there so initialization runs
during application startup rather than at module import.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
deployment/docker-compose/openlit/docker-compose.ymldeployment/docker-compose/openlit/grafana-datasources.yamldeployment/docker-compose/openlit/otel-collector-config.yamldeployment/docker-compose/openlit/prometheus.ymldeployment/docker-compose/openlit/tempo.yamlpyproject.tomlsrc/cuga/backend/cuga_graph/utils/controller.pysrc/cuga/backend/server/main.pysrc/cuga/config.pysrc/cuga/settings.toml
| - GF_AUTH_ANONYMOUS_ENABLED=true | ||
| - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin | ||
| ports: | ||
| - "3000:3000" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n deployment/docker-compose/openlit/docker-compose.ymlRepository: cuga-project/cuga-agent
Length of output: 1780
🏁 Script executed:
rg -i "grafana|anonymous|auth" deployment/docker-compose/openlit/ -B 2 -A 2Repository: cuga-project/cuga-agent
Length of output: 1404
🌐 Web query:
Grafana GF_AUTH_ANONYMOUS_ENABLED GF_AUTH_ANONYMOUS_ORG_ROLE Admin security implications
💡 Result:
Enabling anonymous auth (GF_AUTH_ANONYMOUS_ENABLED=true) means anyone who can reach your Grafana URL can use Grafana without logging in, with whatever org role you assign via GF_AUTH_ANONYMOUS_ORG_ROLE. Grafana explicitly allows Viewer, Editor, or Admin for anonymous users. [1]
If you set GF_AUTH_ANONYMOUS_ORG_ROLE=Admin, the security implication is effectively: “unauthenticated org admin”.
What an anonymous Org Admin can typically do (high risk)
- Add/edit/delete data sources (Org Admin is required to add data sources). This can be used to point Grafana at sensitive systems and then query them. [2]
- Edit resources broadly within the org: Org Admin is the top “basic role” inside an org (above Editor/Viewer) and is intended to administer org resources. [3]
- Full alerting write access (create/modify alert rules and notification resources). [4]
- All the risks already present with anonymous “Viewer” access, but worse, e.g. anonymous users can make API “view” calls that enumerate folders/dashboards/data sources, and can make arbitrary queries to data sources accessible in the org. With
Admin, they can additionally change configuration/resources rather than just read/query. [5]
Bottom line
Setting anonymous + Admin is generally equivalent to running an admin console with no authentication. Only consider it in tightly controlled environments (e.g., isolated network + additional gateway controls). Grafana’s docs recommend anonymous access primarily for sharing, and note “Public dashboards” as a more secure sharing approach. [1]
Sources
[1] Grafana Docs — Configure anonymous access (grafana.com)
[2] Grafana Docs — Data source management (grafana.com)
[3] Grafana Docs — Roles and permissions (grafana.com)
[4] Grafana Docs — Configure roles and permissions (Alerting) (grafana.com)
[5] Grafana Docs — Configure security (implications of anonymous access; arbitrary data source queries; API enumeration) (grafana.com)
Citations:
- 1: https://grafana.com/docs/grafana/latest//setup-grafana/configure-security/configure-authentication/anonymous-auth/?utm_source=openai
- 2: https://grafana.com/docs/grafana/latest/permissions/datasource_permissions/?utm_source=openai
- 3: https://grafana.com/docs/grafana/latest/administration/roles-and-permissions/?utm_source=openai
- 4: https://grafana.com/docs/grafana/latest/alerting/set-up/configure-roles/?utm_source=openai
- 5: https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/?utm_source=openai
Unauthenticated Grafana admin access on an exposed port is a security risk.
This configuration grants unauthenticated admin access to Grafana. An anonymous Org Admin can add/edit/delete data sources, modify resources, and manage alerts. The port binding 3000:3000 exposes Grafana to all network interfaces by default. For local development, downgrade the role to Viewer and bind to localhost.
Safer local defaults
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
+ - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
ports:
- - "3000:3000"
+ - "127.0.0.1:3000:3000"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - GF_AUTH_ANONYMOUS_ENABLED=true | |
| - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin | |
| ports: | |
| - "3000:3000" | |
| - GF_AUTH_ANONYMOUS_ENABLED=true | |
| - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer | |
| ports: | |
| - "127.0.0.1:3000:3000" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deployment/docker-compose/openlit/docker-compose.yml` around lines 43 - 46,
The compose file currently enables anonymous Grafana admin access via
GF_AUTH_ANONYMOUS_ENABLED and GF_AUTH_ANONYMOUS_ORG_ROLE and exposes Grafana on
all interfaces with ports "3000:3000"; change the anonymous role from Admin to
Viewer (or disable GF_AUTH_ANONYMOUS_ENABLED entirely) and restrict the port
binding to localhost (e.g., 127.0.0.1:3000:3000) so Grafana is not accessible as
an unauthenticated admin from the network.
| "asyncpg>=0.30", | ||
| "PyJWT[crypto]>=2.10.0", | ||
| "cryptography>=44.0.0", | ||
| "openlit>=1.34.30", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "pyproject.toml" | head -5Repository: cuga-project/cuga-agent
Length of output: 332
🏁 Script executed:
#!/bin/bash
# Read pyproject.toml to verify the dependency structure
cat pyproject.tomlRepository: cuga-project/cuga-agent
Length of output: 3223
🏁 Script executed:
#!/bin/bash
# Check for settings.toml to verify the optional feature claim
fd settings.tomlRepository: cuga-project/cuga-agent
Length of output: 89
🏁 Script executed:
#!/bin/bash
# Search for openlit imports to verify if they are guarded
rg "openlit" -A 2 -B 2Repository: cuga-project/cuga-agent
Length of output: 4966
🌐 Web query:
openlit package security advisories vulnerabilities 2024 2025
💡 Result:
openlit (PyPI package) — security advisories / vulnerabilities (2024–2025)
- No published vulnerabilities/advisories found for
openlitin 2024 or 2025 in common public vulnerability aggregation sources I checked (e.g., Safety CLI’s vulnerability DB lists “No known vulnerabilities found” for the PyPI package). [1]
Related / newer finding (outside your 2024–2025 window)
- A critical GitHub Actions workflow vulnerability was disclosed later: GHSA-9jgv-x8cq-296q, published Feb 25, 2026, and assigned CVE-2026-27941. It affects OpenLIT prior to 1.37.1 (fix in 1.37.1) and involves unsafe use of
pull_request_targetthat can enable RCE/secret exposure in CI. [2][3]
Current package status (context)
- PyPI shows ongoing releases into 2026 (e.g.,
1.36.8on Feb 6, 2026). [4]
Sources:
[1] Safety CLI DB entry for openlit (PyPI) https://data.safetycli.com/packages/pypi/openlit/
[2] GitHub Security Advisory GHSA-9jgv-x8cq-296q GHSA-9jgv-x8cq-296q
[3] NVD entry for CVE-2026-27941 https://nvd.nist.gov/vuln/detail/CVE-2026-27941
[4] PyPI project page for openlit (release history) https://pypi.org/project/openlit/
🏁 Script executed:
#!/bin/bash
# Check if there are any initialization or usage of openlit beyond the guarded imports
rg "_openlit|openlit\." -B 1 -A 1 | head -50Repository: cuga-project/cuga-agent
Length of output: 890
Upgrade openlit to 1.37.1 or later—current version has a critical security vulnerability.
The pinned version 1.34.30 is affected by CVE-2026-27941, a critical vulnerability involving unsafe use of pull_request_target in GitHub Actions workflows that can enable RCE and secret exposure. This was patched in 1.37.1 (released Feb 2026). Update to a patched version immediately.
Additionally, consider moving openlit to [project.optional-dependencies] for consistency with other optional features like e2b and memory. Since the feature is disabled by default in settings.toml and the code uses guarded imports, making this an optional dependency would reduce the installation footprint for users who don't need observability. The guarded import pattern would become essential rather than redundant if the package is not guaranteed to be present.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 45, Update the pinned openlit dependency in
pyproject.toml from "openlit>=1.34.30" to at least "openlit>=1.37.1" to pull in
the security fix for CVE-2026-27941; alternatively, move the openlit requirement
into [project.optional-dependencies] (matching keys like "e2b" and "memory") so
it is not installed by default, and keep the existing guarded import checks in
your code (the import locations where openlit features are conditionally used)
to handle the package being absent. Ensure the version bump or migration is
reflected in any install/test docs and dependency lock files used by the repo.
| try: | ||
| import openlit as _openlit | ||
| except ImportError: | ||
| _openlit = None | ||
|
|
||
| if _openlit is not None and settings.observability.openlit: | ||
| _openlit.init() | ||
| logger.info("OpenLit observability initialized") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the main.py file mentioned in the review
find . -name "main.py" -path "*/server/*" | head -20Repository: cuga-project/cuga-agent
Length of output: 100
🏁 Script executed:
# Check the contents of main.py to see if it has OpenLit initialization
cat -n src/cuga/backend/server/main.py 2>/dev/null || echo "File not found at specified path"Repository: cuga-project/cuga-agent
Length of output: 50380
🏁 Script executed:
# Verify import relationships - check if controller.py is imported by main.py
rg -n "controller" src/cuga/backend/server/main.py || echo "No controller imports found in main.py"Repository: cuga-project/cuga-agent
Length of output: 134
🌐 Web query:
OpenLit Python SDK initialize multiple times openlit.init() side effects
💡 Result:
openlit.init() is meant to be a one-time, process-wide initialization that turns on OpenTelemetry auto-instrumentation by monkey‑patching supported libraries at init time. Because of that, calling it multiple times can have real side effects rather than being a harmless “reconfigure” call. [1] [2]
Likely side effects of calling openlit.init() multiple times
- Duplicate instrumentation / double-wrapping: since OpenLIT patches libraries during
init(), repeated calls can re-wrap already-wrapped functions, which often shows up as duplicate spans/events, deeper call stacks, and extra overhead. [2] - Multiple processors/exporters added: OpenLIT’s init path configures OpenTelemetry exporting/processing (and has been described as using a relatively “naive” exporter/processor configuration approach keyed off OTEL env vars like
OTEL_EXPORTER_OTLP_ENDPOINT). Re-running init risks adding additional processors/exporters (leading to duplicated exports) or creating confusing mixed configuration. [3]
Practical guidance
- Treat
openlit.init()as idempotence-unsafe and call it exactly once, as early as possible (before creating/importing the client objects you want instrumented). [1] [2] - If you have an app that can execute startup code multiple times (tests, notebooks, dev server reload), add your own guard (e.g., a module-level
initializedflag) so you don’t callopenlit.init()again.
Sources: OpenLIT docs on manual instrumentation via openlit.init() [1], troubleshooting note that OpenLIT patches at init time [2], OpenLIT issue discussing init’s processor/exporter configuration approach [3].
Remove duplicate OpenLit initialization in this file.
OpenLit is initialized in both this file and src/cuga/backend/server/main.py. Since main.py imports AgentRunner from this module at module load time, both initializations will execute. Calling openlit.init() multiple times causes duplicate instrumentation, duplicate spans/events, and additional overhead—the OpenLit SDK is not idempotence-safe.
Keep only the import in this file (for potential direct usage elsewhere) and consolidate initialization to a single location in main.py, preferably during application startup (before any instrumented code runs).
🐛 Suggested fix: Remove duplicate initialization
try:
import openlit as _openlit
except ImportError:
_openlit = None
-
-if _openlit is not None and settings.observability.openlit:
- _openlit.init()
- logger.info("OpenLit observability initialized")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| import openlit as _openlit | |
| except ImportError: | |
| _openlit = None | |
| if _openlit is not None and settings.observability.openlit: | |
| _openlit.init() | |
| logger.info("OpenLit observability initialized") | |
| try: | |
| import openlit as _openlit | |
| except ImportError: | |
| _openlit = None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cuga/backend/cuga_graph/utils/controller.py` around lines 36 - 43, Remove
the duplicate OpenLit initialization: in this module leave the import fallback
(the _openlit import and _openlit = None handling) for potential direct use but
delete the conditional block that calls _openlit.init() and logger.info; instead
ensure openlit.init() is invoked once during application startup in main.py
where settings.observability.openlit is checked. Specifically, remove the code
that calls openlit.init() and the logger.info("OpenLit observability
initialized") here while keeping the _openlit symbol, and consolidate the
initialization logic (checking settings.observability.openlit and calling
_openlit.init()) in the main startup code that runs before any instrumented
components like AgentRunner are imported.
offerakrabi
left a comment
There was a problem hiding this comment.
a few issues i see
- you did not enable openlit for the sdk
- no agent id or session id sent to openlit
Feature Pull Request
Related Issue
Closes #
Description
Adds optional OpenLit integration for LLM observability, controlled via
[observability] openlit = truein settings.toml. When enabled, OpenLit instruments Cuga's LLM calls and emits traces/metrics via OpenTelemetry (OTLP), allowing visualization in Grafana, Tempo, and Prometheus.Key changes:
[observability]section in settings.toml withopenlit = false(default)deployment/docker-compose/openlit/: OTel Collector + Tempo + Prometheus + GrafanaOTEL_EXPORTER_OTLP_ENDPOINT, etc.)Type of Changes
Testing
Documentation
Checklist
Summary by CodeRabbit
observability.openlitconfiguration option to enable monitoring (disabled by default).