Commit 659a0ca

feat(docs): add distributed tracing documentation and update navigation

1 parent d6930ea

9 files changed, +249 -9 lines changed

docs/docs.json

Lines changed: 3 additions & 2 deletions

@@ -42,12 +42,13 @@
       ]
     },
     {
-      "group": "Task Observability",
+      "group": "Observability",
       "icon": "radar",
       "pages": [
        "observe/agent_tasks",
        "observe/buffer",
-       "observe/dashboard"
+       "observe/dashboard",
+       "observe/traces"
      ]
    },
    {

docs/index.mdx

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ Short-term memory
     Store conversations and artifacts with text, images, and files across sessions
   </Card>

-  <Card title="Task Observability" icon="eye">
+  <Card title="Observability" icon="eye">
     Mid-term memory

     Monitor what your agent plans vs. what it actually executes

docs/integrations/agno.mdx

Lines changed: 1 addition & 1 deletion

@@ -306,7 +306,7 @@ Acontext will automatically flush the buffer when the buffer is full or IDLE. To
 ## Next Steps

 <CardGroup cols={2}>
-  <Card title="Task Observability" icon="eye" href="/observe/agent_tasks">
+  <Card title="Observability" icon="eye" href="/observe/agent_tasks">
     Monitor what your agent plans vs. what it executes
   </Card>

docs/integrations/ai-sdk.mdx

Lines changed: 1 addition & 1 deletion

@@ -615,7 +615,7 @@ Message content must be a string, not an array. Array content needs to be conver
 ## Next Steps

 <CardGroup cols={2}>
-  <Card title="Task Observability" icon="eye" href="/observe/agent_tasks">
+  <Card title="Observability" icon="eye" href="/observe/agent_tasks">
     Monitor what your agent plans vs. what it executes
   </Card>

docs/integrations/openai-python.mdx

Lines changed: 1 addition & 1 deletion

@@ -439,7 +439,7 @@ Acontext will automatically flush the buffer when the buffer is full or IDLE. To
 ## Next Steps

 <CardGroup cols={2}>
-  <Card title="Task Observability" icon="eye" href="/observe/agent_tasks">
+  <Card title="Observability" icon="eye" href="/observe/agent_tasks">
     Monitor what your agent plans vs. what it executes
   </Card>

docs/integrations/openai-typescript.mdx

Lines changed: 1 addition & 1 deletion

@@ -522,7 +522,7 @@ Acontext will automatically flush the buffer when the buffer is full or IDLE. To
 ## Next Steps

 <CardGroup cols={2}>
-  <Card title="Task Observability" icon="eye" href="/observe/agent_tasks">
+  <Card title="Observability" icon="eye" href="/observe/agent_tasks">
     Monitor what your agent plans vs. what it executes
   </Card>

docs/integrations/openai_agent.mdx

Lines changed: 1 addition & 1 deletion

@@ -458,7 +458,7 @@ Tools are defined using the `@function_tool` decorator, which automatically regi
 ## Next Steps

 <CardGroup cols={2}>
-  <Card title="Task Observability" icon="eye" href="/observe/agent_tasks">
+  <Card title="Observability" icon="eye" href="/observe/agent_tasks">
     Monitor what your agent plans vs. what it executes
   </Card>

docs/observe/dashboard.mdx

Lines changed: 28 additions & 1 deletion

@@ -7,7 +7,7 @@ The Acontext dashboard provides a unified interface for monitoring and analyzing

 ## Overview

-The dashboard gives you complete visibility into your agent's operations through five specialized views. Each view is designed to help you monitor different aspects of your agent's performance and debug issues quickly.
+The dashboard gives you complete visibility into your agent's operations through six specialized views. Each view is designed to help you monitor different aspects of your agent's performance and debug issues quickly.

 <Tip>
 Use the dashboard filters to narrow down specific time ranges or search for particular operations when troubleshooting issues.

@@ -31,6 +31,33 @@ The BI dashboard includes:
 Export BI dashboard data to integrate with your existing analytics tools or create custom reports for stakeholders.
 </Tip>

+## Traces Viewer
+
+Monitor distributed traces across your entire system using OpenTelemetry integration.
+The traces viewer provides detailed visibility into request flows, service interactions, and performance bottlenecks across the acontext-api and acontext-core services.
+
+<Frame caption="Traces viewer displaying distributed traces with hierarchical span visualization, timing information, and service interactions">
+  <img src="/images/dashboard/traces_viewer.png" alt="Traces viewer interface showing a list of traces with expandable span details, color-coded service indicators, HTTP method badges, duration bars, and trace IDs" />
+</Frame>
+
+The traces viewer includes:
+- **Time range filtering**: Filter traces by time range (15 minutes, 1 hour, 6 hours, 24 hours, or 7 days)
+- **Auto-refresh**: Automatically refreshes trace data every 30 seconds to keep information up to date
+- **Hierarchical span visualization**: Expand traces to view nested spans showing the complete request flow
+- **Service identification**: Color-coded spans distinguish between the acontext-api (teal) and acontext-core (blue) services
+- **HTTP method badges**: Quickly identify request types with color-coded HTTP method indicators
+- **Duration visualization**: Visual timeline bars show relative execution times for each span
+- **Jaeger integration**: Click the external link icon to view detailed trace information in the Jaeger UI
+- **Pagination**: Load more traces to explore historical data beyond the initial results
+
+<Tip>
+Use the trace ID to correlate issues across logs and metrics. Click the trace ID to copy it to your clipboard for easy reference.
+</Tip>
+
+<Info>
+Traces are automatically collected when OpenTelemetry is enabled in your Acontext deployment. The traces viewer integrates with Jaeger for trace storage and visualization.
+</Info>
+
 ## Message Viewer

 Examine all messages exchanged between your agent and external systems.

docs/observe/traces.mdx

Lines changed: 212 additions & 0 deletions

@@ -0,0 +1,212 @@
---
title: "Distributed Tracing"
description: "Monitor request flows across services with OpenTelemetry integration for performance debugging and system observability"
---

Acontext includes comprehensive distributed tracing support through OpenTelemetry integration. This enables you to track requests as they flow through your entire system, from API endpoints through core services, database operations, and external service calls.

## Overview

Distributed tracing provides end-to-end visibility into how requests are processed across multiple services. When a request comes in, Acontext automatically creates a trace that follows the request through:

- **acontext-api**: HTTP API layer (Go service)
- **acontext-core**: Core business logic (Python service)
- **Database operations**: SQL queries and transactions
- **Cache operations**: Redis interactions
- **Storage operations**: S3 blob storage
- **Message queue**: RabbitMQ message processing
- **LLM operations**: Embedding and completion calls

<Info>
Traces are automatically collected when OpenTelemetry is enabled in your deployment. The system uses Jaeger as the trace backend for storage and visualization.
</Info>

## How It Works

Acontext uses OpenTelemetry to instrument both the API and Core services.

### Automatic Instrumentation

The following operations are automatically traced:

- **HTTP requests**: All API endpoints are instrumented with request/response details
- **Database queries**: SQL operations are traced with query details
- **Cache operations**: Redis get/set operations
- **Storage operations**: S3 upload/download operations
- **Message processing**: Async message queue operations
- **LLM calls**: Embedding and completion API calls
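
If you want spans for steps that the automatic instrumentation does not cover, you can add them with the standard OpenTelemetry Python SDK. The sketch below is illustrative only: the tracer and span names are made up for the example and are not part of Acontext's API.

```python
from opentelemetry import trace

# Any name works here; it identifies the instrumentation, not a service.
tracer = trace.get_tracer("my-agent-tooling")

def run_expensive_step(payload: dict) -> dict:
    # Spans created this way nest under whatever span is currently active,
    # so they show up inside the same trace as the auto-instrumented request.
    with tracer.start_as_current_span("expensive-step") as span:
        span.set_attribute("payload.size", len(payload))
        return {"ok": True}  # placeholder for the real work
```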

### Cross-Service Tracing

When a request flows from `acontext-api` to `acontext-core`, the trace context is automatically propagated using OpenTelemetry's trace context headers. This creates a unified trace showing the complete request flow across both services.
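
Under the hood this is typically the W3C trace context mechanism (a `traceparent` header). As a rough sketch of what that propagation looks like with the OpenTelemetry Python SDK — an illustration, not Acontext's internal code — the caller injects the active context into outgoing headers and the receiver extracts it before starting new spans:

```python
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

# Caller side: copy the active trace context into outgoing HTTP headers.
headers: dict[str, str] = {}
inject(headers)  # headers now contain a `traceparent` entry
# http_client.post(core_url, json=body, headers=headers)  # hypothetical call

# Receiver side: restore the caller's context so new spans join the same trace.
def handle_request(incoming_headers: dict[str, str]) -> None:
    ctx = extract(incoming_headers)
    tracer = trace.get_tracer("receiver-example")  # illustrative name
    with tracer.start_as_current_span("process-request", context=ctx):
        ...  # spans created here share the caller's trace ID
```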

<Frame caption="Traces viewer showing distributed traces with hierarchical span visualization">
  <img src="/images/dashboard/traces_viewer.png" alt="Traces viewer interface displaying traces with expandable spans, color-coded services, HTTP method badges, and duration visualization" />
</Frame>

## Viewing Traces

### Dashboard Traces Viewer

Access the traces viewer from the dashboard to see all traces in your system:

- **Time range filtering**: Filter traces by time range (15 minutes, 1 hour, 6 hours, 24 hours, or 7 days)
- **Auto-refresh**: Automatically refreshes every 30 seconds
- **Hierarchical visualization**: Expand traces to view nested spans showing the complete request flow
- **Service identification**: Color-coded spans distinguish between services (acontext-api in teal, acontext-core in blue)
- **HTTP method badges**: Quickly identify request types
- **Duration visualization**: Visual timeline bars show relative execution times
- **Trace ID**: Copy trace IDs to correlate with logs and metrics

<Tip>
Click the external link icon next to a trace ID to open the detailed trace view in Jaeger UI for advanced analysis.
</Tip>

### Jaeger UI

For advanced trace analysis, you can access Jaeger UI directly. The traces viewer provides a link to open each trace in Jaeger, where you can:

- View detailed span attributes and tags
- Analyze trace dependencies and service maps
- Filter and search traces by various criteria
- Compare trace performance over time

## Configuration

Tracing is configured through environment variables for the Core service and YAML configuration for the API service. The following settings control tracing behavior:

### Core Service (Python)

```bash
# Enable/disable tracing
TELEMETRY_ENABLED=true

# OTLP endpoint (Jaeger collector)
TELEMETRY_OTLP_ENDPOINT=http://localhost:4317

# Sampling ratio (0.0-1.0, default 1.0 = 100% sampling)
TELEMETRY_SAMPLE_RATIO=1.0

# Service name for tracing
TELEMETRY_SERVICE_NAME=acontext-core
```

### API Service (Go)

```yaml
telemetry:
  enabled: true
  otlp_endpoint: "localhost:4317"
  sample_ratio: 1.0
```

<Warning>
In production environments, consider using a sampling ratio less than 1.0 (e.g., 0.1 for 10% sampling) to reduce storage costs and overhead while still capturing representative traces.
</Warning>
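
For reference, the settings above correspond to a fairly standard OpenTelemetry SDK setup. The sketch below shows what that wiring typically looks like in Python — it illustrates how a ratio-based sampler and OTLP exporter fit together, and is not Acontext's actual bootstrap code:

```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

ratio = float(os.getenv("TELEMETRY_SAMPLE_RATIO", "1.0"))
endpoint = os.getenv("TELEMETRY_OTLP_ENDPOINT", "http://localhost:4317")
service = os.getenv("TELEMETRY_SERVICE_NAME", "acontext-core")

provider = TracerProvider(
    resource=Resource.create({"service.name": service}),
    # ParentBased keeps one sampling decision per trace while sampling
    # only `ratio` of new root spans.
    sampler=ParentBased(TraceIdRatioBased(ratio)),
)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint)))
trace.set_tracer_provider(provider)
```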

## Understanding Traces

### Trace Structure

Each trace consists of:

- **Root span**: The initial request entry point (usually an HTTP endpoint)
- **Child spans**: Operations performed during request processing
- **Nested spans**: Operations that are part of larger operations
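
The nesting comes directly from how spans are opened: each span started while another is active becomes its child. A tiny illustration with the OpenTelemetry Python SDK (the span names here are made up):

```python
from opentelemetry import trace

tracer = trace.get_tracer("structure-example")

# The outermost span is the root of the trace; each span opened inside it
# becomes a child, and children can nest further.
with tracer.start_as_current_span("handle-request"):          # root span
    with tracer.start_as_current_span("load-session"):        # child span
        with tracer.start_as_current_span("query-database"):  # nested span
            pass
```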

### Span Information

Each span contains:

- **Operation name**: The operation being performed (e.g., `GET /api/v1/session/:session_id/get_learning_status`)
- **Service name**: Which service performed the operation (`acontext-api` or `acontext-core`)
- **Duration**: How long the operation took
- **Tags**: Additional metadata (HTTP method, status codes, error information)
- **Timestamps**: When the operation started and ended

### Service Colors

In the traces viewer, spans are color-coded by service:

- **Teal**: `acontext-api` operations
- **Blue**: `acontext-core` operations
- **Gray**: Other services or unknown operations

## Use Cases

<AccordionGroup>
<Accordion title="Performance debugging">
Identify slow operations and bottlenecks in your system by analyzing trace durations. Expand traces to see which specific operation is taking the most time.

```python
# Traces automatically show up in the dashboard
# No code changes needed - just enable tracing in your configuration
```

1. Open the traces viewer in the dashboard
2. Filter by time range to focus on recent requests
3. Look for traces with long durations
4. Expand the trace to see which span is slow
5. Check the operation name and service to identify the bottleneck
</Accordion>

<Accordion title="Error investigation">
When an error occurs, use the trace ID to correlate logs and understand the full request flow that led to the error.

1. Find the error in your logs and note the trace ID
2. Search for the trace ID in the traces viewer
3. Expand the trace to see the complete request flow
4. Identify which service and operation failed
5. Check span tags for error details
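
To make that correlation easier, many teams include the active trace ID in their own log lines. A minimal sketch with the OpenTelemetry Python SDK (a hypothetical helper, not an Acontext API):

```python
import logging

from opentelemetry import trace

logger = logging.getLogger(__name__)

def log_with_trace_id(message: str) -> None:
    # trace_id is an int; format it as the 32-character hex string shown
    # in the traces viewer and in Jaeger.
    ctx = trace.get_current_span().get_span_context()
    logger.error("%s trace_id=%s", message, format(ctx.trace_id, "032x"))
```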

</Accordion>

<Accordion title="Service dependency analysis">
Understand how your services interact by analyzing trace flows. See which services call which other services and how frequently.

1. View traces in Jaeger UI for advanced analysis
2. Use Jaeger's service map view to visualize dependencies
3. Analyze trace patterns to understand service communication
</Accordion>

<Accordion title="Performance optimization">
Compare trace durations before and after optimizations to measure improvements.

1. Note trace durations for specific operations before optimization
2. Make your optimizations
3. Compare new trace durations to verify improvements
4. Use trace data to identify the next optimization target
</Accordion>
</AccordionGroup>

## Best Practices

<CardGroup cols={2}>
<Card title="Use sampling in production" icon="chart-line">
Configure a sampling ratio (e.g., 0.1 for 10%) to reduce storage costs while maintaining observability.
</Card>

<Card title="Correlate with logs" icon="link">
Use trace IDs to find related log entries and get complete context for debugging.
</Card>

<Card title="Monitor trace volume" icon="eye">
Watch trace collection rates to ensure your sampling ratio is appropriate for your traffic volume.
</Card>

<Card title="Set up alerts" icon="bell">
Configure alerts based on trace durations to catch performance regressions early.
</Card>
</CardGroup>

## Next Steps

<CardGroup cols={2}>
<Card title="Dashboard" icon="chart-simple" href="/observe/dashboard">
View traces alongside other observability data in the unified dashboard.
</Card>

<Card title="Settings" icon="gear" href="/settings/runtime">
Configure tracing settings and sampling ratios for your deployment.
</Card>
</CardGroup>
