
Commit 74851e6

Configure Observability for AgentCore Gateway with Amazon CloudWatch and AWS CloudTrail (#518)
* Committing observability samples for AgentCore Gateway
* minor changes to readme
* minor changes to readme
1 parent 29abcb2 commit 74851e6

34 files changed

+2598
-0
lines changed

01-tutorials/02-AgentCore-gateway/06-gateway-observability/01-gateway-observability.ipynb

Lines changed: 2433 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
1+
AgentCore Gateway Observability Tutorial
2+
# Configure Observability for AgentCore Gateway with Amazon CloudWatch and AWS CloudTrail
3+
4+
## Overview
5+
6+
Observability is a fundamental capability for the AgentCore Gateway because it provides comprehensive real-time insights into the functioning and performance of AI agents deployed through the gateway. By capturing and displaying key metrics such as request volumes, success rates, error patterns, latency for tool invocations, and authentication events, the observability features allow developers and operators to monitor the health and efficiency of their agent workflows continuously. This level of monitoring helps quickly identify anomalies or bottlenecks that could affect user experience or system reliability, enabling proactive troubleshooting and performance tuning.
7+
8+
Beyond high-level metrics, AgentCore Gateway observability offers detailed tracing of each agent's workflow. Every action, from tool invocations to model calls and memory retrieval, is logged as spans and traces compliant with OpenTelemetry standards. This telemetry shows how each step was executed and how long it took, so engineers can drill down to the exact point of an error or inefficiency when debugging complex failures or unexpected behavior. Because the data is delivered to Amazon CloudWatch, it can be viewed alongside the rest of your operational monitoring.
9+
10+
Furthermore, observability supports compliance and governance requirements by offering audit trails of agent activity, which is critical for enterprise environments. It also facilitates optimization by revealing usage patterns and helping adjust agent workflows to reduce costs or improve speed. Ultimately, these observability capabilities transform the AgentCore Gateway from a black-box interface into a transparent, manageable system that supports reliable, scalable, and performant AI agent deployment in production environments.
11+
12+
## Observability with Amazon CloudWatch and AWS CloudTrail
13+
14+
* Amazon CloudWatch focuses on real-time performance monitoring and operational troubleshooting for AgentCore Gateway, providing detailed metrics and logs for latency, error rates, and usage patterns.
15+
* AWS CloudTrail focuses on security, compliance, and auditing by recording a full history of API calls and user actions related to the gateway.
16+
17+
Together, they offer a holistic observability and governance framework for managing AgentCore Gateway in production.
18+
19+
![AgentCore Gateway observability architecture](images/1-agentcore-gw-architecture.png)
20+
21+
#### AgentCore Gateway CloudWatch Metrics
22+
23+
Gateway publishes the following metrics to Amazon CloudWatch. They provide information about API invocations, performance, and errors.
24+
25+
* **Invocations:** The total number of requests made to each Data Plane API. Each API call counts as one invocation regardless of the response status.
26+
27+
* **Throttles:** The number of requests throttled (status code 429) by the service.
28+
29+
* **SystemErrors:** The number of requests that failed with a 5xx status code.
30+
31+
* **UserErrors:** The number of requests that failed with a 4xx status code, excluding 429.
32+
33+
* **Latency:** The time elapsed between when the service receives the request and when it begins sending the first response token. In other words, the initial response time.
34+
35+
* **Duration:** The total time elapsed between receiving the request and sending the final response token. It represents the complete end-to-end processing time of the request.
36+
37+
* **TargetExecutionTime:** The total time taken to execute the target (Lambda, OpenAPI, and so on). This helps determine how much the target contributes to the total Latency.
38+
39+
* **TargetType:** The total number of requests served by each type of target (MCP, Lambda, OpenAPI).
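
The status-code rules in the list above can be restated as a short sketch. This is illustrative only and is not how Gateway computes its metrics: the helper names `classify_status`, `tally`, and `target_share` are hypothetical, and the latency figures in the demo are made-up inputs.

```python
from collections import Counter


def classify_status(status_code: int) -> list[str]:
    """Return the metric names a response with this status code would count toward."""
    metrics = ["Invocations"]  # every call counts, regardless of response status
    if status_code == 429:
        metrics.append("Throttles")
    elif 400 <= status_code <= 499:
        metrics.append("UserErrors")  # 4xx, excluding 429 (handled above)
    elif 500 <= status_code <= 599:
        metrics.append("SystemErrors")
    return metrics


def tally(status_codes: list[int]) -> Counter:
    """Aggregate metric counts over a batch of response status codes."""
    counts: Counter = Counter()
    for code in status_codes:
        counts.update(classify_status(code))
    return counts


def target_share(target_execution_ms: float, latency_ms: float) -> float:
    """Fraction of Latency that was spent executing the target."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return target_execution_ms / latency_ms


if __name__ == "__main__":
    print(dict(tally([200, 200, 429, 404, 500])))
    print(f"target share of latency: {target_share(120.0, 200.0):.0%}")
```

A high target share suggests the target (for example, the Lambda function) dominates response time, while a low share points at the gateway path itself.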
40+
41+
#### AgentCore Gateway CloudWatch Vended Logs
42+
43+
AgentCore logs the following information for gateway resources:
44+
45+
* Start and completion of gateway requests processing
46+
* Error messages for Target configurations
47+
* MCP Requests with missing or incorrect authorization headers
48+
* MCP Requests with incorrect request parameters (tools, method)
49+
50+
AgentCore can deliver logs to Amazon CloudWatch Logs, Amazon S3, or a Firehose stream. This tutorial focuses on CloudWatch.
51+
52+
If you add Amazon CloudWatch Logs under AgentCore Gateway Log Delivery in the AWS console, these logs are stored under the default log group **/aws/vendedlogs/bedrock-agentcore/gateway/APPLICATION_LOGS/{gateway_id}**. You can also configure a custom log group whose name starts with **/aws/vendedlogs/**.
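
As a minimal sketch of the naming rules above, the helpers below build the default log group name for a gateway and check a custom name against the required prefix. The function names and the gateway ID are hypothetical. `recent_log_messages` uses the boto3 CloudWatch Logs `filter_log_events` call and assumes AWS credentials and an existing log group; it is not called here.

```python
import time

DEFAULT_PREFIX = "/aws/vendedlogs/bedrock-agentcore/gateway/APPLICATION_LOGS"
VENDED_PREFIX = "/aws/vendedlogs/"


def default_log_group(gateway_id: str) -> str:
    """Default vended-logs group for a gateway, as described in the text."""
    return f"{DEFAULT_PREFIX}/{gateway_id}"


def is_valid_custom_log_group(name: str) -> bool:
    """Custom log groups must start with the vended-logs prefix."""
    return name.startswith(VENDED_PREFIX) and len(name) > len(VENDED_PREFIX)


def recent_log_messages(gateway_id: str, minutes: int = 15, limit: int = 50) -> list[str]:
    """Fetch recent log messages for a gateway (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    logs = boto3.client("logs")
    start_ms = int((time.time() - minutes * 60) * 1000)  # epoch milliseconds
    response = logs.filter_log_events(
        logGroupName=default_log_group(gateway_id),
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"] for event in response["events"]]


if __name__ == "__main__":
    # "my-gateway-id" is a placeholder, not a real gateway.
    print(default_log_group("my-gateway-id"))
    print(is_valid_custom_log_group("/aws/vendedlogs/my-team/gateway"))
```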
53+
54+
#### AgentCore Gateway CloudWatch Tracing
55+
56+
Enabling tracing on the Amazon Bedrock AgentCore gateway provides deep insights into the behavior and performance of your AI agents and the tools they interact with. It captures the full execution path of a request as it moves through the gateway, which is essential for effective debugging, optimization, and auditing of complex agentic workflows.
57+
58+
* **Traces - Top Level Container**
59+
60+
* Represents the complete interaction context
61+
* Captures the full execution path starting from an agent invocation
62+
* May include multiple agent calls throughout the interaction
63+
* Provides the broadest view of the entire workflow
64+
65+
* **Requests - Individual Agent Invocations**
66+
67+
* Represents a single request-response cycle within a trace
68+
* Each agent invocation creates a new request
69+
* Captures one complete call to an agent and its response
70+
* Multiple requests can exist within a single trace
71+
72+
* **Spans - Discrete Units of Work**
73+
74+
* Represents specific, measurable operations within a request
75+
* Captures fine-grained steps like:
76+
* Component initialization
77+
* Tool executions
78+
* API calls
79+
* Processing steps
80+
* Has precise start/end timestamps for duration analysis
81+
82+
The relationship between these three observability components can be visualized as:
83+
84+
Traces (highest level) - Represent complete user conversations or interaction contexts
85+
86+
Requests (middle level) - Represent individual request-response cycles within a Trace
87+
88+
Spans (lowest level) - Represent specific operations or steps within a Request
89+
90+
Trace 1
91+
├── Request 1.1
92+
│ ├── Span 1.1.1
93+
│ ├── Span 1.1.2
94+
│ └── Span 1.1.3
95+
├── Request 1.2
96+
│ ├── Span 1.2.1
97+
│ ├── Span 1.2.2
98+
│ └── Span 1.2.3
99+
└── Request 1.N
100+
101+
Trace 2
102+
├── Request 2.1
103+
│ ├── Span 2.1.1
104+
│ ├── Span 2.1.2
105+
│ └── Span 2.1.3
106+
├── Request 2.2
107+
│ ├── Span 2.2.1
108+
│ ├── Span 2.2.2
109+
│ └── Span 2.2.3
110+
└── Request 2.N
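
The hierarchy above can also be expressed as a small data model. The sketch below uses plain Python dataclasses, not the OpenTelemetry SDK, and all class names, field names, and timings are hypothetical. It shows how span start and end timestamps support duration analysis across a trace.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A discrete unit of work with start and end timestamps (milliseconds)."""
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Request:
    """One request-response cycle; contains spans."""
    request_id: str
    spans: list[Span] = field(default_factory=list)


@dataclass
class Trace:
    """Top-level container; may hold several requests."""
    trace_id: str
    requests: list[Request] = field(default_factory=list)


def slowest_span(trace: Trace) -> Span:
    """Return the longest-running span anywhere in the trace."""
    all_spans = [span for request in trace.requests for span in request.spans]
    return max(all_spans, key=lambda span: span.duration_ms)  # ValueError if no spans


def render(trace: Trace) -> str:
    """Render the trace as an indented outline."""
    lines = [f"Trace {trace.trace_id}"]
    for request in trace.requests:
        lines.append(f"  Request {request.request_id}")
        for span in request.spans:
            lines.append(f"    Span {span.name} ({span.duration_ms} ms)")
    return "\n".join(lines)


if __name__ == "__main__":
    trace = Trace("1", [
        Request("1.1", [Span("initialization", 0, 5), Span("tool execution", 5, 125)]),
        Request("1.2", [Span("api call", 0, 40)]),
    ])
    print(render(trace))
    print("slowest:", slowest_span(trace).name)
```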
111+
112+
113+
114+
#### AgentCore Gateway CloudTrail
115+
116+
AgentCore Gateway is fully integrated with AWS CloudTrail, which provides comprehensive logging and monitoring capabilities for **tracking API activity** and operational events within your gateway infrastructure.
117+
118+
CloudTrail captures two distinct types of events for AgentCore Gateway:
119+
* Management events are logged automatically and capture control plane operations such as creating, updating, or deleting gateway resources.
120+
* Data events capture resource operations performed on or within a gateway (also known as data plane operations). They are high-volume activities, are not logged by default, and must be explicitly enabled.
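
Because data events must be enabled explicitly, one way to do it is with advanced event selectors on a trail. The sketch below is written under stated assumptions: the resource type string `AWS::BedrockAgentCore::Gateway` is an assumption to verify against the CloudTrail documentation, and the function names and trail name are hypothetical. It uses the boto3 CloudTrail `put_event_selectors` call, which needs AWS credentials and an existing trail, so it is not invoked here.

```python
# Assumed resource type for Gateway data events; confirm it in the CloudTrail docs.
ASSUMED_GATEWAY_RESOURCE_TYPE = "AWS::BedrockAgentCore::Gateway"


def gateway_data_event_selector(resource_type: str = ASSUMED_GATEWAY_RESOURCE_TYPE) -> dict:
    """Build an advanced event selector that matches data events for one resource type."""
    return {
        "Name": "AgentCore Gateway data events",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": [resource_type]},
        ],
    }


def enable_gateway_data_events(trail_name: str) -> None:
    """Apply the selector to a trail (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    cloudtrail = boto3.client("cloudtrail")
    # Note: this call replaces the trail's existing selectors, so include any
    # selectors you want to keep (for example, one for management events).
    cloudtrail.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=[gateway_data_event_selector()],
    )


if __name__ == "__main__":
    print(gateway_data_event_selector())
```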
121+
122+
CloudTrail captures all API calls for Gateway as events, including calls from the Gateway console and programmatic calls to the Gateway APIs. Using the information collected by CloudTrail, you can determine the request that was made to Gateway, who made the request, when it was made, and additional details [3]. Management events provide information about management operations performed on resources in your AWS account, also known as control plane operations.
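
To illustrate how a CloudTrail record answers "what, who, and when", the sketch below summarizes a single event record. The top-level field names (`eventTime`, `eventName`, `userIdentity`, `sourceIPAddress`) are standard CloudTrail record fields. The sample record itself is made up for illustration, and its `eventSource` value is an assumption rather than something taken from this tutorial.

```python
import json

# Illustrative record with made-up values; not output captured from a real account.
SAMPLE_RECORD = json.loads("""
{
  "eventTime": "2025-01-01T12:00:00Z",
  "eventSource": "bedrock-agentcore.amazonaws.com",
  "eventName": "CreateGateway",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::111122223333:assumed-role/ExampleRole/example-session"
  }
}
""")


def summarize(record: dict) -> dict:
    """Extract the request, the caller, the time, and the source address."""
    identity = record.get("userIdentity", {})
    return {
        "what": record.get("eventName"),
        "who": identity.get("arn") or identity.get("type"),
        "when": record.get("eventTime"),
        "where": record.get("sourceIPAddress"),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_RECORD).items():
        print(f"{key}: {value}")
```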
123+
124+
## Tutorials Overview
125+
126+
In these tutorials we will cover observability of AgentCore Gateway.
127+
128+
129+
| Information | Details |
130+
|:---------------------|:----------------------------------------------------------|
131+
| Tutorial type | Interactive |
132+
| AgentCore components | AgentCore Gateway, Amazon CloudWatch, AWS CloudTrail |
133+
| Agentic Framework | Strands Agents |
134+
| Gateway Target type | AWS Lambda |
135+
| Inbound Auth IdP | Amazon Cognito |
136+
| Outbound Auth | AWS IAM |
137+
| LLM model | Anthropic Claude Sonnet 4.0 |
138+
| Tutorial components  | AgentCore Gateway Observability with CloudWatch and CloudTrail |
139+
| Tutorial vertical | Cross-vertical |
140+
| Example complexity | Easy |
141+
| SDK used | boto3 |
142+
143+
#### Tutorial Details
144+
145+
* In this tutorial, we will create a Bedrock AgentCore Gateway and add an AWS Lambda target with two tools: `get_order` and `update_order`.
146+
* We will configure log delivery with CloudWatch as the destination and observe the vended logs.
147+
* We will enable Amazon CloudWatch tracing and use the trace ID found in the vended logs to locate the matching traces and spans for a deeper look.
148+
* We will create an AgentCore Runtime with a Strands agent and walk through the spans.
149+
* We will configure CloudTrail management and data events and look at some example events.
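
The first step above registers a Lambda target that exposes `get_order` and `update_order`. A minimal sketch of such a handler follows. It is not the tutorial's actual function. It assumes Gateway passes the tool name in the Lambda client context under the key `bedrockAgentCoreToolName`, prefixed with the target name and a `___` delimiter. The `orderId` and `status` parameters, the in-memory store, and the response shape are all hypothetical.

```python
import json
from types import SimpleNamespace

# In-memory stand-in for an order store; a real target would use a database.
ORDERS = {"123": {"orderId": "123", "status": "PROCESSING"}}

DELIMITER = "___"  # assumed separator between target name and tool name


def resolve_tool_name(context) -> str:
    """Read the tool name from the client context (assumed key) and strip the target prefix."""
    full_name = context.client_context.custom["bedrockAgentCoreToolName"]
    return full_name.split(DELIMITER, 1)[-1]


def _response(status_code: int, payload: dict) -> dict:
    return {"statusCode": status_code, "body": json.dumps(payload)}


def lambda_handler(event, context):
    tool = resolve_tool_name(context)
    if tool not in ("get_order", "update_order"):
        return _response(400, {"error": f"unknown tool {tool!r}"})

    order_id = str(event.get("orderId", ""))
    order = ORDERS.get(order_id)
    if order is None:
        return _response(404, {"error": f"order {order_id!r} not found"})

    if tool == "update_order":
        # Keep the current status if the caller did not supply a new one.
        order["status"] = event.get("status", order["status"])
    return _response(200, order)


def fake_context(tool_name: str) -> SimpleNamespace:
    """Build a stand-in for the Lambda context object, for local testing only."""
    return SimpleNamespace(
        client_context=SimpleNamespace(custom={"bedrockAgentCoreToolName": tool_name})
    )


if __name__ == "__main__":
    ctx = fake_context("OrderTarget___get_order")
    print(lambda_handler({"orderId": "123"}, ctx))
```

Calling the handler locally with `fake_context` makes it easy to check tool routing before the target is attached to a gateway.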
150+
151+
### Resources
152+
153+
* [AgentCore generated gateway observability data](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-gateway-metrics.html)
154+
* [Enable log destinations and tracing for AgentCore gateway](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-configure.html#observability-configure-cloudwatch)
155+
* [Logging AgentCore Gateway API calls with CloudTrail](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-cloudtrail.html)
156+
* [Setting up AgentCore CloudWatch Metrics and Alarms](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-advanced-observability-metrics.html)
157+
* [Logging Gateway API calls with CloudTrail](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-cloudtrail.html)
158+
* [Observability Concepts](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-telemetry.html)
159+