|
| 1 | +AgentCore Gateway Observability Tutorial |
| 2 | +# Configure Observability for AgentCore Gateway with Amazon CloudWatch and AWS CloudTrail |
| 3 | + |
| 4 | +## Overview |
| 5 | + |
| 6 | +Observability is a fundamental capability for the AgentCore Gateway because it provides comprehensive real-time insights into the functioning and performance of AI agents deployed through the gateway. By capturing and displaying key metrics such as request volumes, success rates, error patterns, latency for tool invocations, and authentication events, the observability features allow developers and operators to monitor the health and efficiency of their agent workflows continuously. This level of monitoring helps quickly identify anomalies or bottlenecks that could affect user experience or system reliability, enabling proactive troubleshooting and performance tuning. |
| 7 | + |
| 8 | +Beyond high-level metrics, AgentCore Gateway observability offers detailed tracing of each agent's workflow. Every action, from tool invocations to model calls and memory retrieval, is recorded as spans and traces that follow OpenTelemetry standards. This telemetry shows how each step was executed and how long it took, so engineers debugging complex failures or unexpected behavior can pinpoint the exact step where an error or slowdown occurred. Integration with Amazon CloudWatch brings this data into a single operational view. |
| 9 | + |
| 10 | +Furthermore, observability supports compliance and governance requirements by offering audit trails of agent activity, which is critical for enterprise environments. It also facilitates optimization by revealing usage patterns and helping adjust agent workflows to reduce costs or improve speed. Ultimately, these observability capabilities transform the AgentCore Gateway from a black-box interface into a transparent, manageable system that supports reliable, scalable, and performant AI agent deployment in production environments. |
| 11 | + |
| 12 | +## Observability with Amazon CloudWatch and AWS CloudTrail |
| 13 | + |
| 14 | +* Amazon CloudWatch focuses on real-time performance monitoring and operational troubleshooting for AgentCore Gateway, providing detailed metrics and logs for latency, error rates, and usage patterns. |
| 15 | +* AWS CloudTrail focuses on security, compliance, and auditing by recording a full history of API calls and user actions related to the gateway. |
| 16 | + |
| 17 | +Together, they offer a holistic observability and governance framework for managing AgentCore Gateway in production. |
| 18 | + |
| 19 | +![AgentCore Gateway architecture](images/1-agentcore-gw-architecture.png) |
| 20 | + |
| 21 | +#### AgentCore Gateway CloudWatch Metrics |
| 22 | + |
| 23 | +Gateway publishes the following metrics to Amazon CloudWatch. They provide information about API invocations, performance, and errors. |
| 24 | + |
| 25 | +* **Invocations:** The total number of requests made to each Data Plane API. Each API call counts as one invocation regardless of the response status. |
| 26 | + |
| 27 | +* **Throttles:** The number of requests throttled (status code 429) by the service. |
| 28 | + |
| 29 | +* **SystemErrors:** The number of requests that failed with a 5xx status code. |
| 30 | + |
| 31 | +* **UserErrors:** The number of requests that failed with a 4xx status code other than 429. |
| 32 | + |
| 33 | +* **Latency:** The time elapsed between when the service receives the request and when it begins sending the first response token. In other words, the initial response time. |
| 34 | + |
| 35 | +* **Duration:** The total time elapsed between receiving the request and sending the final response token. This is the complete end-to-end processing time of the request. |
| 36 | + |
| 37 | +* **TargetExecutionTime:** The total time taken to execute the target (Lambda, OpenAPI, and so on). This helps determine how much the target contributes to the total Latency. |
| 38 | + |
| 39 | +* **TargetType:** The total number of requests served by each type of target (MCP, Lambda, OpenAPI). |
| 40 | + |
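The status-code rules in the metric definitions above can be sketched as a small classifier. This is only an illustration of how a response status maps to the counters described in the list, not how the service computes them; the function names `metrics_for_status` and `tally` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def metrics_for_status(status_code: int) -> List[str]:
    """Return the metric names that one response with this status would count toward."""
    # Every API call counts as one invocation regardless of the response status.
    names = ["Invocations"]
    if status_code == 429:
        names.append("Throttles")      # throttled requests
    elif 500 <= status_code <= 599:
        names.append("SystemErrors")   # 5xx failures
    elif 400 <= status_code <= 499:
        names.append("UserErrors")     # 4xx failures other than 429
    return names


def tally(status_codes: Iterable[int]) -> Dict[str, int]:
    """Aggregate metric counts over a batch of response status codes."""
    counts: Counter = Counter()
    for code in status_codes:
        counts.update(metrics_for_status(code))
    return dict(counts)


sample_counts = tally([200, 200, 404, 429, 503])
```

Here five responses produce five `Invocations` and one each of `UserErrors`, `Throttles`, and `SystemErrors`. The 429 response is counted as a throttle and not as a user error.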
| 41 | +#### AgentCore Gateway CloudWatch Vended Logs |
| 42 | + |
| 43 | +AgentCore logs the following information for gateway resources: |
| 44 | + |
| 45 | +* Start and completion of gateway request processing |
| 46 | +* Error messages for Target configurations |
| 47 | +* MCP Requests with missing or incorrect authorization headers |
| 48 | +* MCP Requests with incorrect request parameters (tools, method) |
| 49 | + |
| 50 | +AgentCore can output logs to Amazon CloudWatch Logs, Amazon S3, or an Amazon Data Firehose stream. This tutorial focuses on CloudWatch. |
| 51 | + |
| 52 | +If you add Amazon CloudWatch Logs under AgentCore Gateway Log Delivery in the AWS console, these logs are stored under the default log group **/aws/vendedlogs/bedrock-agentcore/gateway/APPLICATION_LOGS/{gateway_id}**. You can also configure a custom log group whose name starts with **/aws/vendedlogs/**. |
| 53 | + |
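The log group naming rules above can be expressed as a couple of helpers. This is a minimal sketch: the helper names and the example gateway ID are hypothetical, and the check covers only the prefix requirement stated above, not any other CloudWatch naming constraints.

```python
VENDED_PREFIX = "/aws/vendedlogs/"


def default_gateway_log_group(gateway_id: str) -> str:
    """Build the default vended-logs log group name for a gateway."""
    return f"{VENDED_PREFIX}bedrock-agentcore/gateway/APPLICATION_LOGS/{gateway_id}"


def is_valid_custom_log_group(name: str) -> bool:
    """Check that a custom log group name starts with the required prefix."""
    return name.startswith(VENDED_PREFIX) and len(name) > len(VENDED_PREFIX)


# "my-gateway-abc123" is a made-up gateway ID used only for illustration.
example_log_group = default_gateway_log_group("my-gateway-abc123")
```

You could use the resulting name to locate the log group in the CloudWatch console or to validate a custom name before configuring log delivery.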
| 54 | +#### AgentCore Gateway CloudWatch Tracing |
| 55 | + |
| 56 | +Enabling tracing on the Amazon Bedrock AgentCore gateway provides deep insights into the behavior and performance of your AI agents and the tools they interact with. It captures the full execution path of a request as it moves through the gateway, which is essential for effective debugging, optimization, and auditing of complex agentic workflows. |
| 57 | + |
| 58 | +* **Traces - Top Level Container** |
| 59 | + |
| 60 | + * Represents the complete interaction context |
| 61 | + * Captures the full execution path starting from an agent invocation |
| 62 | + * May include multiple agent calls throughout the interaction |
| 63 | + * Provides the broadest view of the entire workflow |
| 64 | + |
| 65 | +* **Requests - Individual Agent Invocations** |
| 66 | + |
| 67 | + * Represents a single request-response cycle within a trace |
| 68 | + * Each agent invocation creates a new request |
| 69 | + * Captures one complete call to an agent and its response |
| 70 | + * Multiple requests can exist within a single trace |
| 71 | + |
| 72 | +* **Spans - Discrete Units of Work** |
| 73 | + |
| 74 | + * Represents specific, measurable operations within a request |
| 75 | + * Captures fine-grained steps like: |
| 76 | + * Component initialization |
| 77 | + * Tool executions |
| 78 | + * API calls |
| 79 | + * Processing steps |
| 80 | + * Has precise start/end timestamps for duration analysis |
| 81 | + |
| 82 | +The relationship between these three observability components can be visualized as: |
| 83 | + |
| 84 | + Traces (highest level) - Represent complete user conversations or interaction contexts |
| 85 | + |
| 86 | + Requests (middle level) - Represent individual request-response cycles within a Trace |
| 87 | + |
| 88 | + Spans (lowest level) - Represent specific operations or steps within a Request |
| 89 | + |
| 90 | + Trace 1 |
| 91 | + ├── Request 1.1 |
| 92 | + │ ├── Span 1.1.1 |
| 93 | + │ ├── Span 1.1.2 |
| 94 | + │ └── Span 1.1.3 |
| 95 | + ├── Request 1.2 |
| 96 | + │ ├── Span 1.2.1 |
| 97 | + │ ├── Span 1.2.2 |
| 98 | + │ └── Span 1.2.3 |
| 99 | + └── Request 1.N |
| 100 | + |
| 101 | + Trace 2 |
| 102 | + ├── Request 2.1 |
| 103 | + │ ├── Span 2.1.1 |
| 104 | + │ ├── Span 2.1.2 |
| 105 | + │ └── Span 2.1.3 |
| 106 | + ├── Request 2.2 |
| 107 | + │ ├── Span 2.2.1 |
| 108 | + │ ├── Span 2.2.2 |
| 109 | + │ └── Span 2.2.3 |
| 110 | + └── Request 2.N |
| 111 | + |
| 112 | + |
| 113 | + |
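The Trace, Request, and Span hierarchy shown above can be modeled with a few dataclasses. This is an illustrative sketch only: the class and field names are assumptions made for the example and do not reflect the schema that CloudWatch or OpenTelemetry actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A discrete unit of work with start and end timestamps in milliseconds."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


@dataclass
class Request:
    """One request-response cycle, made up of spans."""
    request_id: str
    spans: List[Span] = field(default_factory=list)


@dataclass
class Trace:
    """The top-level container, which may hold several requests."""
    trace_id: str
    requests: List[Request] = field(default_factory=list)

    def slowest_span(self) -> Optional[Span]:
        """Return the longest-running span across all requests, if any."""
        spans = [s for r in self.requests for s in r.spans]
        return max(spans, key=lambda s: s.duration_ms) if spans else None


# Hypothetical sample data mirroring "Trace 1" in the diagram above.
trace = Trace("trace-1", [
    Request("1.1", [Span("component_init", 0, 5), Span("tool_execution", 5, 125)]),
    Request("1.2", [Span("api_call", 130, 170)]),
])
slowest = trace.slowest_span()
```

Walking the tree this way mirrors how you drill down in practice: start at the trace, pick the request of interest, then compare span durations to find the step that dominates the total time.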
| 114 | +#### AgentCore Gateway CloudTrail |
| 115 | + |
| 116 | +AgentCore Gateway is fully integrated with AWS CloudTrail, which provides comprehensive logging and monitoring capabilities for **tracking API activity** and operational events within your gateway infrastructure. |
| 117 | + |
| 118 | +CloudTrail captures two distinct types of events for AgentCore Gateway: |
| 119 | +* Management events capture control plane operations such as creating, updating, or deleting gateway resources. They are logged automatically. |
| 120 | +* Data events capture resource operations performed on or within a gateway (also known as data plane operations). They are high-volume and are not logged by default, so you must enable them explicitly. |
| 121 | + |
| 122 | +CloudTrail captures all API calls for Gateway as events, including calls from the Gateway console and code calls to the Gateway APIs. Using the information collected by CloudTrail, you can determine the request that was made to Gateway, who made the request, when it was made, and additional details [3]. Management events provide information about management operations performed on resources in your AWS account, also known as control plane operations. |
| 123 | + |
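Because data events must be enabled explicitly, the following sketch shows one way to do so with the CloudTrail `put_event_selectors` API and an advanced event selector. The resource type string `AWS::BedrockAgentCore::Gateway` is an assumption; confirm the exact value in the CloudTrail documentation for AgentCore Gateway before using it. The helper names and the trail name are hypothetical.

```python
from typing import Any, Dict


def gateway_data_event_selector(
    resource_type: str = "AWS::BedrockAgentCore::Gateway",  # assumed value, verify in the docs
) -> Dict[str, Any]:
    """Build an advanced event selector that matches data events for one resource type."""
    return {
        "Name": "AgentCore Gateway data events",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": [resource_type]},
        ],
    }


def enable_gateway_data_events(cloudtrail_client: Any, trail_name: str) -> Any:
    """Apply the selector to an existing trail using the supplied CloudTrail client."""
    return cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=[gateway_data_event_selector()],
    )
```

With boto3 you would call `enable_gateway_data_events(boto3.client("cloudtrail"), "my-trail")`, where `my-trail` is a placeholder for an existing trail. Note that `put_event_selectors` overwrites the selectors already configured on the trail, so include any existing selectors you want to keep.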
| 124 | +## Tutorials Overview |
| 125 | + |
| 126 | +In these tutorials we will cover observability of AgentCore Gateway. |
| 127 | + |
| 128 | + |
| 129 | +| Information | Details | |
| 130 | +|:---------------------|:----------------------------------------------------------| |
| 131 | +| Tutorial type | Interactive | |
| 132 | +| AgentCore components | AgentCore Gateway, Amazon CloudWatch, AWS CloudTrail | |
| 133 | +| Agentic Framework | Strands Agents | |
| 134 | +| Gateway Target type | AWS Lambda | |
| 135 | +| Inbound Auth IdP | Amazon Cognito | |
| 136 | +| Outbound Auth | AWS IAM | |
| 137 | +| LLM model | Anthropic Claude Sonnet 4.0 | |
| 138 | +| Tutorial components | AgentCore Gateway Observability with CloudWatch and CloudTrail | |
| 139 | +| Tutorial vertical | Cross-vertical | |
| 140 | +| Example complexity | Easy | |
| 141 | +| SDK used | boto3 | |
| 142 | + |
| 143 | +#### Tutorial Details |
| 144 | + |
| 145 | +* In this tutorial, we will create an Amazon Bedrock AgentCore Gateway and add an AWS Lambda function as the target, with two tools: `get_order` and `update_order`. |
| 146 | +* We will configure log delivery with CloudWatch as the destination and observe the vended logs. |
| 147 | +* We will enable Amazon CloudWatch tracing and match the trace ID found in the vended logs to the corresponding traces and spans to dive deeper. |
| 148 | +* We will create an AgentCore Runtime with a Strands agent and walk through the spans. |
| 149 | +* We will configure CloudTrail management and data events and review some example events. |
| 150 | + |
| 151 | +### Resources |
| 152 | + |
| 153 | +* [AgentCore generated gateway observability data](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-gateway-metrics.html) |
| 154 | +* [Enable log destinations and tracing for AgentCore gateway](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-configure.html#observability-configure-cloudwatch) |
| 155 | +* [Logging AgentCore Gateway API calls with CloudTrail](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-cloudtrail.html) |
| 156 | +* [Setting up AgentCore CloudWatch Metrics and Alarms](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-advanced-observability-metrics.html) |
| 158 | +* [Observability Concepts](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-telemetry.html) |
| 159 | + |