Amazon Bedrock Agentcore Add alerting templates #16705
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-browser-errors", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Browser errors", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Browser"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when errors occur during AWS Bedrock AgentCore Browser operations.\n//\n// The Browser tool enables agents to interact with web pages programmatically,\n// automating web-based tasks. Operations include:\n// - StartBrowserSession: Initiates a new browser automation session\n// - StopBrowserSession: Terminates a browser session\n// - ConnectBrowserAutomationStream: Establishes an automation stream connection\n// - ConnectBrowserLiveViewStream: Establishes a live view stream\n// - GetBrowserSession/ListBrowserSessions: Session management operations\n// - UpdateBrowserStream: Updates a browser stream\n//\n// This alert monitors both user errors (4xx) and system errors (5xx).\n//\n// The alert is grouped by cloud account, region, and resource to pinpoint\n// the specific browser resources experiencing issues.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_errors > 5`).\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation IN (\"StartBrowserSession\", \"StopBrowserSession\", \"ConnectBrowserAutomationStream\", \"ConnectBrowserLiveViewStream\", \"GetBrowserSession\", \"ListBrowserSessions\", \"UpdateBrowserStream\")\n| STATS total_user_errors = sum(aws.bedrock_agentcore.metrics.UserErrors.sum), total_system_errors = sum(aws.bedrock_agentcore.metrics.SystemErrors.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| EVAL total_errors = COALESCE(total_user_errors, TO_LONG(0)) + COALESCE(total_system_errors, TO_LONG(0))\n| WHERE total_errors > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-browser-session-throttles", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Browser session throttles", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Browser"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when throttling occurs during AWS Bedrock AgentCore Browser operations.\n//\n// Browser session throttling indicates that requests are being rate-limited,\n// which can impair agents' ability to perform web automation tasks.\n//\n// Common causes of browser throttling:\n// - High volume of browser session requests\n// - Concurrent session limits exceeded\n// - Resource quota constraints\n//\n// The alert is grouped by cloud account, region, and resource to identify\n// which browser resources are being throttled.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_throttles > 10`).\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation IN (\"StartBrowserSession\", \"StopBrowserSession\", \"ConnectBrowserAutomationStream\", \"ConnectBrowserLiveViewStream\")\n| STATS total_throttles = sum(aws.bedrock_agentcore.metrics.Throttles.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| WHERE total_throttles > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
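
Every template sets `"alertDelay": {"active": 1}`. In Kibana, the alert delay is the number of consecutive rule runs that must meet the rule condition before an alert is raised, so with `1` a single 5-minute window with throttles is enough. The sketch below models that counting for one group (one account/region/resource row); `active_runs` is a hypothetical helper, and Kibana's internal pending/active bookkeeping is simplified to a streak counter.

```python
def active_runs(matches, active=1):
    """Return the indexes of scheduled runs on which an alert is active for one group.

    matches: one bool per rule run (every 5m in these templates); True when the
             group's row survived the ES|QL WHERE clause on that run.
    active:  consecutive matching runs required (the template's alertDelay.active).
    """
    streak = 0
    fired = []
    for i, matched in enumerate(matches):
        # A non-matching run resets the streak of consecutive matches.
        streak = streak + 1 if matched else 0
        if streak >= active:
            fired.append(i)
    return fired


runs = [True, False, True, True, True]
print(active_runs(runs))            # → [0, 2, 3, 4]
print(active_runs(runs, active=3))  # → [4]
```

Raising `active` is an alternative to raising `total_throttles > N` when the goal is to catch only sustained throttling: with `active=3`, throttles must appear in three consecutive 5-minute windows (about 15 minutes) before an alert is raised.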
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-code-interpreter-errors", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Code interpreter errors", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Code Interpreter"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when errors occur during AWS Bedrock AgentCore Code Interpreter operations.\n//\n// Code Interpreter enables agents to execute code within secure, isolated sessions.\n// This alert monitors both user errors (client-side, 4xx) and system errors (server-side, 5xx)\n// across all Code Interpreter operations including:\n// - StartCodeInterpreterSession: Initiates a new code execution session\n// - InvokeCodeInterpreter: Executes code within an active session\n// - StopCodeInterpreterSession: Terminates an active session\n// - CodeInterpreterSession: Session lifecycle metrics\n//\n// The alert is grouped by cloud account, region, and resource to pinpoint the\n// specific Code Interpreter experiencing issues.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_errors > 5`) to only\n// alert on sustained error patterns rather than isolated incidents.\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-interpreter-observability.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation IN (\"StartCodeInterpreterSession\", \"InvokeCodeInterpreter\", \"StopCodeInterpreterSession\", \"CodeInterpreterSession\")\n| STATS total_user_errors = sum(aws.bedrock_agentcore.metrics.UserErrors.sum), total_system_errors = sum(aws.bedrock_agentcore.metrics.SystemErrors.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| EVAL total_errors = COALESCE(total_user_errors, TO_LONG(0)) + COALESCE(total_system_errors, TO_LONG(0))\n| WHERE total_errors > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
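
Each template pairs `"schedule": {"interval": "5m"}` with `"timeWindowSize": 5`, `"timeWindowUnit": "m"`, and `"timeField": "event.ingested"`. A rough sketch of the resulting per-run filter follows, assuming each run only sees documents whose `event.ingested` falls in the last `timeWindowSize` `timeWindowUnit`s before the run time. The inclusive bounds, the `UNITS` map, and both helper names are assumptions for illustration, not Kibana's implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from timeWindowUnit values to timedelta keyword arguments.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def query_window(run_at, size=5, unit="m"):
    """Approximate [start, end] covered by one run: the last `size` `unit`s."""
    return run_at - timedelta(**{UNITS[unit]: size}), run_at


def docs_in_window(docs, run_at, size=5, unit="m", time_field="event.ingested"):
    """Keep documents whose time field falls in the run's window (inclusive bounds assumed)."""
    start, end = query_window(run_at, size, unit)
    return [d for d in docs if start <= d[time_field] <= end]


run_at = datetime(2025, 1, 1, 12, 5, tzinfo=timezone.utc)
docs = [
    {"event.ingested": datetime(2025, 1, 1, 12, 1, tzinfo=timezone.utc)},   # inside
    {"event.ingested": datetime(2025, 1, 1, 11, 58, tzinfo=timezone.utc)},  # too old
]
print(len(docs_in_window(docs, run_at)))  # → 1
```

Because the interval equals the window size, consecutive runs cover roughly adjacent 5-minute slices, so each ingested metric document is evaluated by about one run. Widening the window beyond the interval would make runs overlap and could re-count the same errors.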
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-code-interpreter-high-latency", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Code interpreter high latency", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Code Interpreter"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when high execution duration is detected during AWS Bedrock AgentCore Code Interpreter operations.\n//\n// Code Interpreter executes code within secure, isolated sessions. High duration\n// can indicate complex computations, resource constraints, or inefficient code.\n//\n// Duration measures the average execution time for code interpreter operations\n// in milliseconds.\n//\n// High latency in code execution can impact:\n// - Agent response times\n// - User experience\n// - Session timeouts\n//\n// The alert is grouped by cloud account, region, and resource.\n//\n// To adjust sensitivity, change the threshold (default: 30000ms = 30 seconds).\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-interpreter-observability.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation IN (\"InvokeCodeInterpreter\", \"CodeInterpreterSession\")\n| STATS avg_duration_ms = avg(aws.bedrock_agentcore.metrics.Duration.avg) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| WHERE avg_duration_ms > 30000" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
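
The latency templates compare `avg(aws.bedrock_agentcore.metrics.Duration.avg)` (or `Latency.avg` for Gateway) against a fixed threshold. That expression is an unweighted mean of the per-document averages, so every metric document counts equally regardless of how many invocations it summarizes. The sketch below illustrates this for one group. The `counts` list and `weighted_avg` are hypothetical, included only as a contrast, and do not correspond to a field the template reads.

```python
from statistics import mean

LATENCY_THRESHOLD_MS = 30_000  # Code Interpreter template default (30 seconds)


def avg_of_averages(doc_avgs_ms):
    """What avg(...Duration.avg) computes for one account/region/resource group."""
    return mean(doc_avgs_ms)


def weighted_avg(doc_avgs_ms, counts):
    """Hypothetical contrast: weight each document's average by a per-document sample count."""
    return sum(a * c for a, c in zip(doc_avgs_ms, counts)) / sum(counts)


avgs = [70_000, 5_000, 20_000]  # three metric documents for one resource, in ms
counts = [1, 50, 49]            # hypothetical invocation counts behind each average
print(avg_of_averages(avgs) > LATENCY_THRESHOLD_MS)       # → True
print(weighted_avg(avgs, counts) > LATENCY_THRESHOLD_MS)  # → False
```

In this made-up example a single slow outlier document is enough to push the unweighted mean above 30 seconds, which is worth keeping in mind when tuning the threshold.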
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-code-interpreter-throttles", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Code interpreter throttles", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Code Interpreter"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when throttling occurs during AWS Bedrock AgentCore Code Interpreter operations.\n//\n// Code Interpreter enables agents to execute code within secure, isolated sessions.\n// Throttling indicates that code execution requests are being rate-limited.\n//\n// Operations monitored:\n// - StartCodeInterpreterSession: Session creation throttles\n// - InvokeCodeInterpreter: Code execution throttles\n// - StopCodeInterpreterSession: Session termination throttles\n//\n// Common causes of throttling:\n// - High volume of code execution requests\n// - Concurrent session limits exceeded\n// - Compute resource constraints\n//\n// The alert is grouped by cloud account, region, and resource.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_throttles > 10`).\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-interpreter-observability.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation IN (\"StartCodeInterpreterSession\", \"InvokeCodeInterpreter\", \"StopCodeInterpreterSession\")\n| STATS total_throttles = sum(aws.bedrock_agentcore.metrics.Throttles.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| WHERE total_throttles > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-gateway-errors", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Gateway errors", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Gateway"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when errors occur during AWS Bedrock AgentCore Gateway invocations.\n//\n// Gateway provides a unified API endpoint that enables agents to securely connect\n// to enterprise tools and external resources. It acts as a proxy that handles\n// authentication, authorization, and routing of requests.\n//\n// This alert monitors both:\n// - User errors (4xx): Client-side errors like invalid requests, unauthorized access\n// - System errors (5xx): Server-side errors indicating infrastructure issues\n//\n// The alert is grouped by cloud account, region, and resource to pinpoint the\n// specific gateway experiencing issues.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_errors > 5`) to only\n// alert on sustained error patterns.\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-gateway-metrics.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation == \"InvokeGateway\"\n| STATS total_user_errors = sum(aws.bedrock_agentcore.metrics.UserErrors.sum), total_system_errors = sum(aws.bedrock_agentcore.metrics.SystemErrors.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| EVAL total_errors = COALESCE(total_user_errors, TO_LONG(0)) + COALESCE(total_system_errors, TO_LONG(0))\n| WHERE total_errors > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-gateway-high-latency", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Gateway high latency", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Gateway"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when high latency is detected during AWS Bedrock AgentCore Gateway invocations.\n//\n// Gateway serves as a unified API endpoint that enables agents to securely connect to\n// enterprise tools and resources. High latency can indicate network issues, slow\n// downstream services, or resource constraints.\n//\n// Latency measures the average time elapsed between receiving the gateway request\n// and returning the response, measured in milliseconds.\n//\n// The alert is grouped by cloud account, region, and resource to pinpoint the\n// specific gateway experiencing high latency.\n//\n// To adjust sensitivity, change the threshold in the WHERE clause (default: 5000ms = 5 seconds).\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-gateway-metrics.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation == \"InvokeGateway\"\n| STATS avg_latency_ms = avg(aws.bedrock_agentcore.metrics.Latency.avg) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| WHERE avg_latency_ms > 5000" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-gateway-throttles", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Gateway throttles", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Gateway"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when throttling occurs during AWS Bedrock AgentCore Gateway invocations.\n//\n// Gateway throttling indicates that requests are being rate-limited due to exceeding\n// allowed TPS (transactions per second) or quota limits. This can impact agent\n// performance and user experience.\n//\n// Common causes of throttling:\n// - High request volume exceeding service quotas\n// - Burst traffic patterns\n// - Insufficient provisioned capacity\n//\n// The alert is grouped by cloud account, region, and resource to identify\n// which gateways are being throttled.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_throttles > 10`).\n// Consider requesting a service quota increase if sustained throttling occurs.\n// For more details, see: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-gateway-metrics.html\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| WHERE aws.dimensions.Operation == \"InvokeGateway\"\n| STATS total_throttles = sum(aws.bedrock_agentcore.metrics.Throttles.sum) BY cloud.account.id, cloud.region, aws.dimensions.Resource\n| WHERE total_throttles > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| { | ||
| "id": "aws-bedrock-agentcore-identity-throttles", | ||
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "[AWS Bedrock AgentCore] Identity throttles", | ||
| "tags": ["AWS", "Amazon Bedrock AgentCore", "Identity"], | ||
| "ruleTypeId": ".es-query", | ||
| "schedule": { | ||
| "interval": "5m" | ||
| }, | ||
| "params": { | ||
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "esqlQuery": { | ||
| "esql": "// Alert triggers when throttling occurs during AWS Bedrock AgentCore Identity operations.\n//\n// The Identity service handles authentication and token management for agents.\n// This alert sums three kinds of identity throttles:\n// - Workload access token fetch throttles: Rate limiting on workload identity tokens\n// - Resource access token fetch throttles: Rate limiting on OAuth2 token fetches\n// - API key fetch throttles: Rate limiting on API key retrievals\n//\n// Token fetch throttling can prevent agents from accessing protected resources\n// and may cause cascading failures in agent workflows.\n//\n// The alert is grouped by cloud account and region to identify affected environments.\n//\n// To reduce alert noise, increase the threshold (e.g., `total_throttles > 10`).\n\nFROM metrics-aws_bedrock_agentcore.metrics-*\n| STATS workload_throttles = sum(aws.bedrock_agentcore.metrics.WorkloadAccessTokenFetchThrottles.sum), resource_throttles = sum(aws.bedrock_agentcore.metrics.ResourceAccessTokenFetchThrottles.sum), apikey_throttles = sum(aws.bedrock_agentcore.metrics.ApiKeyFetchThrottles.sum) BY cloud.account.id, cloud.region\n| EVAL total_throttles = COALESCE(workload_throttles, TO_LONG(0)) + COALESCE(resource_throttles, TO_LONG(0)) + COALESCE(apikey_throttles, TO_LONG(0))\n| WHERE total_throttles > 0" | ||
| }, | ||
| "groupBy": "row", | ||
| "timeField": "event.ingested" | ||
| }, | ||
| "alertDelay": { | ||
| "active": 1 | ||
| } | ||
| }, | ||
| "managed": true, | ||
| "coreMigrationVersion": "8.8.0", | ||
| "typeMigrationVersion": "10.1.0" | ||
| } | ||
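
Unlike the other templates, the Identity query has no `Operation` filter and groups only by account and region, summing three separate throttle metrics. A minimal sketch of that aggregation follows, assuming flattened dicts where `account` and `region` stand in for `cloud.account.id` and `cloud.region`, and the metric keys stand in for the `...FetchThrottles.sum` fields; the helper name is hypothetical.

```python
IDENTITY_METRICS = (
    "WorkloadAccessTokenFetchThrottles",
    "ResourceAccessTokenFetchThrottles",
    "ApiKeyFetchThrottles",
)


def identity_throttle_alerts(docs, threshold=0):
    """Return {(account, region): total_throttles} for groups above threshold."""
    sums = {}  # (account, region) -> {metric: running sum, or None if never seen}
    for doc in docs:
        key = (doc["account"], doc["region"])  # BY cloud.account.id, cloud.region
        group = sums.setdefault(key, dict.fromkeys(IDENTITY_METRICS))
        for metric in IDENTITY_METRICS:
            value = doc.get(metric)
            if value is not None:  # sum() skips nulls
                group[metric] = value if group[metric] is None else group[metric] + value
    alerts = {}
    for key, group in sums.items():
        # EVAL: COALESCE each per-metric sum to 0, then add them together.
        total = sum(v for v in group.values() if v is not None)
        if total > threshold:  # WHERE total_throttles > 0
            alerts[key] = total
    return alerts


docs = [
    {"account": "111", "region": "us-east-1", "WorkloadAccessTokenFetchThrottles": 3},
    {"account": "111", "region": "us-east-1", "ApiKeyFetchThrottles": 2},
    {"account": "111", "region": "eu-west-1"},  # no identity throttle metrics at all
]
print(identity_throttle_alerts(docs))  # → {('111', 'us-east-1'): 5}
```

Groups whose documents carry none of the three metrics still form a row with a total of 0 and are then dropped by the final `WHERE`, which is why the query can scan the whole data stream without an operation filter.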
nit:
I think Muthu suggested keeping just the service name in tags, e.g. `AWS Bedrock AgentCore`.

Updated. However, I see `Browser` and `Gateway` as high-level components rather than internal components such as `memory` or `disk`, so including the names of these high-level components may not be wrong. But, considering the limited number of alert templates included presently, I am opting out of adding the names of high-level components.