This sample architecture illustrates two alternative semantic layer approaches for structured data:
- Virtual Knowledge Graph (VKG): OWL ontology stored in Amazon Neptune; natural language queries are translated to SQL via ontology mappings
- Semantic RAG: Semantic metadata stored in Amazon Bedrock Knowledge Base; natural language queries use RAG for context-aware SQL generation
Both approaches share the same admin workflow up to the point of choosing the semantic layer type.
📊 Background / concepts: For the why behind this project — the two failure modes of agentic text-to-SQL (grounding vs. delivery), how a semantic layer + ontology and progressive disclosure address them, and the tiered query path — see the companion presentation
assets/guides/semantic-layer.pptx.
- Dual Semantic Layer Modes: Choose VKG (Neptune-based ontology) or Semantic RAG (Bedrock KB-based metadata)
- AI-Powered Metadata Enrichment: Automated descriptions for databases, tables, and columns written back to Glue Data Catalog and S3 Tables metadata
- AI-Assisted Metadata Generation: Versioned, iterative semantic layer creation with human in the loop
- Natural Language Queries: Query data across sources using plain English
- Multi-Turn Conversational Chat: Streaming AG-UI chat with persisted session history, per-turn reasoning trace, and follow-up context (DynamoDB-backed transcripts)
- MCP Server: An AgentCore Gateway exposes the query agents as MCP tools (`ListOntologies`, `OntologyQuery`, `MetadataQuery`, `QuerySuggestions`) to Claude Code, Cowork, Cursor, VS Code, and other MCP clients over OAuth 2.0; see `assets/guides/MCP_SERVER.md`
- User Feedback: 👍/👎 ratings + comments per assistant turn, PII-redacted by Guardrails and stored in DynamoDB; surfaced in the admin Feedback tab
- Lessons Learned / Long-Term Memory: Bedrock AgentCore Memory mines durable lessons from chat sessions (SemanticStrategy) and injects them as prior context into future queries; surfaced in the admin Lessons Learned tab
- Ground-Truth Evaluations: Per-layer ground-truth datasets drive AgentCore on-demand + online evaluation runs (accuracy / latency / token metrics) surfaced in the admin Evaluations tab
- Production Monitoring: Read-only admin Monitoring tab reporting how live queries resolved per layer, bucketed by each answer's persisted `provenance.tier` into metric / semantic / advisory / agentic (not implemented) resolution layers, plus a correction-language rate (share of user turns that read as a correction) correlated with the count of lessons AgentCore Memory has extracted
- Adversarial Red-Teaming: Automated guardrail red-team suite (Strands Evals, `CrescendoStrategy`) probing the query agents across 5 OWASP-aligned risk categories; run manually today; see `tests/eval/RED_TEAM_IMPLEMENTATION.md`
- Two-Tier Query Path: A Tier 1 governed-metric lookup first matches the question (Titan-v2 embedding + KNN, cosine ≥ 0.85) against maintained/published metrics and, on a clear match, runs that metric's pre-validated SQL on Athena and returns early. Otherwise it falls through to a Tier 2 deterministic Strands graph (topic router → disambiguation → slice builder → SQL/SPARQL generate+validate → grounding gate + bounded execution) with clarification loops
- Maintained Metrics (governed): curated metrics (name/description/synonyms + pre-validated SQL) stored in DynamoDB with a DRAFT → APPROVED → PUBLISHED lifecycle; published metrics are embedded for the Tier 1 lookup and authored via the `/metrics` API (gated by `METRICS_TABLE`)
- Ontology in Neptune: Business concepts, relationships, and mappings stored as RDF/OWL
- Data in Source Systems: Actual data remains in source systems
- Query Translation: AI agents use the semantic layer to translate natural language into SQL/SPARQL queries (VKG: SPARQL→SQL via Ontop reformulation on Athena)
- Framework: React 18
- UI Library: AWS Cloudscape Design System 3.0
- Authentication: AWS Amplify Auth (Cognito)
- HTTP Client: Axios (`^1.16.0`)
- Routing: React Router v6
- Streaming Chat: AG-UI Server-Sent Events (SSE) over fetch + ReadableStream, streamed directly through the Chat AgentCore Gateway (JWT)
- Hosting: CloudFront + S3
- Infrastructure as Code: AWS CDK v2 (TypeScript)
- API Layer: AWS Lambda with Python FastAPI
- AI Framework: Strands Agents SDK
- LLM: Amazon Bedrock
- Container Registry: Amazon ECR
- Agent Runtime: Amazon Bedrock AgentCore Runtime (5 runtimes)
- Agent Gateways: Amazon Bedrock AgentCore Gateway — MCP server (tools), MCP OAuth proxy, streaming chat, and Neptune/Ontop SPARQL→SQL
- Long-Term Memory: Amazon Bedrock AgentCore Memory (SemanticStrategy) for lessons-learned mining
- Agent Evaluation: Amazon Bedrock AgentCore Evaluation (on-demand batch + online sampling) against per-layer ground-truth datasets
- SPARQL→SQL Translation: Ontop reformulation (Java 21 Lambda on the Neptune Gateway)
- Graph Database: Amazon Neptune (RDF/SPARQL) — VKG mode
- Vector Store: Amazon Bedrock Knowledge Base (S3 Vectors) — ontology-patterns (VKG) + semantic-rag (Semantic RAG)
- Operational Data: Amazon DynamoDB (12 insurance tables)
- Application State: Amazon DynamoDB tables: `semantic-layer-metadata` (ontology/job state), `…-chat-sessions` (multi-turn transcripts, TTL), `…-feedback` (👍/👎 + comments), `…-metrics` (governed metrics + embeddings, Tier 1)
- Analytical Data: Amazon S3 Tables (Apache Iceberg format)
- Real-Time CDC: DynamoDB Streams → Lambda (PyIceberg) → S3 Tables
- Batch Replication: AWS Glue Zero-ETL integration
- Normalized Views: AWS Glue 5.1 Materialized Views (40 Iceberg MVs in `normalized` namespace)
- Metadata Catalog: AWS Glue Data Catalog (AI-enriched descriptions)
- Query Engine: Amazon Athena (with DynamoDB Connector + S3 Tables catalog)
- Authentication: Amazon Cognito (OAuth 2.0 + PKCE); CUSTOM_JWT authorizers on the MCP/chat gateways; M2M `client_credentials` for backend service-to-runtime calls
- AI Safety: Amazon Bedrock Guardrails (INPUT/OUTPUT screening on queries; PII redaction on feedback + memory writes); adversarial red-team suite validating the guardrails under attack; see `tests/eval/RED_TEAM_IMPLEMENTATION.md`
- Observability: AWS OpenTelemetry Distro; AgentCore online + on-demand evaluation; CloudWatch custom chat metrics
- Secrets: AWS Secrets Manager + Systems Manager Parameter Store
- Network: Lake Formation permissions for S3 Tables access control
Implementation: frontend/src/pages/admin/DescribeIntent.jsx
- Text input for data source descriptions and business use cases
- Stores configuration in DynamoDB `semantic-layer-metadata` table
Implementation: frontend/src/pages/admin/SelectDataSources.jsx
- Multi-select from Glue Catalog databases/tables (including S3 Tables/Iceberg)
- Optional file upload for existing ontology/documentation
- Each selected table includes its Athena `catalogId` for federated routing
Implementation: frontend/src/pages/admin/ReviewMetadata.jsx
- Read-only view of Glue Catalog metadata (tables, columns, data types)
Implementation: frontend/src/pages/admin/SelectSemanticLayerType.jsx
- VKG: Generates OWL ontology stored in Amazon Neptune and Amazon S3
- Semantic RAG: Generates AI metadata stored in Amazon Bedrock Knowledge Base and Amazon S3
Implementation: frontend/src/pages/admin/BuildKnowledgeGraph.jsx
- Triggers Ontology Agent via Lambda API
- Progress: extracting metadata → retrieving patterns → generating OWL → loading to Neptune
Implementation: frontend/src/pages/admin/BuildSemanticMetadata.jsx
- Triggers Metadata Agent via Lambda API
- Progress polling: per-table status with `tablesProcessed / totalTables`
- Agent writes descriptions to Glue Catalog and saves Markdown docs to S3/KB
The admin detail screen (ViewKnowledgeGraph.jsx for VKG, ViewSemanticRAGMetadata.jsx for
Semantic RAG) hosts tabbed management surfaces:
- Metadata / Knowledge Graph: ontology editor (VKG graph visualization ^1) or enriched metadata view
- Data Sources: the tables backing the layer
- Feedback (`frontend/src/components/FeedbackTab.jsx`): per-turn 👍/👎 ratings + comments collected from chat, PII-redacted by Guardrails
- Lessons Learned (`frontend/src/components/LessonsLearnedTab.jsx`): long-term lessons mined from chat sessions by AgentCore Memory
- Ground Truth (`frontend/src/pages/admin/GroundTruthDataset.jsx`): upload/inspect the per-layer evaluation dataset
- Evaluations (`frontend/src/pages/admin/Evaluations.jsx`): AgentCore evaluation runs (accuracy / latency / token metrics)
- Monitoring (`frontend/src/components/MonitoringTab.jsx`): live query-resolution breakdown (metric / semantic / advisory / agentic) bucketed by each answer's persisted `provenance.tier`, plus a correction-language rate correlated with extracted lessons
- Supplementary Docs (`frontend/src/pages/admin/UploadSupplementaryDocs.jsx`): upload extra reference docs into the doc-pipeline → KB
^1 Optional: If "enableOntologyAgents": false, then this screen is disabled and SemanticRAG is selected as default
Implementation: frontend/src/pages/query/ — AskQuestion.jsx (route), LandingPage.jsx
(layer picker + empty-state composer), ChatView.jsx, ChatTranscript.jsx, Composer.jsx,
ReasoningPanel.jsx, ResultPanel.jsx, FeedbackBar.jsx; state machine in
frontend/src/hooks/useChatStream.js, sidebar history in useChatSessions.js.
- Streaming, multi-turn chat: AG-UI events stream over SSE directly through the Chat AgentCore Gateway (JWT). Transcripts persist in the `chat-sessions` DynamoDB table (TTL 24h), so refreshing or clicking a past session in the sidebar rehydrates the conversation.
- Routes to the correct query agent based on layer type:
- VKG: Ontology Query Agent — Neptune topic routing → SPARQL → Ontop SPARQL→SQL → Athena
- Semantic RAG: Metadata Query Agent — KB slice retrieval → SQL → Athena
- Per-turn reasoning trace: the tiered workflow emits phase events (router, disambiguation, slice builder, SQL/SPARQL generate+validate, grounding/execution) rendered in `ReasoningPanel`, including the executed SQL, result table (CSV download), and the Semantic-RAG slice (JSON download).
- Per-turn feedback: 👍/👎 with an optional comment writes to `POST /query/feedback`.
- Dynamic Suggested Questions: selecting a layer calls `GET /query/suggestions/{id}` for 3 AI-generated questions from the Query Suggestions Agent, shown with category labels.
🧪 Sample structure, synthetic data. This is a reference data model, not a real dataset. The 12 operational tables follow the ACORD insurance data standard (an industry reference schema) and are populated entirely with synthetic, machine-generated data: no real customer, policy, or financial records. The data is produced by `scripts/generate_complete_synthetic_data.py` and loaded via `scripts/load_to_dynamodb.py` when `enableAcordSampleData` is set (on by default in the committed `cdk.json`); deploy with `-c enableAcordSampleData=false` to skip the synthetic rows (see Deployment Modes). Adapt the schema and loaders to your own structured sources to reuse the semantic-layer architecture.
HOLDING, PARTY, COVERAGE, RIDER, RELATION, FINANCIALACTIVITY, FINANCIALSTATEMENT, POLICYPRODUCT, COVERAGEPRODUCT, INVESTPRODUCT, TYPE_CODES, ADMIN_CODES
Access: Athena with DynamoDB Connector (lambda: catalog)
Real-time CDC pipeline ^2
DynamoDB Streams → Lambda (PyIceberg) → S3 Tables (Iceberg)
↑
Glue Zero-ETL (batch) ─┘
- Sub-second latency via DynamoDB Streams + Lambda
- True schema evolution via Iceberg spec
- UPSERT/DELETE support via PyIceberg atomic operations
- Registered as `s3tablescatalog` catalog in Athena
- Governed via Lake Formation
^2 Optional: Enabled if "enableRealtimeReplication": true
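The UPSERT/DELETE behaviour of the CDC path can be sketched as a small mapping from a DynamoDB Streams record to a write action. This is a minimal illustration, not the project's stream processor: `plan_cdc_action` and the returned action dictionary are hypothetical names, and the actual PyIceberg write is left out. Only the stream record fields (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) come from the DynamoDB Streams record format.

```python
from typing import Any


def plan_cdc_action(record: dict[str, Any]) -> dict[str, Any]:
    """Map one DynamoDB Streams record to an upsert or delete action.

    Hypothetical helper: the action dict is an illustrative shape that a
    downstream Iceberg writer could consume.
    """
    event_name = record["eventName"]  # INSERT, MODIFY, or REMOVE
    change = record["dynamodb"]
    if event_name in ("INSERT", "MODIFY"):
        # Both inserts and updates become an upsert keyed on the primary key.
        return {"op": "upsert", "keys": change["Keys"], "row": change["NewImage"]}
    if event_name == "REMOVE":
        # Deletes carry only the key; there is no new image to write.
        return {"op": "delete", "keys": change["Keys"], "row": None}
    raise ValueError(f"unexpected eventName: {event_name}")
```

Treating INSERT and MODIFY identically keeps replays idempotent, which matters when a batch is retried after landing in the dead-letter queue.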
Batch pipeline (alternative to the real-time CDC pipeline) ^3
DynamoDB table → Zero-ETL integration → S3 Tables (zetl_<uuid> namespace)
↓
Glue 5.1 Materialized Views
↓
S3 Tables (normalized namespace)
└─ 40 entity tables (holding, party,
coverage, rider, relation, ...)
- Glue Zero-ETL integration per DynamoDB table → S3 Tables (`zetl_<uuid>` namespace per integration)
- NormalizedViewsStack: Glue 5.1 PySpark job creates 40 Iceberg Materialized Views in a `normalized` namespace, applying `sk LIKE '<Prefix>#%'` filters to demultiplex each flat Zero-ETL table into its constituent entity tables
- Scheduled refresh every 6 hours via EventBridge; incremental Iceberg refresh where possible
- `zetl_*` namespaces are internal replication staging; the `normalized` namespace is the user-facing analytical layer
^3 Optional: Enabled if "enableBatchReplication": true
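The demultiplexing step above can be illustrated in plain Python. This is a sketch of the idea behind the `sk LIKE '<Prefix>#%'` filter, not the Glue job itself: the `demultiplex` helper and the prefix values in the usage example are hypothetical, and the real work is done by Iceberg Materialized Views in Spark SQL.

```python
from collections import defaultdict


def demultiplex(rows, prefixes):
    """Split flat rows into per-entity lists by sort-key prefix.

    Equivalent in spirit to one `sk LIKE '<Prefix>#%'` filter per entity:
    a row belongs to the first prefix P for which sk starts with "P#".
    """
    entities = defaultdict(list)
    for row in rows:
        for prefix in prefixes:
            if row["sk"].startswith(prefix + "#"):
                entities[prefix].append(row)
                break  # each row lands in at most one entity table
    return dict(entities)
```

For example, `demultiplex([{"sk": "COVERAGE#1"}, {"sk": "RIDER#7"}], ["COVERAGE", "RIDER"])` yields one list per prefix; rows matching no prefix are dropped, as they would be by the filters.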
Location: agents/ontology_agent/main.py
Purpose: Generates OWL ontologies from Glue schemas and table sampling using user-provided documentation and sample ontology patterns retrieved via RAG from Bedrock KB.
Three sub-agents: Phase 1 (per-table N-Quads), Phase 2 (FK refinement + Neptune persist), Revision (targeted edits)
Phase 1 tools (per-table, fresh agent per table):
get_single_table_schema(database_name, table_name, catalog_id) # Athena DESCRIBE + Glue fallback
sample_table_data(database_name, table_name, catalog_id) # Athena SELECT + DynamoDB fallback
retrieve_ontology_patterns(schema_description, max_patterns) # RAG from Bedrock KB
download_document_from_s3(s3_path) # download reference docs
search_document(file_path, search_term, context_lines)
read_document_lines(file_path, start_line, num_lines)
append_nquads(ontology_id, table_name, nquad_batch) # batched N-Quad writing (≤70 lines)
save_intermediate_ontology(ontology_id, table_name, ...) # finalize + S3 save
update_progress(ontology_id, tables_processed, total_tables, current_table)
Phase 2 tools (per-table, fresh agent per table):
append_fk_triples(ontology_id, table_name, fk_nquads) # add FK ObjectProperty triples
persist_file_to_neptune(ontology_id, table_name) # read file → AgentCore GW → Neptune
update_glue_metadata_from_ontology(ontology_id, database_name, table_name, catalog_id)
Assembly (Python, not agent): concatenate all per-table N-Quads → save consolidated ontology.nq to S3
Post-assembly: write Iceberg column doc strings + table descriptions to S3 Tables metadata via pyiceberg
Revision tools: download_document_from_s3, search_document, read_document_lines, apply_targeted_edits, persist_revision_from_s3
Output: N-QUADS in Neptune named graphs (via AgentCore Gateway) with mapsToTable/mapsToColumn traceability predicates; column descriptions written to Glue Data Catalog and Iceberg S3 metadata.
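The batched N-Quad writing used by `append_nquads` (at most 70 lines per call) amounts to simple chunking. A minimal sketch, assuming the N-Quads arrive as a list of already-serialized lines; `batch_nquads` is a hypothetical helper, not the tool's implementation.

```python
def batch_nquads(lines, max_lines=70):
    """Group N-Quad lines into newline-joined batches of at most max_lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be positive")
    batches = []
    for start in range(0, len(lines), max_lines):
        # Each batch is one string, ready to pass as a single nquad_batch.
        batches.append("\n".join(lines[start:start + max_lines]))
    return batches
```

Small batches keep each tool call short, so a failure mid-table loses at most one batch rather than the whole ontology fragment.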
Location: agents/metadata_agent/main.py
Purpose: Create semantic metadata and save it in Glue Catalog, S3 Table metadata, and as markdown metadata documents to S3 for Bedrock KB ingestion. Supports two operational modes: standard enrichment and annotation-only revision.
Tools (shared by both modes):
get_single_table_schema(database_name, table_name, catalog_id)
sample_table_data(database_name, table_name, catalog_id)
download_document_from_s3(s3_path)
search_document(file_path, search_term, context_lines)
read_document_lines(file_path, start_line, num_lines)
update_glue_table_metadata(database_name, table_name, table_description, column_descriptions, catalog_id)
update_glue_database_description(database_name, description)
save_metadata_document_to_s3(database_name, table_name, catalog_id, metadata_content)
update_progress(job_id, tables_processed, total_tables, current_table)
Standard enrichment workflow (per table, fresh agent per table):
- For each unique database: write AI description to Glue
- For each table: DESCRIBE schema → sample data (+ reference docs if uploaded) → generate descriptions → write to Glue & S3 Table Metadata → save Markdown to S3 → update progress
- After all tables: trigger Bedrock KB ingestion job
- Returns immediately (async); status polled via `jobId` in DynamoDB
Annotation mode (ANNOTATION_SYSTEM_PROMPT): Triggered when annotations are included in the enrichment payload. Skips data sampling and reference docs. Reads existing Glue descriptions as baseline, applies targeted per-column/per-table annotation hints, leaves all non-targeted descriptions unchanged, and rewrites the S3 Knowledge Base document.
Versioning (pointer + history pattern): Mirrors the ontology agent. When revisionMode=True:
- Service stamps v1 with `revisionMode`, `targetVersion` (e.g. `v2`), and `revisionInstructions`
- Agent runs enrichment/annotation as normal
- On completion: `_write_versioned_completion()` writes an immutable history record (SK = `v2`) then updates v1 as the mutable current pointer (`currentVersion = v2`, `revisionMode = False`)
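The pointer + history pattern can be sketched against an in-memory stand-in for the DynamoDB table. This is an illustration under stated assumptions, not the real `_write_versioned_completion()`: the `(pk, sk)` dictionary, the `complete_revision` name, and the refusal to overwrite an existing history record are all assumptions made here to show the ordering of the two writes.

```python
def complete_revision(table, pk, target_version, payload):
    """Write an immutable history record, then move the v1 pointer.

    `table` is a dict keyed by (pk, sk), standing in for DynamoDB.
    """
    history_key = (pk, target_version)
    if history_key in table:
        # Assumption: history records are write-once.
        raise ValueError(f"history record {target_version} already exists")
    # 1. History first, so the pointer never references a missing record.
    table[history_key] = {**payload, "version": target_version}
    # 2. Then update v1, the mutable "current" pointer.
    pointer = table[(pk, "v1")]
    pointer["currentVersion"] = target_version
    pointer["revisionMode"] = False
    return pointer
```

Writing history before the pointer means a crash between the two steps leaves an orphaned history record rather than a dangling pointer.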
Federated catalog routing: The catalogId per table (e.g. s3tablescatalog/<bucket>) is resolved automatically. For S3 Tables, versionToken is fetched from the S3 Tables API on Glue update conflicts (retry-on-exception pattern).
Location: agents/ontology_query_agent/main.py (+ tier2/ workflow, shared graph in agents/shared/tier2_graph.py)
Tier 1 — governed-metric lookup (agents/shared/metric_lookup.py + metric_executor.py):
before the graph runs, the question is embedded (Titan v2) and KNN-matched against published
maintained metrics for the namespace. On a clear hit (cosine ≥ 0.85) the metric's pre-validated
compiled_sql is executed on Athena and the answer is returned early — short-circuiting Tier 2.
Any miss or error falls through (fail-soft). The response is shaped like a Tier 2 result
(metadata.tier = 1), so the UI is tier-agnostic. This cascade is identical in the
Semantic-RAG agent below.
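The fail-soft cascade described above can be sketched as a thin wrapper. This is a minimal illustration, assuming Tier 1 and Tier 2 are exposed as callables; `answer_question`, `tier1_lookup`, and `tier2_graph` are hypothetical names, not the agents' entrypoints.

```python
def answer_question(question, tier1_lookup, tier2_graph):
    """Try the governed-metric lookup first; fall through on miss or error."""
    try:
        hit = tier1_lookup(question)  # returns a result dict, or None on a miss
    except Exception:
        hit = None  # fail-soft: a Tier 1 error must never fail the request
    if hit is not None:
        # Shape the response like a Tier 2 result so the UI is tier-agnostic.
        metadata = {**hit.get("metadata", {}), "tier": 1}
        return {**hit, "metadata": metadata}
    return tier2_graph(question)
```

Catching every exception is deliberate in this sketch: Tier 1 is an optimization, so any failure there should only cost the shortcut, not the answer.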
Tier 2 — Strands-graph workflow (agents/ontology_query_agent/tier2/workflow.py), run only
when Tier 1 finds no match:
Phase 1 Topic router KNN/lexical → candidate class + property IRIs
Phase 2 Term disambiguation term → IRI; >1 class IRI → clarification
Phase 3 Slice builder + judge SPARQL CONSTRUCT (n-hops) → Turtle slice
Phase 3b Slice disambiguation property collision / multi class-path
Phase 4 SPARQL generate + validate (rdflib parseQuery + 1 repair)
Phase 5 Grounding gate → Ontop SPARQL→SQL translate → Athena execute (+ SQL repair)
- Grounding back-edge: an out-of-slice-but-real IRI loops to Phase 3 (expand the slice); a hallucinated/misused IRI loops to Phase 4 (regenerate with feedback).
- Phase 5 execution: the grounded SPARQL is lineage only (Neptune is schema-only); the `translate_sparql_to_sql` Ontop Lambda (Java, on the Neptune Gateway) reformulates it to Athena SQL, which the agent executes, with a bounded LLM SQL-repair retry on Athena failures.
- Dynamic row limits: default 10, user-specified (e.g. "top 30"), max 100.
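The dynamic row-limit rule (default 10, user-specified, capped at 100) can be sketched as follows. The regular expression and the `resolve_row_limit` name are assumptions for illustration; how the agent actually extracts the requested count from the question is not specified here.

```python
import re

DEFAULT_LIMIT = 10
MAX_LIMIT = 100


def resolve_row_limit(question: str) -> int:
    """Pick a row limit from phrases like "top 30", clamped to 1..100."""
    match = re.search(r"\b(?:top|first)\s+(\d+)\b", question, re.IGNORECASE)
    if match is None:
        return DEFAULT_LIMIT
    requested = int(match.group(1))
    # Lower bound of 1 is an assumption; the cap of 100 is the documented max.
    return max(1, min(requested, MAX_LIMIT))
```

Under these assumptions, "top 30 policies by premium" resolves to 30, a question with no count resolves to 10, and "top 500" is capped at 100.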
Location: agents/metadata_query_agent/main.py (+ tier2/ workflow, shared graph in agents/shared/tier2_graph.py)
Purpose: Answers natural-language questions over the Semantic-RAG layer using the same two-tier cascade as the VKG agent — Tier 1 governed-metric lookup first (see §3), then the Tier 2 Strands graph specialized for KB-backed metadata when no metric matches:
Phase 1 Topic router KB retrieval → candidate tables (+ scores)
Phase 2 Term disambiguation term → table; ambiguity → clarification
Phase 3 Slice builder + judge parse KB markdown chunks → JSON slice
(tables/columns/joins/acord_paths/query_patterns)
Phase 3b Slice disambiguation slice-level collision → clarification
Phase 4 SQL generate + validate (sqlglot parse + 1 repair)
Phase 5 Grounding gate → bounded execution agent → Athena execute
- The Phase 3 slice (the grounding context) is surfaced to the chat UI for view + JSON download.
- Grounding back-edge loops to Phase 4 (a hallucinated column can't be fixed by widening the slice).
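The Semantic-RAG grounding gate can be pictured as a membership check of the generated SQL's column references against the Phase 3 slice. This is a sketch under assumptions: the slice shape (`{"tables": [{"name": ..., "columns": [...]}]}`), the phase labels, and both function names are hypothetical, and extracting `(table, column)` references from the SQL is assumed to happen elsewhere.

```python
def ungrounded_columns(slice_json, referenced):
    """Return the (table, column) references that the slice does not contain."""
    known = {
        (table["name"], column)
        for table in slice_json["tables"]
        for column in table["columns"]
    }
    return [ref for ref in referenced if ref not in known]


def next_phase(slice_json, referenced):
    """Decide where the workflow goes after the grounding gate."""
    missing = ungrounded_columns(slice_json, referenced)
    if missing:
        # Back-edge: regenerate the SQL with feedback; the slice is not widened.
        return "phase4_regenerate", missing
    return "phase5_execute", []
```

Returning the missing references alongside the phase lets the regeneration prompt name exactly which columns were not grounded.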
Location: agents/query_suggestions_agent/main.py
Purpose: Generates 3 dynamic, contextually relevant suggested questions for the Natural Language Query UI by retrieving schema context from the Bedrock Knowledge Base for the selected semantic layer.
Tools:
retrieve_kb_context(user_query) # retrieves schema docs from Bedrock KB (top 10 results)
Workflow:
- Receives `{"id": "<ontology_config_id>"}` payload from AgentCore entrypoint
- Looks up the metadata config name from DynamoDB
- Agent calls `retrieve_kb_context("list all available tables and their columns and business purpose")`
- LLM analyses schema context and generates 3 categorised questions
- Returns `{"suggestions": [{"category": "...", "question": "..."}, ...]}`
Invocation: Synchronous — no polling required. Called via GET /query/suggestions/{ontology_id}.
Before the Tier 2 Strands graph runs, both query agents try a governed-metric lookup that short-circuits the expensive ontology/KB resolution when a curated metric answers the question.
- What a metric is (`agents/shared/metric_models.py`): a Pydantic `Metric` with `metric_id`, `namespace`, `name`, `description`, `synonyms`, a pre-validated `compiled_sql` (+ `dialect`), `supported_dimensions` / `supported_filters`, an optional `linked_class` (VKG), a `lifecycle` (DRAFT → APPROVED → PUBLISHED), and `version`.
- Storage: the `semantic-layer-metrics` DynamoDB table (`pk = NS#{namespace}`, `sk = METRIC#{metric_id}`). On publish, `name + description + synonyms` are embedded with Bedrock Titan v2 (1024-dim) and stored on the row; DRAFT metrics skip the embedding cost.
- Matching (`agents/shared/metric_lookup.py`): embed the question, KNN-search an in-memory index (hydrated lazily from DynamoDB, namespace pre-filtered), and accept the top hit only if cosine similarity ≥ `0.85` (configurable). No LLM is involved. Fail-soft: any miss, error, or index drift falls through to Tier 2 (never a 500).
- Execution (`agents/shared/metric_executor.py`): apply any allowed filters via sqlglot AST (never string concat), run the SELECT-only SQL on Athena, and shape the rows. The response is built to match the Tier 2 payload (`metadata.tier = 1`) so the frontend is tier-agnostic.
- Authoring/maintenance: the `/metrics` CRUD router (`lambda/rest-api/routers/metrics.py`, `services/metric_service.py`) mounts only when `METRICS_TABLE` is set; SQL is validated SELECT-only with sqlglot at write time, and the embedding is (re)computed on publish.
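The matching step can be sketched with a brute-force cosine search. This is an illustration, not the contents of `metric_lookup.py`: the index is assumed to be a list of dicts carrying `metric_id`, `namespace`, `lifecycle`, and `embedding`, the function names are hypothetical, and the toy vectors in the usage note are 2-dimensional rather than the 1024-dimensional Titan v2 embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_metric(question_vec, index, namespace, threshold=0.85):
    """Return the best published metric_id at or above the threshold, else None."""
    best_id, best_score = None, -1.0
    for metric in index:
        # Namespace pre-filter; only PUBLISHED metrics carry embeddings.
        if metric["namespace"] != namespace or metric["lifecycle"] != "PUBLISHED":
            continue
        score = cosine(question_vec, metric["embedding"])
        if score > best_score:
            best_id, best_score = metric["metric_id"], score
    return best_id if best_score >= threshold else None
```

With a query vector `[1.0, 0.0]`, a metric embedded at `[0.9, 0.1]` scores about 0.994 and is accepted, while one at `[0.6, 0.8]` scores 0.6 and falls through to Tier 2.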
Guide: assets/guides/MCP_SERVER.md
An AgentCore Gateway (CUSTOM_JWT, Cognito) exposes the query agents as MCP tools —
ListOntologies (discover published layers + their VKG/Semantic-RAG mode, call first),
OntologyQuery, MetadataQuery, QuerySuggestions — to MCP clients (Claude Code, Cursor,
VS Code, MCP Inspector). The Gateway forwards each tools/call to the lambda/mcp-tools
Lambda (INPUT guardrail → invoke runtimes over HTTPS with an M2M OAuth bearer → OUTPUT guardrail).
The lambda/mcp-proxy OAuth proxy (HTTP API + Lambda) runs the MCP OAuth 2.0 flow
(RFC 8414/9728 discovery + Authorization Code + PKCE + Dynamic Client Registration) so clients
log in via the browser. OntologyQuery returns {answer, rows, sparql, sql, executed_sql, lineage}; MetadataQuery returns the retrieved slice instead of lineage.
A single Bedrock AgentCore Memory resource with a SemanticStrategy backs the lessons-learned
feature. The Strands memory_hooks (agents/shared/memory_hooks.py) write each PII-redacted turn
into short-term memory; AgentCore asynchronously extracts durable lessons. On a new query, the
agent injects relevant prior lessons + prior results as context. Admins browse/delete records via
the Lessons Learned tab (GET/DELETE /lessons/{ontology_id}).
Admins upload a per-layer ground-truth dataset (POST /groundtruth/{id}/upload) — JSON records of
Natural_Language_Question / Expected_Answer / Expected_SQL_Query / Expected_SQL_Result.
The agentcore-eval stack configures online evaluation (sampling) on all runtimes; the
Evaluations tab triggers on-demand batch runs and shows per-evaluator scores
(GET /evaluations/{id}, POST /evaluations/{id}). Built-in evaluators include
ToolParameterAccuracy, ToolSelectionAccuracy, and GoalSuccessRate.
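A ground-truth upload can be sanity-checked before it is sent. The sketch below assumes the dataset is a JSON array of objects and that all four fields named above are required; both are assumptions, and `validate_ground_truth` is a hypothetical helper rather than part of the upload API.

```python
import json

REQUIRED_FIELDS = (
    "Natural_Language_Question",
    "Expected_Answer",
    "Expected_SQL_Query",
    "Expected_SQL_Result",
)


def validate_ground_truth(raw: str):
    """Return a list of (record_index, missing_fields) for incomplete records."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty return value means every record carries all four fields; anything else points at the record index and the fields to fill in.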
A read-only admin Monitoring tab (GET /monitoring/{id} → services/monitoring_service.py)
reports two signals about a layer's LIVE query traffic, scoped per layer:
- Resolution-layer breakdown: every answered turn is bucketed by its persisted `totals.provenance.tier` into metric (Tier 1 governed metric), semantic (Tier 2 graph; `semantic_sql` + `vkg` both fold here), advisory (schema / "what can I ask" answers), and agentic (planned, always 0; surfaced explicitly rather than hidden). Turns with no recognized tier are skipped so they don't dilute the buckets.
- Correction-language rate: the `correction_language.is_correction` heuristic runs over each persisted user turn; the share that read as a correction ("that's the wrong table") is reported alongside the count of long-term lessons AgentCore Memory has extracted (each correction is a candidate lesson), so the operator can see whether corrections are being captured durably.
Data comes from a bounded FilterExpression Scan of the TTL-bounded chat-sessions table (no
GSI on ontologyId, no per-turn version → scoped to the layer, all versions). Read-only — no
agent-runtime change; the REST Lambda already had the needed IAM + catch-all route.
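The two monitoring signals can be sketched over a list of persisted turns. This is an illustration, not `monitoring_service.py`: the tier strings `"metric"` and `"advisory"` are assumed (only `semantic_sql` and `vkg` are named above), and the correction predicate is passed in rather than reimplemented.

```python
TIER_TO_LAYER = {
    "metric": "metric",        # assumed tier string for Tier 1
    "semantic_sql": "semantic",
    "vkg": "semantic",
    "advisory": "advisory",    # assumed tier string
}


def resolution_breakdown(turns):
    """Count answered turns per resolution layer; unknown tiers are skipped."""
    counts = {"metric": 0, "semantic": 0, "advisory": 0, "agentic": 0}
    for turn in turns:
        tier = turn.get("totals", {}).get("provenance", {}).get("tier")
        layer = TIER_TO_LAYER.get(tier)
        if layer is not None:
            counts[layer] += 1
    return counts  # "agentic" stays 0: reported explicitly, not hidden


def correction_rate(user_turns, is_correction):
    """Share of user turns that the supplied heuristic flags as corrections."""
    if not user_turns:
        return 0.0
    return sum(1 for text in user_turns if is_correction(text)) / len(user_turns)
```

Skipping unrecognized tiers keeps the percentages meaningful when older transcripts predate the provenance field.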
A staged Lambda pipeline (lambda/doc-pipeline: chunk → NER → embed → link → index) ingests
admin-uploaded reference documents into the Bedrock Knowledge Base
(POST /documents/{id}/upload), enriching the retrieval context for both modes.
17 stacks deploy unconditionally; 4 are flag-gated. Three flags gate whole stacks (defaults are
the values committed in cdk/cdk.json — flip a flag and that stack is added or
dropped regardless of the default):
- `enableRealtimeReplication` (default `false`) → 1 stack (`stream-processor`)
- `enableBatchReplication` (default `true`) → 2 stacks (`zeroetl` + `normalized-views`)
- `enableOntologyAgents` (default `true`) → 1 stack (`neptune`)
The other deployment flags (enableSemanticRag, enableAcordSampleData, enableOboPassthrough)
do not add or remove stacks — they toggle resources inside always-deployed stacks (e.g.
runtimes within agentcore, the synthetic-data loader within dynamodb). See
Deployment Modes for what every flag does.
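The stack arithmetic above can be checked with a few lines of Python. This is only a worked example of the counts stated in this section (17 unconditional stacks, 4 flag-gated); `gated_stacks` and `stack_count` are hypothetical helpers and do not read `cdk/cdk.json`.

```python
ALWAYS_DEPLOYED = 17

GATED_STACKS = {
    "enableRealtimeReplication": ["stream-processor"],
    "enableBatchReplication": ["zeroetl", "normalized-views"],
    "enableOntologyAgents": ["neptune"],
}

DEFAULT_FLAGS = {
    "enableRealtimeReplication": False,
    "enableBatchReplication": True,
    "enableOntologyAgents": True,
}


def gated_stacks(overrides=None):
    """List the flag-gated stacks that deploy for the given flag overrides."""
    flags = {**DEFAULT_FLAGS, **(overrides or {})}
    return [
        stack
        for flag, stacks in GATED_STACKS.items()
        if flags[flag]
        for stack in stacks
    ]


def stack_count(overrides=None):
    """Total stacks deployed: the unconditional 17 plus any gated ones."""
    return ALWAYS_DEPLOYED + len(gated_stacks(overrides))
```

With the committed defaults this gives 20 stacks; enabling real-time replication as well gives all 21, and disabling every gating flag leaves the 17 unconditional stacks.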
1. semantic-layer-networking
└─> VPC, Subnets, Security Groups, VPC Endpoints
2. semantic-layer-dynamodb
└─> 12 insurance tables + metadata/chat-sessions/feedback tables + synthetic data loader
3. semantic-layer-glue-catalog
└─> Depends on: dynamodb
└─> DynamoDB Glue database + crawler; Iceberg Glue database
4. semantic-layer-data-lake
└─> Depends on: glue-catalog
└─> S3 Tables bucket (Iceberg), artifacts bucket, athena results,
KB bucket, logging bucket; Lake Formation grants
5. semantic-layer-stream-processor [enableRealtimeReplication=true]
└─> Depends on: dynamodb, data-lake
└─> DynamoDB Streams → Lambda (PyIceberg) → S3 Tables CDC pipeline
DLQ, per-table stream processors, backfill Lambda
6. semantic-layer-zeroetl [enableBatchReplication=true]
└─> Depends on: dynamodb, data-lake
└─> 12 Glue Zero-ETL integrations: DynamoDB table → S3 Tables
(each integration creates a zetl_<uuid> namespace)
7. semantic-layer-normalized-views [enableBatchReplication=true]
└─> Depends on: zeroetl, data-lake
└─> Glue 5.1 PySpark job: 40 Iceberg Materialized Views in
'normalized' S3 Tables namespace; EventBridge 6h refresh schedule
IAM role + LF grants on all zetl_* namespaces + normalized
8. semantic-layer-neptune [enableOntologyAgents=true]
└─> Depends on: networking
└─> Neptune cluster (RDF/SPARQL) in VPC
9. semantic-layer-bedrock-kb
└─> Depends on: data-lake
└─> Knowledge Base (S3 Vectors); dual use:
ontology patterns (VKG) + enriched metadata (Semantic RAG)
10. semantic-layer-athena
└─> Depends on: data-lake, glue-catalog, dynamodb, networking
└─> Workgroup, DynamoDB connector, Lake Formation admin chain
11. semantic-layer-agentcore-memory
└─> Bedrock AgentCore Memory (single resource, SemanticStrategy) for lessons-learned
12. semantic-layer-guardrails
└─> Bedrock Guardrails (content filters + PII detection)
13. semantic-layer-cloudfront-storage
└─> Depends on: data-lake
└─> CloudFront distribution + S3 website bucket + OAC
14. semantic-layer-auth
└─> Depends on: cloudfront-storage
└─> Cognito User Pool, Identity Pool, OAuth 2.0 (+ M2M client, MCP scope)
15. semantic-layer-agentcore
└─> Depends on: neptune, bedrock-kb, glue-catalog, athena, data-lake, agentcore-memory, auth
└─> 5 AgentCore Runtimes + ECR repo + Neptune Gateway construct
(Ontop SPARQL→SQL translate Lambda); JWT-inbound runtimes
LF SELECT grants on normalized namespace (when enableBatchReplication=true)
16. semantic-layer-agentcore-eval
└─> Depends on: agentcore
└─> Online evaluation configs (sampling) for the AgentCore query runtimes
17. semantic-layer-doc-pipeline
└─> Depends on: data-lake, bedrock-kb
└─> Staged Lambda doc-ingestion pipeline (chunk → NER → embed → link → index)
18. semantic-layer-lambda-api
└─> Depends on: auth, data-lake, dynamodb, agentcore, agentcore-memory, doc-pipeline
└─> FastAPI Lambda + HTTP API Gateway (JWT) + Lake Formation grants
19. semantic-layer-mcp-server
└─> Depends on: agentcore, auth, guardrails
└─> MCP Gateway (CUSTOM_JWT) + mcp-tools Lambda + streaming Chat Gateway
20. semantic-layer-mcp-proxy
└─> Depends on: mcp-server, auth
└─> MCP OAuth 2.0 proxy (HTTP API + Lambda) for Claude Code / Cursor / VS Code
21. semantic-layer-frontend
└─> Depends on: cloudfront-storage, auth, lambda-api, mcp-server
└─> React build + S3 sync + CloudFront invalidation
- 12 insurance domain tables with DynamoDB Streams enabled (NEW_AND_OLD_IMAGES)
- `semantic-layer-metadata` table for ontology/metadata job tracking
- `…-chat-sessions` (multi-turn transcripts, TTL), `…-feedback` (👍/👎 + comments), and `…-metrics` (governed metrics + Titan-v2 embeddings for the Tier 1 lookup) tables
- Synthetic data loader Lambda
- S3 Tables bucket: Apache Iceberg tables (replaces plain Parquet)
- Artifacts bucket: Ontologies (Turtle), metadata documents (Markdown)
- Athena results bucket: Query result storage with 7-day lifecycle
- Knowledge Base bucket: Source docs for Bedrock KB
- Lake Formation: Grants for stream processor, Athena execution role, agent roles
- Exports `lfGrantSingletonRoleArn` to preserve LF admin chain across stacks
- `insurance_dynamodb` database: DynamoDB tables via crawler
- `insurance_iceberg` database: S3 Tables (Iceberg) namespace
- Auto-starts DynamoDB crawler on deployment
- Per-table Lambda functions consuming DynamoDB Streams
- Writes to S3 Tables via PyIceberg (ARM64 Docker container)
- SQS Dead Letter Queue with 14-day retention
- Backfill Lambda for initial data load
- CodeBuild (ARM64) for container image build and push
- 12 Glue Zero-ETL integrations — one per DynamoDB source table → S3 Tables
- Each integration creates a `zetl_<uuid>` namespace in the S3 Tables bucket
- Multiple deployments create multiple UUID namespaces; the NormalizedViewsStack job dynamically discovers the newest per source table at runtime
- Managed batch replication as alternative to the real-time stream processor
- Glue 5.1 PySpark job (`glue/create-normalized-views.py`): creates 40 Apache Iceberg Materialized Views in a `normalized` S3 Tables namespace from Zero-ETL source tables
- Namespace discovery: job discovers the most-recently-created `zetl_*` namespace per source table at runtime via …
- Idempotent: `CREATE MATERIALIZED VIEW IF NOT EXISTS` + `REFRESH MATERIALIZED VIEW`; safe to re-run
- Spark conf: S3Tables Glue catalog with `client.region` (required by `LakeFormationAwsClientFactory`) and `warehouse` (required by catalog plugin initialisation)
- EventBridge rule: triggers job every 6 hours; incremental Iceberg refresh where possible
- Lake Formation: SELECT grants on all `zetl_*` namespaces (both current and historical); `AwsCustomResource` pre-creates `normalized` namespace at deploy time so LF `CREATE_TABLE` grant succeeds
- 5 AgentCore Runtimes: `ontology`, `query` (VKG), `metadata`, `metadata-query` (Semantic RAG), `query-suggestions`
- Feature flag: VKG-related resources (ontology runtime, query runtime, Neptune Gateway) are conditionally deployed only when both `neptuneStack` and `bedrockKbStack` are provided, which allows deploying in Semantic RAG-only mode
- AgentCore Neptune Gateway: HTTP gateway construct enabling agents to access Neptune without VPC; also hosts the Ontop SPARQL→SQL translate Lambda (Java 21, `lambda/ontop-translate`) used by VKG Phase 5
- JWT-inbound runtimes: the query runtimes accept Cognito JWT (no SigV4), enabling the MCP/chat gateways and the browser to invoke them with bearer tokens
- Shared ECR repository; CodeBuild (ARM64) per agent
- IAM roles with least-privilege per agent type
- Lake Formation permissions for Iceberg table access
- FastAPI container on Lambda; sub-apps mounted at `/ontology`, `/datasource`, `/query`, `/metadata`, `/neptune`, `/lessons`, `/feedback`, `/documents`, `/groundtruth`, `/evaluations`, `/monitoring` (with `/query` carrying the chat-session and per-turn feedback sub-routes)
- Endpoints (selected):
  - `POST /ontology/config`: create/update ontology config
  - `GET /ontology/config/{ontology_id}`: get config
  - `GET /ontology/list`: list all configs
  - `DELETE /ontology/config/{ontology_id}`: delete config
  - `POST /ontology/build/{ontology_id}`: trigger Ontology Agent (VKG)
  - `GET /ontology/build-status/{ontology_id}`: poll build progress
  - `GET /ontology/versions/{ontology_id}`: list ontology version history
  - `GET /ontology/content/{ontology_id}/{version_id}`: retrieve version content
  - `POST /ontology/revise/{ontology_id}/{version_id}`: start versioned ontology revision
  - `POST /ontology/upload`: upload reference document
  - `POST /metadata/enrich`: trigger Metadata Agent (Semantic RAG); accepts optional `annotations` for annotation-only mode
  - `GET /metadata/enrich/status/{job_id}`: poll enrichment progress
  - `POST /metadata/revise/{id}/{version_id}`: start versioned metadata revision (pointer + history pattern)
  - `GET /metadata/table/{database_name}/{table_name}`: get AI-enriched KB metadata for a single table
  - `GET /query/suggestions/{ontology_id}`: AI-generated suggested questions (synchronous)
  - `GET /query/sessions` · `GET /query/sessions/{id}` · `DELETE /query/sessions/{id}`: chat session list / transcript rehydrate / archive
  - `POST /query/feedback`: record a 👍/👎 turn rating (+ comment); `GET /feedback/{id}` · `DELETE /feedback/{id}/{feedbackId}`: admin Feedback tab
  - `GET /lessons/{id}` · `DELETE /lessons/{id}/{recordId}`: AgentCore Memory lessons (admin Lessons tab)
  - `POST /groundtruth/{id}/upload` · `GET /groundtruth/{id}` · `DELETE /groundtruth/{id}`: per-layer ground-truth dataset
  - `GET /evaluations/{id}` · `GET /evaluations/{id}/{runId}` · `POST /evaluations/{id}` · `DELETE /evaluations/{id}/{runId}`: AgentCore evaluation runs
  - `GET /monitoring/{id}`: per-layer query-resolution + correction-language breakdown (admin Monitoring tab)
  - `POST /documents/{id}/upload` · `GET /documents/{id}` · `GET/DELETE /documents/{id}/{docId}`: supplementary docs → doc-pipeline → KB
  - `/metrics`: governed-metrics CRUD + lifecycle (Tier 1 authoring); mounted only when `METRICS_TABLE` is set
  - Streaming chat is served by the Chat AgentCore Gateway (SSE), not this REST API
- Bedrock Guardrails integration:
  - `GuardrailService` pre-screens user inputs (INPUT) before AgentCore invocation and post-screens agent answers (OUTPUT) before storage; blocked queries return `BLOCKED` status with the guardrail's canned message
  - `GUARDRAIL_IDENTIFIER` and `GUARDRAIL_VERSION` injected as Lambda environment variables; `bedrock:ApplyGuardrail` IAM permission scoped to the guardrail resource
- Carries forward Lake Formation admin chain (networking, athena, agent roles)
- Single Bedrock AgentCore Memory resource created via a custom resource (`CreateMemory` with one SemanticStrategy); backs the lessons-learned feature
- Short-term raw-event retention configurable; long-term semantic lessons extracted asynchronously
- Agents write PII-redacted turns via the Strands `LessonsMemoryHooks`
- Staged Lambda pipeline for admin-uploaded reference docs: chunk → NER → embed → link → index
- The indexer kicks off a Bedrock KB ingestion job for the per-doc JSONL bundles
- NER stage fails gracefully (`entities=[] + nerError`) so downstream stages continue
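
The graceful-failure behaviour of the NER stage can be sketched as follows. This is an illustration only: `run_ner_stage` and the injected `extract_entities` callable are hypothetical names, and the real Lambda in `lambda/doc-pipeline` may structure its records differently. The only parts taken from the description above are the `entities=[]` fallback and the `nerError` field.

```python
def run_ner_stage(chunk, extract_entities):
    """Run NER on one chunk; on failure, record the error and keep going."""
    result = dict(chunk)  # copy so the input record is not mutated
    try:
        result["entities"] = list(extract_entities(chunk["text"]))
    except Exception as exc:  # degrade instead of failing the whole pipeline
        result["entities"] = []
        result["nerError"] = str(exc)
    return result


def failing_extractor(text):
    """Stand-in for an NER backend that is unavailable."""
    raise RuntimeError("NER model unavailable")


ok = run_ner_stage({"text": "Policy P-1 covers Alice"}, lambda text: ["Alice"])
degraded = run_ner_stage({"text": "Policy P-1 covers Alice"}, failing_extractor)
```

Because the degraded record keeps the same shape, the embed, link and index stages can process it without special cases.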
- MCP Gateway (AgentCore Gateway, CUSTOM_JWT) with inline-schema targets for `ListOntologies`, `OntologyQuery`, `MetadataQuery`, `QuerySuggestions` → `lambda/mcp-tools`
- Chat Gateway (CUSTOM_JWT) with AgentCore Runtime targets for browser SSE streaming chat
- Guardrails INPUT/OUTPUT screening per tool call; runtimes invoked over HTTPS with M2M OAuth bearer
- See `assets/guides/MCP_SERVER.md`
- HTTP API + stdlib Lambda implementing the MCP OAuth 2.0 flow (RFC 8414/9728 discovery, Authorization Code + PKCE, Dynamic Client Registration)
- Injects the `semantic-layer-mcp/invoke` gateway scope at `/authorize` and forwards authenticated MCP traffic to the MCP Gateway, so Claude Code / Cursor / VS Code can log in via the browser
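
For readers unfamiliar with the PKCE part of the Authorization Code flow mentioned above, the sketch below shows how a client derives the S256 `code_challenge` from its `code_verifier`, as defined in RFC 7636. It is a generic illustration of the standard, not code from the proxy Lambda; the function name `pkce_s256_challenge` is an assumption.

```python
import base64
import hashlib


def pkce_s256_challenge(code_verifier: str) -> str:
    """Derive the S256 code_challenge for a PKCE code_verifier (RFC 7636, section 4.2)."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    # base64url encoding with the '=' padding stripped, as the RFC requires
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Verifier taken from the worked example in RFC 7636, Appendix B
challenge = pkce_s256_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")
```

The client sends the challenge with the authorization request and the original verifier with the token request, so an intercepted authorization code cannot be redeemed without the verifier.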
1. Admin selects tables (including S3 Tables with catalogId)
2. Admin selects "Semantic RAG" type
3. Frontend → POST /metadata/enrich → Lambda → AgentCore Metadata Runtime
4. Metadata Agent (async background, per-table fresh agent):
a. update_glue_database_description(db, description)
b. For each table:
- get_single_table_schema(db, table, catalogId) ← Athena DESCRIBE (catalog-aware)
- sample_table_data(db, table, catalogId) ← live sample rows
- [if uploaded docs] download_document_from_s3 + search/read
- [if annotations] apply targeted hints via ANNOTATION_SYSTEM_PROMPT
- Compose table + column descriptions
- update_glue_table_metadata(...) ← write back to Glue
- save_metadata_document_to_s3(...) ← Markdown to artifacts bucket
- update_progress(jobId, ...) ← DynamoDB tracking
c. _trigger_kb_ingestion() ← start Bedrock KB sync job
5. Frontend polls GET /metadata/enrich/status/{jobId} every 5s
6. On completion: Admin views enriched metadata in ViewSemanticRAGMetadata
Revision flow (versioned re-enrichment):
POST /metadata/revise/{id}/{version} → MetadataService.start_metadata_revision()
→ stamps v1: revisionMode=True, targetVersion=vN, revisionInstructions
→ invokes metadata agent (annotation mode)
→ on completion: _write_versioned_completion() writes immutable vN history record
+ updates v1 pointer (currentVersion=vN, revisionMode=False)
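
The pointer + history pattern in the revision flow above can be sketched with an in-memory dict standing in for the DynamoDB table. This is a simplified illustration: the key layout `(layer_id, version)` and the function names `start_revision` / `complete_revision` are hypothetical, while the field names (`revisionMode`, `targetVersion`, `revisionInstructions`, `currentVersion`) follow the flow description.

```python
def start_revision(store, layer_id, target_version, instructions):
    """Stamp the v1 pointer record with the pending revision."""
    pointer = store[(layer_id, "v1")]
    pointer.update(
        revisionMode=True,
        targetVersion=target_version,
        revisionInstructions=instructions,
    )


def complete_revision(store, layer_id, content):
    """Write the immutable vN history record, then advance the v1 pointer."""
    pointer = store[(layer_id, "v1")]
    version = pointer["targetVersion"]
    key = (layer_id, version)
    if key in store:
        raise ValueError(f"{version} already exists; history records are immutable")
    store[key] = {"content": content}
    pointer.update(currentVersion=version, revisionMode=False)


# Hypothetical layer with only its initial pointer record
store = {("layer-1", "v1"): {"currentVersion": "v1", "revisionMode": False}}
start_revision(store, "layer-1", "v2", "Clarify the premium column")
complete_revision(store, "layer-1", "revised table description")
```

Readers always resolve the current version through the pointer, so history records never need to be rewritten.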
1. Admin selects tables, selects "VKG" type
2. Ontology Agent:
a. get_database_tables(database)
b. For each table:
- get_single_table_schema(database, table)
- retrieve_ontology_patterns(description) → Bedrock KB
- Generate N-QUADS with mapsToTable/mapsToColumn predicates
- persist_to_neptune(nquads) ← via AgentCore Neptune Gateway
- save_ontology_to_s3(turtle, name)
3. Admin views knowledge graph visualization
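
To make the N-Quads step in the VKG flow more concrete, here is a minimal sketch of emitting quads that tie a class to its table and each property to a column. Everything specific in it is an assumption for illustration: the `http://example.org/` namespace, the graph IRI, the choice of plain string literals as objects, and the function name `column_mapping_nquads`. Only the predicate names `mapsToTable` and `mapsToColumn` come from the flow above.

```python
BASE = "http://example.org/ontology#"  # hypothetical namespace, not the project's


def column_mapping_nquads(class_name, table, columns, graph):
    """Emit N-Quads linking a class to its table and each property to a column."""

    def iri(local):
        return f"<{BASE}{local}>"

    def literal(value):
        # Escape backslashes and double quotes for an N-Quads string literal
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    g = f"<{graph}>"
    lines = [f"{iri(class_name)} {iri('mapsToTable')} {literal(table)} {g} ."]
    for prop, column in columns.items():
        lines.append(
            f"{iri(prop)} {iri('mapsToColumn')} {literal(table + '.' + column)} {g} ."
        )
    return lines


# Hypothetical class, table and column names
quads = column_mapping_nquads(
    "Policy", "policy", {"policyNumber": "policy_number"}, "http://example.org/graph/demo"
)
```

Each line is one subject, predicate, object and graph label terminated by ` .`, which is the shape a bulk loader for N-Quads expects.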
Browser → Chat AgentCore Gateway (JWT, SSE) → query runtime
→ [Guardrail INPUT] → [inject prior lessons + prior results from AgentCore Memory]
→ Tier 1: governed-metric lookup (embed question → KNN ≥ 0.85)
match → run metric compiled_sql on Athena → return early (metadata.tier=1)
no match ↓
→ Tier 2: Strands graph
Phase 1 topic router → Phase 2 disambiguation (→ clarification if ambiguous)
→ Phase 3 slice builder → Phase 4 generate+validate (SQL or SPARQL)
→ Phase 5 grounding gate → execute on Athena
(VKG: Ontop SPARQL→SQL translate; both: bounded repair retry)
→ stream AG-UI events (phases, tokens, executed SQL, rows) to the browser
→ [Guardrail OUTPUT] → persist turn to chat-sessions (DynamoDB)
→ write PII-redacted turn to AgentCore Memory (lessons mined async)
Per turn the user can submit 👍/👎 feedback (POST /query/feedback).
Blocked queries: a canned guardrail message is returned, no agent reasoning invoked.
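
The Tier 1 / Tier 2 split in the query path above can be sketched as a similarity threshold check. This is a simplified illustration under stated assumptions: cosine similarity is assumed as the KNN score, the metric records and their `embedding` / `compiled_sql` fields are hypothetical shapes, and the real lookup lives in the agents' shared modules rather than in a function called `route`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(question_vec, metrics, threshold=0.85):
    """Return (1, metric) on a governed-metric match, else (2, None)."""
    best = max(metrics, key=lambda m: cosine(question_vec, m["embedding"]), default=None)
    if best is not None and cosine(question_vec, best["embedding"]) >= threshold:
        return 1, best  # Tier 1: run the metric's compiled SQL and return early
    return 2, None  # Tier 2: fall through to the multi-phase graph


# Hypothetical governed metric with a toy two-dimensional embedding
metrics = [
    {"name": "total_premium", "embedding": [1.0, 0.0],
     "compiled_sql": "SELECT SUM(premium) FROM policy"},
]
tier_hit, metric = route([0.9, 0.1], metrics)
tier_miss, _ = route([1.0, 1.0], metrics)
```

Keeping the threshold high means only near-exact matches take the cheap Tier 1 path; everything else goes through the full generate-and-validate workflow.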
semantic-layer/
├── cdk/ # AWS CDK Infrastructure (TypeScript)
│ ├── bin/
│ │ └── app.ts # CDK app entry point — up to 21 stacks
│ └── lib/
│ └── stacks/
│ ├── backend/
│ │ ├── networking-stack.ts
│ │ ├── dynamodb-stack.ts
│ │ ├── glue-catalog-stack.ts
│ │ ├── data-lake-stack.ts
│ │ ├── dynamodb-stream-processor-stack.ts
│ │ ├── neptune-stack.ts
│ │ ├── bedrock-kb-stack.ts
│ │ ├── athena-stack.ts
│ │ ├── agentcore-stack.ts
│ │ ├── agentcore-eval-stack.ts
│ │ ├── agentcore-memory-stack.ts
│ │ ├── doc-pipeline-stack.ts
│ │ ├── mcp-server-stack.ts
│ │ ├── mcp-proxy-stack.ts
│ │ ├── agentcore/
│ │ │ └── neptune-gateway-construct.ts # + Ontop translate Lambda
│ │ ├── zeroetl.ts
│ │ ├── normalized-views-stack.ts
│ │ ├── auth/index.ts
│ │ ├── guardrails/index.ts
│ │ └── lambda-rest-api/index.ts
│ └── frontend/
│ ├── index.ts
│ ├── cloudfront-storage.ts
│ └── provider.ts
│
├── agents/ # Strands AI Agents (Python)
│ ├── ontology_agent/ # VKG ontology generation
│ ├── ontology_query_agent/ # VKG query agent (+ tier2/ workflow, Ontop exec)
│ │ ├── main.py
│ │ └── tier2/ # topic router → … → grounding/execution phases
│ ├── metadata_agent/ # Semantic RAG metadata enrichment
│ ├── metadata_query_agent/ # Semantic RAG query agent (+ tier2/ workflow)
│ │ ├── main.py
│ │ └── tier2/ # router, slice builder, SQL generate/validate
│ ├── query_suggestions_agent/ # Dynamic suggested questions (synchronous)
│ ├── shared/ # Cross-agent building blocks
│ │ ├── tier2_graph.py # shared Strands graph + WorkflowContext
│ │ ├── streaming_runner.py # AG-UI SSE runner + phase sink
│ │ ├── agui_emitter.py # AG-UI event emitter
│ │ ├── chat_sessions.py # DynamoDB transcript store
│ │ ├── memory_hooks.py # AgentCore Memory lessons hooks
│ │ ├── guardrails.py # PII-redaction shim
│ │ ├── eval_trigger.py # emits evaluation.requested on layer completion
│ │ ├── eval_judges.py # custom SESSION LLM-as-Judge factory (GoalSuccess / FAF / SqlGrounded)
│ │ └── knn_index.py / embedding.py / followup.py / prior_results.py / …
│ ├── Dockerfile.* # one per runtime (ontology, ontologyquery, …)
│ └── requirements.txt
│
├── glue/
│ └── create-normalized-views.py # Glue 5.1 PySpark job — 40 Iceberg MVs
│
├── lambda/
│ ├── rest-api/ # FastAPI application
│ │ ├── main.py # app entry point — mounts /ontology /datasource
│ │ │ # /query /metadata /neptune /lessons /feedback
│ │ │ # /documents /groundtruth /evaluations /monitoring
│ │ ├── query_api.py # query + suggestions + chat sessions + feedback
│ │ ├── feedback_api.py · lessons_api.py · evaluations_api.py
│ │ ├── groundtruth_api.py · documents_api.py · monitoring_api.py
│ │ ├── metadata_api.py · ontology_api.py · datasource_api.py · neptune_api.py
│ │ └── services/ # guardrail, chat_session, feedback, evaluation,
│ │ # groundtruth, document, agentcore_memory, metric,
│ │ # monitoring, correction_language, …
│ ├── mcp-tools/ # MCP tool dispatch Lambda (4 tools) — Python ARM64
│ ├── mcp-proxy/ # MCP OAuth 2.0 proxy Lambda (stdlib)
│ ├── ontop-translate/ # Ontop SPARQL→SQL translate Lambda (Java 21)
│ ├── neptune-tools/ # Neptune Gateway tool Lambda
│ ├── doc-pipeline/ # chunk / ner / embedder / linker / indexer
│ ├── dynamodb-stream-processor/ # PyIceberg CDC Lambda
│ ├── dynamodb-iceberg-backfill/ · dlq-processor/ · s3tables-manager/
│
├── frontend/ # React Frontend
│ └── src/
│ ├── pages/
│ │ ├── admin/ # DescribeIntent, SelectDataSources, …,
│ │ │ # ViewKnowledgeGraph, ViewSemanticRAGMetadata,
│ │ │ # GroundTruthDataset, Evaluations, UploadSupplementaryDocs
│ │ └── query/ # AskQuestion, LandingPage, ChatView, ChatTranscript,
│ │ # Composer, ReasoningPanel, ResultPanel, FeedbackBar
│ ├── components/ # FeedbackTab, LessonsLearnedTab, MonitoringTab, GraphVisualization, OntologyEditor
│ ├── hooks/ # useChatStream, useChatSessions, useNotifications
│ └── services/
│ └── api.js
│
├── notebooks/ # AgentCore batch-eval + comparison notebooks
│ ├── 1_metadata_agent_ac_eval.ipynb # + 2..5 per-agent ground-truth evals
│ ├── 6_semantic_rag_vs_vkg.ipynb # RAG vs VKG comparator
│ ├── 7_raw_dynamodb_vs_normalized_s3_eval.ipynb
│ ├── 8_semantic-layer-with-ontology-rag-vs-without_eval.ipynb
│ └── 9_neptune_gateway_testing.ipynb · 10_mcp_server_testing.ipynb
│
├── data/ # Synthetic data + evaluation artifacts
│ ├── complete_synthetic_data/ # generated ACORD sample rows
│ ├── ontology-docs/ · ontology-sources/ # VKG pattern inputs
│ └── eval/
│ ├── groundtruth_dataset.json # 16 GT rows + 4 multi-turn scenarios
│ ├── results/ # raw *_batch_eval_*.json + *_kmean_*.json
│ └── results-analysis/ # dated markdown deep-dives
│
└── scripts/
├── generate_complete_synthetic_data.py
├── load_to_dynamodb.py
└── convert-ontologies.py
- AWS Account with administrator access
- AWS CLI configured with credentials
- Node.js 18+ and npm
- Docker for building agent container images
- AWS CDK CLI v2+
- Python 3.12+
npm install -g aws-cdk
cdk --version

CDK context flags control optional capabilities. Pass them as `-c flagName=true` / `-c flagName=false` on the `cdk deploy` / `cdk synth` command line, or set them in `cdk.json` under `"context"`. The defaults below are the values committed in `cdk/cdk.json`, which is what a bare `cdk deploy` uses. Most flags default to `true`, so you disable those with `=false` (as the capability matrix below shows) rather than enable them with `=true`:
| Flag | Default | Effect |
|---|---|---|
| `enableOntologyAgents` | `true` | VKG mode: Neptune + ontology + ontology-query runtimes + Ontop translate Lambda |
| `enableSemanticRag` | `true` | Semantic-RAG mode: semantic-rag Bedrock KB + metadata, metadata-query, query-suggestions runtimes + `/metadata` FastAPI sub-app |
| `enableAcordSampleData` | `true` | Loads the 12 ACORD insurance tables with synthetic data on deploy |
| `enableRealtimeReplication` | `false` | DynamoDB Streams → S3 Tables (PyIceberg) CDC pipeline |
| `enableBatchReplication` | `true` | Glue Zero-ETL integrations + normalized-views (alternative to realtime replication) |
| `enableOboPassthrough` | `false` | On-behalf-of (OBO) identity exchange: the REST API swaps the caller's Cognito JWT for short-lived STS creds via AgentCore Identity (fail-closed). Phase-0 rollout: wired + tested, but agents do not yet consume the creds; enable only after per-group Lake Formation grants are in place |
Capability matrix:
| Mode | VKG admin | Semantic RAG admin | NL Query (VKG) | NL Query (Semantic RAG) | Sample data |
|---|---|---|---|---|---|
| `cdk deploy --all` (default) | ✅ | ✅ | ✅ | ✅ | ✅ |
| + `-c enableAcordSampleData=false` | ✅ | ✅ | ✅ | ✅ | ❌ |
| `-c enableSemanticRag=false` | ✅ | ❌ | ✅ | ❌ | ✅ |
| `-c enableOntologyAgents=false` | ❌ | ✅ | ❌ | ✅ | ✅ |
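
The relationship between the context flags and the capability matrix can be expressed as a small lookup. The sketch below is only a reading aid: the flag names and defaults are copied from the flag table, while the `capabilities` function and its result keys are hypothetical and do not exist in the CDK app.

```python
DEFAULTS = {
    "enableOntologyAgents": True,
    "enableSemanticRag": True,
    "enableAcordSampleData": True,
    "enableRealtimeReplication": False,
    "enableBatchReplication": True,
    "enableOboPassthrough": False,
}


def capabilities(overrides=None):
    """Resolve -c style overrides against the defaults and derive the matrix columns."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown context flags: {sorted(unknown)}")
    flags = {**DEFAULTS, **overrides}
    return {
        "vkg_admin": flags["enableOntologyAgents"],
        "semantic_rag_admin": flags["enableSemanticRag"],
        "nl_query_vkg": flags["enableOntologyAgents"],
        "nl_query_semantic_rag": flags["enableSemanticRag"],
        "sample_data": flags["enableAcordSampleData"],
    }


default_caps = capabilities()
rag_only = capabilities({"enableOntologyAgents": False})
```

Each layer type is controlled by a single flag, so turning one off removes both its admin workflow and its query path.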
The default cdk deploy --all already loads the synthetic ACORD sample data (both layer types
on). To deploy the schema without the synthetic rows:
cdk deploy --all -c enableAcordSampleData=false
⚠️ Mode opt-out: both layer types are on by default. Pass `-c enableSemanticRag=false` to drop the `semantic-rag` KB + metadata/metadata-query/query-suggestions runtimes, or `-c enableOntologyAgents=false` to drop Neptune + the ontology runtimes. The synthetic ACORD data loader is on by default; opt out with `-c enableAcordSampleData=false`. Batch replication (Zero-ETL + normalized views) is on by default; real-time CDC replication is off (`-c enableRealtimeReplication=true` to enable it).

⚠️ Frontend caveat: React env vars (`REACT_APP_ENABLE_SEMANTIC_RAG`) are baked at build time. Flipping a flag requires re-running `cdk deploy` so the frontend is rebuilt.
cdk bootstrap aws://ACCOUNT-ID/REGION
# Install CDK dependencies
cd cdk && npm install && npm run build
# Deploy all stacks (17 always-on + up to 4 conditional)
npm run deploy

Use `npm run deploy`, not bare `cdk deploy --all`. The `predeploy` npm hook runs the `tsc` build, the CDK Jest suite, and the Python unit suite first, aborting the deploy on any failure. A bare `npx cdk deploy` skips this gate (npm lifecycle hooks only fire for `npm run`). See Testing.
Deployment takes approximately 40-60 minutes (includes CodeBuild jobs for ARM64 container images).
- Create Cognito Users: Get User Pool ID from CDK outputs, create an admin user and add it to the `Admin` group (query-only users go in `Users`)
- Upload Ontology Patterns: Copy VKG design patterns to the Bedrock KB source bucket and trigger ingestion (if using VKG mode)
- Access Application: Get CloudFront URL from stack outputs
- (Optional) Connect an MCP client: follow `assets/guides/MCP_SERVER.md` to add the MCP server to Claude Code / Cursor / VS Code via the OAuth proxy URL
Full details (suite layout, integration-test env vars, troubleshooting) are in
tests/README.md. Quick reference:
# One-time: install test/dev deps (kept out of the Lambda runtime requirements)
pip install -r agents/requirements.txt -r requirements-dev.txt
# Python unit suite (repo root is on sys.path via pyproject.toml)
pytest tests/unit/ -v
# With coverage (matches the CI gate; floor enforced at 66%, ratcheting toward 80%)
mkdir -p data/ontology-docs # placeholder for synth-based tests
pytest tests/unit/ --cov --cov-report=term-missing --cov-fail-under=66

Tests are enforced at three points (GitLab CI is the only authoritative gate; the others are bypassable local conveniences):
- GitLab CI (`.gitlab-ci.yml`): runs `python-unit` (with coverage gate), `frontend` (Jest), and `cdk` (Jest) on every push. Integration tests need live AWS and are excluded. Enable "Pipelines must succeed" in the GitLab project to block merges on a red pipeline.
- Local `pre-push` hook (`.githooks/pre-push`): runs the unit suite before a push; opt in once with `git config core.hooksPath .githooks`, bypass with `git push --no-verify`.
- `npm run deploy` gate: the `predeploy` hook runs `tsc` + CDK Jest + the Python unit suite before deploying.
Beyond the in-app Evaluations admin tab (which triggers AgentCore on-demand/online runs against
the per-layer ground truth), the repo ships a reproducible offline benchmark suite under
notebooks/ with its datasets, raw results, and written analyses under
data/eval/. These are how the architectural claims in this README were measured.
Each agent/comparison notebook drives Amazon Bedrock AgentCore Batch Evaluations
(bedrock_agentcore.evaluation.BatchEvaluationRunner) server-side against a deployed runtime,
scoring with the custom SESSION LLM-as-judge evaluators (GoalSuccess, FinalAnswerFaithfulness,
SqlGrounded) plus AgentCore built-ins. Run with the project venv kernel (see
notebooks/.env.example for the required layer/runtime IDs).
| Notebook | What it evaluates |
|---|---|
| `1_metadata_agent_ac_eval` | Metadata generation agent (Semantic RAG enrichment), batch eval |
| `2_metadata_query_agent_ondemand_groundtruth_eval` | Metadata query agent (Semantic RAG), server-side ground-truth batch eval |
| `3_evaluation_analyzer` | Turns a batch run's per-query records into a ranked list of prompt fixes |
| `4_ontology_agent_ac_eval` | Ontology generation agent (VKG), batch eval |
| `5_ontology_queryagent_ac_eval` | Ontology query agent (VKG), server-side ground-truth batch eval |
| `6_semantic_rag_vs_vkg` | Side-by-side comparator: Semantic RAG vs. VKG query agents (loads nb2 + nb5 results) |
| `7_raw_dynamodb_vs_normalized_s3_eval` | Same agent over raw DynamoDB tables vs. normalized S3/Iceberg tables |
| `8_semantic-layer-with-ontology-rag-vs-without_eval` | VKG built with vs. without ontology-pattern RAG |
| `9_neptune_gateway_testing` | Neptune AgentCore Gateway tool smoke tests |
| `10_mcp_server_testing` | Deployed MCP server (Gateway + OAuth proxy) end-to-end smoke tests |
- `groundtruth_dataset.json`: 16 ground-truth rows (`Natural_Language_Question` / `Expected_Answer` / `Expected_SQL_Query` / `Expected_SQL_Result`) + 4 multi-turn scenarios, shared by all query-agent notebooks. `normalized_layer_enrichment_brief.md` is the enrichment brief used to seed the normalized layer.
- `results/`: raw per-run JSON emitted by the notebooks: `*_batch_eval_*.json` (full per-query records) and `*_kmean_*.json` (k=3 mean/std aggregates, self-contained with generated SQL + ground-truth expectation per scenario).
- `results-analysis/`: dated markdown deep-dives that interpret each run.
Query-agent scores are GoalSuccess (the quality signal), k=3 mean over the 16 GT rows + 4
multi-turn scenarios. SqlGrounded reads 1.00 across arms because it scores vacuously on SQL-free
rows — read GoalSuccess, not SqlGrounded.
- Normalization is the dominant accuracy lever: a Semantic-RAG layer over the normalized S3/Iceberg tables scores ~0.75 GoalSuccess vs. ~0.10 over the raw single-table-design DynamoDB tables; same agent, same questions, only the data modeling differs (`2026-06-28-raw-vs-normalized-goalsuccess-zero-analysis.md`).
- Semantic RAG ≈ VKG on quality; RAG is cheaper + faster: RAG 0.75 vs. VKG 0.71 GoalSuccess (a statistical dead heat), but RAG is ~1.6× faster (26s vs. 41s avg) and uses fewer tokens. VKG emits SQL on more rows (11/16 vs. 9/16); the two fail on different rows, so they are complementary (`2026-06-28-rag-vs-vkg.md`).
- Ontology-pattern RAG is net-neutral for VKG generation: building the VKG with vs. without the ontology-pattern KB gave 0.71 vs. 0.75 GoalSuccess (within noise) at ~1.3× the wall-clock (`2026-06-28-vkg-without-vs-vkg with.md`).
- Opus 4.8 vs. Sonnet 4.6 query-agent swap was within noise: a one-off A/B (since reverted; production runs Sonnet 4.6) showed +0.08 RAG / +0.02 VKG GoalSuccess, inside the run-to-run band (`2026-06-29-opus48-vs-sonnet46-model-swap-ab.md`).
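
The "k=3 mean/std" aggregates behind these figures amount to summarising repeated runs of the same evaluation. The sketch below shows one plausible way to compute them; the function name `kmean_summary`, the use of the sample standard deviation, and the per-run scores are assumptions for illustration and are not read from the `*_kmean_*.json` files.

```python
from statistics import mean, stdev


def kmean_summary(run_scores):
    """Summarise k repeated runs, where each entry is one run's mean GoalSuccess."""
    if len(run_scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return {
        "k": len(run_scores),
        "mean": mean(run_scores),
        "std": stdev(run_scores),  # sample standard deviation (assumed convention)
    }


# Hypothetical per-run scores for one arm
summary = kmean_summary([0.70, 0.75, 0.80])
```

A difference between two arms that is smaller than the run-to-run spread is what the findings above describe as "within noise".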
- Cognito User Pool + Identity Pool. User Pool groups: `Admin` (semantic-layer management) and `Users` (queries only). Access/ID tokens expire in 1 hour; refresh tokens in 30 days.
- Three app clients off one user pool:
  - SPA client (public, PKCE Authorization-Code): the React app; scopes `openid` / `profile` / `email`
  - MCP client (public, PKCE Authorization-Code): Claude Code / Cursor / VS Code via the MCP OAuth proxy; adds the `semantic-layer-mcp/invoke` resource-server scope
  - M2M client (confidential, `client_credentials`): backend service-to-runtime calls; scope `semantic-layer-mcp/invoke`, secret in Secrets Manager
- Frontend → REST API: Cognito JWT bearer validated by the HTTP API Gateway JWT authorizer (group/`sub`/`email` claims extracted); the axios client silently refreshes via Amplify on 401.
- Backend → AgentCore runtimes: the runtimes are JWT-inbound (no SigV4). The REST API / mcp-tools Lambda mint an M2M `client_credentials` bearer and invoke over HTTPS; the runtime `CUSTOM_JWT` authorizer validates the user pool + `client_id` + scope.
- Browser → Chat Gateway → runtime: the Chat AgentCore Gateway (`CUSTOM_JWT`) validates the browser's access token (carries `client_id`) and forwards it to the runtime via `JWT_PASSTHROUGH` so the runtime sees the end-user identity (e.g. for chat-session ownership).
- On-Behalf-Of (OBO), gated and not yet active: an OBO identity-exchange path (`services/identity_service.py` + `services/obo_middleware.py`) can swap the caller's JWT for short-lived STS credentials via the AgentCore Identity API (fail-closed on exchange error). It is fully implemented + unit-tested and gated by `enableOboPassthrough` (default off); agents do not yet consume the credentials (they still use the Lambda/runtime service role), so this is a phase-0 rollout pending per-group Lake Formation grants.
- Neptune in private VPC subnets, accessed via AgentCore Neptune Gateway (no public endpoint)
- AgentCore agents access all other services via public AWS endpoints (no VPC required)
- CloudFront DDoS protection; API Gateway rate limiting
- S3 Tables governed by AWS Lake Formation (fine-grained access control)
- All S3 buckets encrypted (SSE-S3); DynamoDB encrypted at rest
- Secrets Manager for sensitive configuration; Parameter Store for endpoints
- IAM least-privilege roles per agent and per stack
- Bedrock Guardrails: content filtering (sexual, violence, hate, insults, misconduct at HIGH strength), PII anonymization (address), denied topics (financial/legal advice), profanity word list
- `GuardrailService` in Lambda REST API applies guardrails on every natural language query:
  - INPUT pre-screen: user question evaluated before invoking any agent (fail-fast; saves compute on blocked queries)
  - OUTPUT post-screen: agent answer evaluated before return
  - Blocked queries return `BLOCKED` status with guardrail's canned message; fail-open on `ApplyGuardrail` API errors
- PII redaction on persisted user content: feedback comments and AgentCore Memory turns are PII-redacted by Guardrails before they are written to DynamoDB / Memory
- MCP tool calls are screened by the same INPUT/OUTPUT guardrails in the `mcp-tools` Lambda
- CloudWatch logs for all agent invocations; AgentCore online + on-demand evaluation for answer-quality regression tracking
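
The INPUT/OUTPUT screening order and the fail-open behaviour described above can be sketched with injected callables. This is not the `GuardrailService` implementation: `screened_query`, the verdict shape `{"blocked": bool, "message": str}`, and the `COMPLETED` status are hypothetical; only the `BLOCKED` status, the two screening points and fail-open on guardrail errors come from the list above.

```python
def screened_query(question, apply_guardrail, invoke_agent):
    """INPUT pre-screen, agent call, OUTPUT post-screen; fail-open on guardrail errors."""

    def check(source, text):
        try:
            return apply_guardrail(source, text)
        except Exception:
            # Fail-open: a guardrail API error does not block the request
            return {"blocked": False, "message": ""}

    verdict = check("INPUT", question)
    if verdict["blocked"]:
        # Fail-fast: the agent is never invoked for a blocked question
        return {"status": "BLOCKED", "answer": verdict["message"]}

    answer = invoke_agent(question)
    verdict = check("OUTPUT", answer)
    if verdict["blocked"]:
        return {"status": "BLOCKED", "answer": verdict["message"]}
    return {"status": "COMPLETED", "answer": answer}


# Stand-ins for the guardrail API and the agent runtime
agent_calls = []


def fake_guardrail(source, text):
    return {"blocked": "forbidden" in text, "message": "Sorry, I cannot help with that."}


def broken_guardrail(source, text):
    raise TimeoutError("guardrail API unavailable")


def fake_agent(question):
    agent_calls.append(question)
    return "42 policies"


blocked = screened_query("a forbidden topic", fake_guardrail, fake_agent)
calls_after_block = len(agent_calls)
allowed = screened_query("How many policies?", fake_guardrail, fake_agent)
failed_open = screened_query("How many policies?", broken_guardrail, fake_agent)
```

Fail-open trades strictness for availability: an outage of the guardrail API degrades screening rather than taking the query path down.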
The following table provides a sample cost breakdown for deploying this solution in the US East (N. Virginia) Region for one month.
- 50 monthly active users (MAU)
- 5,000 Lambda invocations/month for the FastAPI REST API (avg 10s duration, 2048MB memory)
- 12,000 CDC stream processor Lambda invocations/month (12 tables × ~1K DynamoDB events each)
- 5 AgentCore Runtimes: 200 sessions/month total (avg 10 min each) = 2,000 runtime minutes
- Amazon Bedrock: ~8M input tokens, ~1M output tokens/month across all agents + KB RAG. Two model tiers: the build-time generation agents (ontology, metadata) run on Claude Opus 4.8; the query-time agents (ontology-query, metadata-query, query-suggestions) run on Claude Sonnet 4.6. The Opus agents are low-frequency (only on layer create/revise), so most token volume is Sonnet.
- Amazon Neptune: db.t3.medium cluster, ~10 GB RDF/OWL storage (VKG mode)
- Amazon Bedrock Knowledge Base: 2 KBs (S3 Vectors backend) — ontology-patterns + semantic-rag; Titan Embed Text v2 embeddings; ~0.2 GB vector storage, ~500 queries/month
- DynamoDB: 13 tables (12 insurance + 1 metadata), on-demand capacity, ~5 GB storage, ~100K R/W operations
- S3: 6 buckets (website, artifacts, Athena results, KB source, logging, S3 Tables/Iceberg), ~50 GB total
- CloudFront: 5 GB data transfer out, 1M HTTPS requests/month
- Athena: 1,000 queries/month, ~50 GB scanned; DynamoDB connector (Lambda-based)
- Glue: 2 databases, 2 crawlers, 12 Zero-ETL integrations (DynamoDB → S3 Tables), 1 Glue 5.1 job (40 MVs; runs every 6h, ~20 DPU-minutes per run)
- Ontop translate Lambda (VKG): Java 21 ARM64, 2048 MB, provisioned concurrency = 1 (one always-warm JVM to avoid cold starts on the SPARQL→SQL path) — a small standing cost even at zero queries
- ECR: ~12 container images (5 agents, FastAPI Lambda, CDC stream processor, mcp-tools, mcp-proxy, ontop-translate, neptune-tools, doc-pipeline)
- CodeBuild: ARM64 builds — ~12 images × a few builds/month × ~15 min avg
- AgentCore Memory: 1 resource (SemanticStrategy), low event volume
- AgentCore Evaluation: online sampling + occasional on-demand batch runs
- CloudWatch: 5 GB log ingestion/month across all agents, Lambda, and Neptune
- VPC: 1 NAT Gateway (Neptune private subnet) + VPC interface endpoints
| AWS Service | Dimensions | Cost [USD] |
|---|---|---|
| AWS Lambda | FastAPI API (5K inv, 10s, 2048MB) + CDC stream processor (12K inv, 3s, 512MB) + misc | ~$3.00 |
| AWS Lambda — Ontop (VKG) | Ontop SPARQL→SQL translate (Java 21 ARM64, 2048MB) with provisioned concurrency = 1 (always-warm JVM) — standing cost even at idle | ~$14.00 |
| Amazon API Gateway | HTTP API with JWT authorizer (Cognito), ~50K requests/month | ~$0.10 |
| Amazon Bedrock AgentCore Runtime | 5 runtimes (ontology, ontology-query, metadata, metadata-query, query-suggestions), 200 sessions/month × 10 min avg = 2,000 minutes | ~$40.00 |
| Amazon Bedrock (Models) | Opus 4.8 ($5/$25 per M, build-time agents) + Sonnet 4.6 ($3/$15 per M, query-time agents): ~8M input, ~1M output/month (~15% Opus blend) | ~$43.00 |
| Amazon Neptune | db.t3.medium cluster (VKG mode), ~10 GB RDF/OWL storage, SPARQL endpoint in private VPC | ~$50.00 |
| Amazon Cognito | 50 MAU, Essentials tier with security features | ~$6.50 |
| Amazon DynamoDB | 13 tables (on-demand), 5 GB storage, ~100K R/W operations, Streams enabled for CDC | ~$2.50 |
| Amazon S3 & S3 Tables | 6 buckets (website, artifacts, Athena results, KB source, logging) + S3 Tables (Apache Iceberg, 12 tables), ~50 GB total | ~$3.00 |
| Amazon CloudFront | 5 GB data transfer out, 1M HTTPS requests/month | ~$1.00 |
| Amazon Bedrock Knowledge Base | 2 KBs (S3 Vectors backend): ontology-patterns + semantic-rag; Titan Embed Text v2 embeddings; ~0.2 GB vector storage, ~500 queries/month | ~$5.00 |
| Amazon Athena | 1K queries/month, ~50 GB scanned; DynamoDB Connector (Lambda-based federated catalog) | ~$1.00 |
| AWS Glue | Data Catalog (2 databases, AI-enriched), 2 crawlers, 12 Zero-ETL integrations, Glue 5.1 job (40 MVs, 4×/day) | ~$10.00 |
| Amazon ECR | ~12 container images: 5 agents + FastAPI + CDC + mcp-tools + mcp-proxy + ontop-translate + neptune-tools + doc-pipeline | ~$2.50 |
| AWS CodeBuild | ARM64 builds: ~12 images × a few builds/month × ~15 min avg | ~$6.00 |
| Amazon Bedrock AgentCore Memory | 1 memory resource (SemanticStrategy); lessons mined from chat sessions, low event volume | ~$2.00 |
| Amazon Bedrock AgentCore Eval | Online sampling on query runtimes + occasional on-demand batch runs vs. ground-truth | ~$3.00 |
| Amazon CloudWatch | 5 GB log ingestion (agents, Lambda, Neptune), metrics, OpenTelemetry traces | ~$3.00 |
| AWS Secrets Manager | 5 secrets (Neptune credentials, API keys, Bedrock config) | ~$2.00 |
| Amazon VPC / NAT Gateway | 1 NAT Gateway (Neptune private subnet), VPC interface endpoints | ~$35.00 |
| Amazon Bedrock Guardrails | Content filtering + PII detection/redaction, ~500 text units/month | ~$1.00 |
| SSM Parameter Store | Standard parameters (service endpoints, ARNs) | Free |
| Total (Moderate Usage) | Monthly Cost for All Services | ~$236 |
Note on cost drivers: Amazon Neptune is the largest single cost (~$50/month for a db.t3.medium cluster). Amazon Bedrock AgentCore Runtime (5 runtimes, ~$40/month) and Bedrock model inference (Opus 4.8 at $5/$25 per M tokens for the build-time ontology/metadata agents, Sonnet 4.6 at $3/$15 per M for the query-time agents, ~$43/month combined assuming ~15% of token volume is the low-frequency Opus build path) are the next largest. The Ontop translate Lambda's provisioned concurrency = 1 adds ~$14/month standing cost in VKG mode (one always-warm JVM) regardless of query volume; set PC=0 to run it on-demand if cold starts on the SPARQL→SQL path are acceptable. AgentCore Memory and Evaluation add a few dollars at low volume. The Knowledge Base backend uses S3 Vectors (not OpenSearch Serverless), keeping vector storage and query costs low (~$5/month for both KBs combined). For intermittent or dev/test VKG usage, Neptune Serverless can reduce the Neptune cost further.
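
The ~$43/month model-inference figure can be reproduced from the stated assumptions with a short calculation. The prices, token volumes and 15% Opus share are the ones quoted above; the function name `blended_token_cost` is hypothetical, and applying the same 15% share to both input and output tokens is an assumption of this sketch.

```python
def blended_token_cost(input_m, output_m, opus_share,
                       opus=(5.0, 25.0), sonnet=(3.0, 15.0)):
    """Monthly model cost in USD; token volumes are in millions.

    Each price tuple is (USD per M input tokens, USD per M output tokens).
    """

    def tier(share, prices):
        return share * (input_m * prices[0] + output_m * prices[1])

    return tier(opus_share, opus) + tier(1 - opus_share, sonnet)


# 8M input + 1M output tokens per month, 15% of volume on the Opus build path
cost = blended_token_cost(8, 1, 0.15)
```

With these inputs the Opus share contributes 0.15 × (8 × 5 + 1 × 25) = $9.75 and the Sonnet share 0.85 × (8 × 3 + 1 × 15) = $33.15, which sums to $42.90 and rounds to the ~$43 in the table.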
This library is licensed under the MIT-0 License. See the LICENSE file.
See CONTRIBUTING for more information.
- Felix Huthmacher, Senior Applied AI Architect github - fhuthmacher

