Disclaimer: This is a sample project intended for educational and evaluation purposes. It requires proper review, testing, and modification before use in production environments. Use at your own risk.
Modernizing off a monolithic relational database is hard. Which queries belong in DynamoDB? Which need a document store? What stays relational? Getting it wrong means failed modernizations, re-architecture mid-project, and wasted months.
Database Modernizer Assessment answers that question automatically. Point it at your PostgreSQL or MySQL database, and it analyzes every query pattern, scores each one against 6 AWS purpose-built engines, validates the architecture, and produces ready-to-implement schema designs with TCO projections.
Supported sources: PostgreSQL, MySQL, MariaDB
Target engines: DynamoDB, DocumentDB, ElastiCache/Redis, OpenSearch, Aurora PostgreSQL, Aurora MySQL
This tool is for teams that have decided to refactor their application to use purpose-built databases. It helps you figure out which queries go where and what the target schemas should look like.
It is not a good fit for:
- Lift-and-shift migrations: If you're moving a database as-is to RDS or Aurora without changing the data model, you don't need this tool.
- Tight deadline migrations: This tool guides application refactoring, which takes time. If you need to migrate by next week, use AWS DMS for a straight move.
- Teams that haven't committed to refactoring: If you're still deciding whether to modernize, start with the AWS Migration Evaluator or a Well-Architected review first.
The modernizer runs a multi-phase pipeline that progressively narrows from "all possible targets" to a concrete, validated modernization architecture:
Collect --> Triage --> Analyze --> Assign --> Reality Check --> Schema Design --> Synthesis
| Phase | What it does |
|---|---|
| Collect | Connects to the source database (or parses offline output), extracts schema + query patterns |
| Triage | Detects workload signals (key-value lookups, text search, time-series, etc.) and selects candidate engines |
| Analyze | Runs one analysis agent per engine in parallel, combining deterministic scoring with an optional LLM advisor |
| Assign | Resolves query-to-engine assignments using confidence scores and co-dependency analysis |
| Reality Check | Consolidates under-committed engines, optionally validates the result with an LLM, and redirects unserviceable queries |
| Schema Design | Designs target schemas per engine (DynamoDB tables, DocumentDB collections, OpenSearch indices, etc.) |
| Synthesis | Produces the final migration assessment report with TCO, risk analysis, and recommendations |
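As a rough illustration of the Triage row above, the sketch below counts a few workload signals in SQL text and maps them to candidate engines. The regular expressions, signal names, and signal-to-engine mapping are assumptions made for this example; they are not the project's actual pattern catalog.

```python
import re

# Hypothetical detectors: signal name -> regex applied to each SQL statement.
SIGNAL_PATTERNS = {
    "key_value_lookup": re.compile(r"\bwhere\s+\w+\s*=\s*(\?|\$\d+|%s)\s*;?\s*$", re.I),
    "text_search": re.compile(r"\b(like\s+'%|match\s*\(|to_tsvector)", re.I),
    "aggregation": re.compile(r"\b(group\s+by|count\(|sum\(|avg\()", re.I),
}

# Hypothetical mapping from a detected signal to the engines worth analyzing.
SIGNAL_ENGINES = {
    "key_value_lookup": ["DynamoDB", "ElastiCache/Redis"],
    "text_search": ["OpenSearch"],
    "aggregation": ["Aurora PostgreSQL", "Aurora MySQL"],
}


def detect_signals(queries):
    """Count how many queries exhibit each workload signal."""
    counts = {name: 0 for name in SIGNAL_PATTERNS}
    for sql in queries:
        for name, pattern in SIGNAL_PATTERNS.items():
            if pattern.search(sql):
                counts[name] += 1
    return counts


def candidate_engines(counts, min_hits=1):
    """Select engines whose signals appear at least min_hits times, in stable order."""
    engines = []
    for name, hits in counts.items():
        if hits >= min_hits:
            for engine in SIGNAL_ENGINES[name]:
                if engine not in engines:
                    engines.append(engine)
    return engines
```

Raising `min_hits` in this sketch would keep a single stray query from pulling an extra engine into the analysis.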
The core pipeline through Reality Check is fully deterministic. Pattern detection, scoring, assignment, and consolidation all run without any LLM dependency. GenAI enhances the pipeline at key decision points (Schema Design, Synthesis executive summaries) but the analysis and recommendations are reproducible and auditable every time.
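One way to check the reproducibility claim yourself is to fingerprint the deterministic output of two runs and compare the hashes. This is a minimal sketch that assumes the results can be loaded as JSON-compatible dictionaries; the `assignments` and `engines` field names are hypothetical.

```python
import hashlib
import json


def fingerprint(result: dict) -> str:
    """Hash a result after canonical serialization so key order does not matter."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs with the same content but different key order (hypothetical fields).
run_a = {"assignments": {"q1": "DynamoDB", "q2": "OpenSearch"}, "engines": ["DynamoDB", "OpenSearch"]}
run_b = {"engines": ["DynamoDB", "OpenSearch"], "assignments": {"q2": "OpenSearch", "q1": "DynamoDB"}}

runs_match = fingerprint(run_a) == fingerprint(run_b)
```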
- Python 3.12+
- uv (Python package manager)
git clone https://github.com/aws-samples/sample-aws-genai-db-modernizer.git
cd sample-aws-genai-db-modernizer
uv sync
That's it. No AWS account, no API keys, no Docker required.
The project includes sample databases you can run immediately. Two collector outputs are provided:
- docs/examples/wordpress/wordpress.zip: WordPress + WooCommerce (50 tables, 107 queries)
- docs/examples/discourse/discourse.zip: Discourse forum (170+ tables, 500+ queries)
Unzip whichever you want to try:
unzip docs/examples/wordpress/wordpress.zip -d docs/examples/wordpress/
Run the assessment pipeline with no credentials, no LLM, and no network calls. This executes Triage through Reality Check and produces architecture recommendations in seconds:
uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress
Artifacts land in ./artifacts/{db_name}/{job_id}/.
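If you run the assessment several times, a small helper can locate the most recent job directory under that layout. This sketch relies only on the `./artifacts/{db_name}/{job_id}/` path pattern stated above; picking the newest job by modification time is an assumption, since the job ID format is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_job_dir(artifact_root: str, db_name: str) -> Optional[Path]:
    """Return the most recently modified job directory for a database, if any."""
    db_dir = Path(artifact_root) / db_name
    if not db_dir.is_dir():
        return None
    jobs = [p for p in db_dir.iterdir() if p.is_dir()]
    # Assumption: the newest modification time identifies the latest run.
    return max(jobs, key=lambda p: p.stat().st_mtime, default=None)
```

For the sample above, `latest_job_dir("./artifacts", "wordpress")` would point at the directory produced by the last run.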
What you get without LLM:
- Workload signal detection (key-value, text search, aggregations, session stores, etc.)
- Per-engine analysis with confidence scores for every query
- Query-to-engine assignment with co-dependency resolution
- Reality Check: engine consolidation, architectural pattern detection, cost savings analysis
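To make the assignment step in the list above more concrete, here is a minimal sketch of confidence-based assignment with a simple co-dependency rule. The input shapes and the rule itself (co-dependent queries share the engine with the highest summed confidence) are assumptions for illustration, not the project's actual algorithm.

```python
from collections import defaultdict


def assign(scores, co_dependent_groups):
    """Assign each query to an engine.

    scores: {query_id: {engine: confidence}}
    co_dependent_groups: list of sets of query IDs that must share one engine
    """
    # Step 1: every query independently picks its highest-confidence engine.
    assignment = {q: max(s, key=s.get) for q, s in scores.items()}

    # Step 2: co-dependent queries move together to the engine with the
    # best total confidence across the whole group.
    for group in co_dependent_groups:
        totals = defaultdict(float)
        for q in group:
            for engine, confidence in scores[q].items():
                totals[engine] += confidence
        best = max(totals, key=totals.get)
        for q in group:
            assignment[q] = best
    return assignment


scores = {
    "q1": {"DynamoDB": 0.9, "Aurora PostgreSQL": 0.6},
    "q2": {"DynamoDB": 0.2, "Aurora PostgreSQL": 0.8},
    "q3": {"OpenSearch": 0.7, "Aurora PostgreSQL": 0.5},
}
result = assign(scores, co_dependent_groups=[{"q1", "q2"}])
```

In this example `q1` would pick DynamoDB on its own, but because it is co-dependent with `q2`, both land on Aurora PostgreSQL, where their combined confidence is higher.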
If you have Claude Code installed, you already have everything needed for the full pipeline, including AI-powered Schema Design and Synthesis. No AWS account required.
Use the built-in Claude Code commands for an interactive experience:
/modernize # Run the full pipeline end-to-end
/collect # Parse collector output and initialize a job
/triage # Select target engines based on workload signals
/analyze # Run analysis for all selected engines
/assign # Assign queries to best-fit engines
/reality-check # Consolidate engines and validate decisions
/design-schema # Design target schemas (LLM required)
/synthesize # Generate the final migration report
These commands are defined in .claude/commands/ and available automatically when you open the project in Claude Code.
What you get with Claude Code:
- Everything from deterministic mode, plus:
- Target schema designs (DynamoDB table definitions, DocumentDB collections, OpenSearch mappings, etc.)
- Full migration assessment report with executive summary
- TCO projections and risk analysis
For production use, automation, or running without Claude Code, use Amazon Bedrock as the LLM backend:
uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress --llm-mode bedrock --all -y
AWS setup required:
- Configure AWS credentials (any standard method works):
      # Option A: AWS CLI profile
      aws configure
      # Option B: Environment variables
      export AWS_ACCESS_KEY_ID=...
      export AWS_SECRET_ACCESS_KEY=...
      export AWS_DEFAULT_REGION=us-east-1
      # Option C: AWS SSO
      aws sso login --profile your-profile
      export AWS_PROFILE=your-profile
- Enable model access in the Amazon Bedrock console:
  - Enable Anthropic Claude Sonnet (used for Reality Check validation)
  - Enable Anthropic Claude Opus (used for Schema Design and Synthesis)
- Run with Bedrock:
      uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress --llm-mode bedrock --all -y
To analyze your own database, run the collection script to extract schema and query patterns:
PostgreSQL (requires pg_stat_statements extension):
psql -U <user> -h <host> -d <database> -t -A -f scripts/collect-postgresql.sql > my-collection.json
MySQL (requires performance_schema):
mysql -N -u <user> -p -h <host> -D <database> < scripts/collect-mysql.sql > my-collection.json
Then run the pipeline against your collection:
uv run python scripts/run_assessment.py --file my-collection.json --db my_database --llm-mode bedrock --all -y
Note: The collection scripts are read-only and do not modify your database. They need SELECT access to information_schema, pg_stat_statements (PostgreSQL), or performance_schema (MySQL).
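Before running the pipeline, it can help to confirm that the collection file is non-empty and parses as JSON, since a permissions problem may leave you with an empty or partial file. The sketch below assumes nothing about the collector's schema; it only reports the top-level shape, and the keys used in the usage example are hypothetical.

```python
import json
from pathlib import Path


def summarize_collection(path: str) -> dict:
    """Parse a collection file and report its top-level structure."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{path} is empty; did the collection script return rows?")
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    keys = sorted(data) if isinstance(data, dict) else []
    return {"type": type(data).__name__, "top_level_keys": keys}
```

Calling `summarize_collection("my-collection.json")` gives a quick overview without loading the file into the pipeline.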
| Mode | Credentials needed | Phases covered | Best for |
|---|---|---|---|
| none | None | Collect through Reality Check | Quick evaluation, CI/CD, deterministic audits |
| external | Claude Code license | Full pipeline | Local development, interactive exploration |
| bedrock | AWS credentials + Bedrock access | Full pipeline | Production, automation, team use |
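The table above can be read as a mapping from mode to the phases that run. The sketch below encodes that reading; the phase and mode names come from this README, while the function and constant names are hypothetical.

```python
PHASES = [
    "Collect", "Triage", "Analyze", "Assign",
    "Reality Check", "Schema Design", "Synthesis",
]

# Phases this README describes as needing an LLM backend.
LLM_REQUIRED = {"Schema Design", "Synthesis"}


def phases_for(llm_mode: str) -> list:
    """Return the phases covered by a given --llm-mode value."""
    if llm_mode not in {"none", "external", "bedrock"}:
        raise ValueError(f"unknown llm mode: {llm_mode!r}")
    if llm_mode == "none":
        return [p for p in PHASES if p not in LLM_REQUIRED]
    return list(PHASES)
```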
Every agent exposes three methods:
- run_deterministic(): always runs, produces baseline results
- prepare_llm_input(): formats context for the LLM
- apply_llm_output(): merges LLM feedback into deterministic results
This allows the pipeline to run fully deterministically (--llm-mode none) or with LLM enhancement (--llm-mode bedrock).
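A minimal sketch of this three-method pattern follows. Only the method names come from the project; the argument lists, the `run_agent` driver, and `ToyScoringAgent` are assumptions made to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class Agent(ABC):
    """Hypothetical base class; the real signatures may differ."""

    @abstractmethod
    def run_deterministic(self, data: dict) -> dict: ...

    @abstractmethod
    def prepare_llm_input(self, data: dict, baseline: dict) -> str: ...

    @abstractmethod
    def apply_llm_output(self, baseline: dict, llm_output: dict) -> dict: ...


def run_agent(agent: Agent, data: dict, llm=None) -> dict:
    """Always compute the baseline; only consult the LLM when one is supplied."""
    baseline = agent.run_deterministic(data)
    if llm is None:
        return baseline
    prompt = agent.prepare_llm_input(data, baseline)
    return agent.apply_llm_output(baseline, llm(prompt))


class ToyScoringAgent(Agent):
    def run_deterministic(self, data):
        # Toy rule: equality predicates look like key-value lookups.
        return {q: 0.9 if "=" in q else 0.3 for q in data["queries"]}

    def prepare_llm_input(self, data, baseline):
        lines = [f"{q} -> {score}" for q, score in baseline.items()]
        return "Review these scores:\n" + "\n".join(lines)

    def apply_llm_output(self, baseline, llm_output):
        merged = dict(baseline)
        merged.update(llm_output.get("overrides", {}))
        return merged
```

Because the baseline is computed first and the LLM output is merged on top, dropping the `llm` argument reproduces the deterministic result exactly.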
Large workloads (1000+ queries) exceed LLM context windows. The LlmAdvisorBase automatically:
- Splits queries into groups of 30
- Filters schema to only tables referenced per group
- Calls the LLM per group with retry + exponential backoff
- Merges results across groups
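The four steps above can be sketched as follows. This is not the LlmAdvisorBase implementation; the query shape (`id` and `tables` keys), the `llm` callable, and the retry parameters are all assumptions for illustration.

```python
import time


def chunk(items, size=30):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def filter_schema(schema, group):
    """Keep only the tables referenced by the queries in this group."""
    referenced = {table for query in group for table in query["tables"]}
    return {t: cols for t, cols in schema.items() if t in referenced}


def call_with_retry(fn, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(retries):
        try:
            return fn(*args)
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def advise(queries, schema, llm, **retry_kwargs):
    """Run the LLM once per group and merge the per-query results."""
    merged = {}
    for group in chunk(queries):
        group_schema = filter_schema(schema, group)
        merged.update(call_with_retry(llm, group, group_schema, **retry_kwargs))
    return merged
```

Passing a custom `sleep` makes the backoff easy to test without waiting, and filtering the schema per group keeps each prompt small even when the database has hundreds of tables.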
After assignment, the referee identifies under-committed engines (few queries, low confidence) and consolidates their queries into stronger engines. An LLM validator confirms that the target engine can serve each moved query. Queries the target cannot serve are redirected to Aurora (the relational safety net).
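A rough sketch of that consolidation logic is shown below. The thresholds, the choice of Aurora PostgreSQL as the fallback, and the `can_serve` callback (standing in for the validator) are assumptions for this example.

```python
def consolidate(assignment, confidence, can_serve,
                min_queries=3, min_confidence=0.6,
                fallback="Aurora PostgreSQL"):
    """Move queries off weak engines.

    assignment: {query_id: engine}
    confidence: {query_id: float}
    can_serve(engine, query_id) -> bool stands in for the validator.
    """
    by_engine = {}
    for query, engine in assignment.items():
        by_engine.setdefault(engine, []).append(query)

    def is_weak(engine):
        queries = by_engine[engine]
        average = sum(confidence[q] for q in queries) / len(queries)
        return len(queries) < min_queries and average < min_confidence

    weak = [e for e in by_engine if e != fallback and is_weak(e)]
    strong = [e for e in by_engine if e not in weak]
    # The strongest engine is the one already serving the most queries.
    target = max(strong, key=lambda e: len(by_engine[e]), default=fallback)

    result = dict(assignment)
    for engine in weak:
        for query in by_engine[engine]:
            result[query] = target if can_serve(target, query) else fallback
    return result
```

Engines with few queries but high confidence are left alone in this sketch, since both conditions must hold for an engine to count as under-committed.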
All agent I/O flows through Pydantic contracts (src/contracts/). This enables:
- Automated contract validation in CI
- Deterministic replay of any pipeline stage
- Clear boundaries between pipeline phases
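The idea behind contract-based I/O can be illustrated with a small validated record that round-trips through JSON. The project uses Pydantic models; to stay dependency-free, this sketch substitutes a standard-library dataclass, and the `QueryAssignment` class and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueryAssignment:
    """Hypothetical contract passed from one phase to the next."""
    query_id: str
    engine: str
    confidence: float

    def __post_init__(self):
        # Validation at the boundary: reject out-of-range scores early.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def save(assignment: QueryAssignment) -> str:
    """Serialize a contract to stable JSON so a stage can be replayed later."""
    return json.dumps(asdict(assignment), sort_keys=True)


def load(payload: str) -> QueryAssignment:
    """Rebuild and re-validate a contract from stored JSON."""
    return QueryAssignment(**json.loads(payload))
```

Replaying a stage then amounts to loading its stored input and running the stage again, with validation applied on the way in.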
# Install with dev dependencies
uv sync --extra dev
# Run all tests
uv run pytest tests/ -v --cov=src
# Run specific suites
uv run pytest tests/unit/ -v
uv run pytest tests/contract/ -v
uv run pytest tests/integration/ -v
# Code quality
uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/
# Full dev setup (pre-commit hooks, cfn-nag, etc.)
./scripts/setup_dev.sh
# Assessment only (phases 1-5, stops after reality-check):
uv run python scripts/run_assessment.py --file <collector-output.json> --db <name>
# Resume after providing LLM response (external mode):
uv run python scripts/run_assessment.py --job-id <id> --db <name> --resume-reality-check
# Full pipeline including schema design + synthesis:
uv run python scripts/run_assessment.py --file <collector-output.json> --db <name> --llm-mode bedrock --all -y
Run the React web interface locally to visualize results, browse query journeys, and review schema designs:
# Start the API server
STORAGE_TYPE=local ARTIFACT_ROOT=./artifacts uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000
# Build and serve the UI (in another terminal)
cd src/ui && npm install
REACT_APP_API_URL=http://localhost:8000/api/v1/ npx react-scripts build
npx serve -s build -l 3000
Then open http://localhost:3000 to browse your modernization results.
Deploy the full platform on AWS with ECS Fargate, Step Functions orchestration, and Cognito authentication:
cp .env.example .env # Edit with your domain and region
make deploy-dns # Deploy Route 53 + ACM certificate (one-time)
# Add NS records to your parent domain, wait for cert validation
make deploy-infra # Deploy VPC, ECR, KMS
make build # Build and push Docker images (requires Docker Desktop)
make deploy-services # Deploy ECS, ALB, Cognito, S3, Step Functions
make create-test-user # Create a Cognito login
See Deployment Guide for full details and troubleshooting.
src/
  agents/        # Pipeline agents (collector, analysis, referee, schema_design)
  contracts/     # Pydantic I/O contracts between phases
  orchestrator/  # Local and Step Functions orchestrators
  storage/       # Artifact store (S3 or local filesystem)
  tools/         # Analysis tools, scoring, pattern catalogs
  api/           # FastAPI backend
  ui/            # React frontend
scripts/ # CLI entry points and collection scripts
infrastructure/ # CloudFormation templates for AWS deployment
docs/ # Architecture docs, contracts, guides
tests/ # Unit, contract, and integration tests
| Document | Description |
|---|---|
| Architecture | System architecture and decisions |
| Agent Contracts | Pydantic I/O specifications |
| API Guide | REST API reference |
| Implementation Guides | Development patterns |
- Review agent contracts
- Pick a component from implementation guides
- Follow TDD: contracts → tests → implementation
- Submit PR with tests and documentation
See CONTRIBUTING.md for details.
This library is licensed under the MIT-0 License. See the LICENSE file.

