Database Modernizer Assessment

Disclaimer: This is a sample project intended for educational and evaluation purposes. It requires proper review, testing, and modification before use in production environments. Use at your own risk.

Moving off a monolithic relational database is hard. Which queries belong in DynamoDB? Which need a document store? What stays relational? Getting it wrong means failed modernizations, re-architecture mid-project, and wasted months.

Database Modernizer Assessment answers that question automatically. Point it at your PostgreSQL or MySQL database, and it analyzes every query pattern, scores each one against 6 AWS purpose-built engines, validates the architecture, and produces ready-to-implement schema designs with TCO projections.

Supported sources: PostgreSQL, MySQL, MariaDB
Target engines: DynamoDB, DocumentDB, ElastiCache/Redis, OpenSearch, Aurora PostgreSQL, Aurora MySQL

Who is this for?

This tool is for teams that have decided to refactor their application to use purpose-built databases. It helps you figure out which queries go where and what the target schemas should look like.

Who is this NOT for?

Lift-and-shift migrations: If you're moving a database as-is to RDS or Aurora without changing the data model, you don't need this tool.
Tight-deadline migrations: This tool guides application refactoring, which takes time. If you need to migrate by next week, use AWS DMS for a straight move.
Teams that haven't committed to refactoring: If you're still deciding whether to modernize, start with the AWS Migration Evaluator or a Well-Architected review first.

How It Works

The modernizer runs a multi-phase pipeline that progressively narrows from "all possible targets" to a concrete, validated modernization architecture:

Collect --> Triage --> Analyze --> Assign --> Reality Check --> Schema Design --> Synthesis

Phase	What it does
Collect	Connects to the source database (or parses offline output), extracts schema + query patterns
Triage	Detects workload signals (key-value lookups, text search, time-series, etc.) and selects candidate engines
Analyze	Runs parallel analysis agents per engine, deterministic scoring + optional LLM advisor
Assign	Resolves query-to-engine assignments using confidence scores and co-dependency analysis
Reality Check	Consolidates under-committed engines, validates with LLM, redirects unserviceable queries
Schema Design	Designs target schemas per engine (DynamoDB tables, DocumentDB collections, OpenSearch indices, etc.)
Synthesis	Produces the final migration assessment report with TCO, risk analysis, and recommendations

The core pipeline through Reality Check is fully deterministic: pattern detection, scoring, assignment, and consolidation all run without any LLM dependency. GenAI enhances the pipeline at key decision points (the optional analysis advisor, Reality Check validation, Schema Design, and Synthesis executive summaries), but the deterministic analysis and recommendations are reproducible and auditable every time.

Getting Started

Prerequisites

Python 3.12+
uv (Python package manager)

Install

git clone https://github.com/aws-samples/sample-aws-genai-db-modernizer.git
cd sample-aws-genai-db-modernizer
uv sync

That's it. No AWS account, no API keys, no Docker required.

Usage

The project includes sample databases you can run immediately. Two collector outputs are provided:

docs/examples/wordpress/wordpress.zip: WordPress + WooCommerce (50 tables, 107 queries)
docs/examples/discourse/discourse.zip: Discourse forum (170+ tables, 500+ queries)

Unzip whichever you want to try:

unzip docs/examples/wordpress/wordpress.zip -d docs/examples/wordpress/

Option 1: Deterministic Mode (zero config)

Run the assessment pipeline with no credentials, no LLM, no network calls. This executes Triage through Reality Check and produces architecture recommendations in seconds:

uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress

Artifacts land in ./artifacts/{db_name}/{job_id}/.
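To locate the output of the most recent run, something like the following may help. It is a minimal sketch that assumes only the ./artifacts/{db_name}/{job_id}/ layout described above; the helper name `latest_job_dir` is hypothetical, and it picks the newest job by directory modification time because the job ID format is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_job_dir(db_name: str, root: Path = Path("artifacts")) -> Optional[Path]:
    """Return the most recently modified job directory for db_name, or None."""
    db_dir = root / db_name
    if not db_dir.is_dir():
        return None
    # Each subdirectory is assumed to be one job ({job_id}).
    jobs = [p for p in db_dir.iterdir() if p.is_dir()]
    return max(jobs, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `latest_job_dir("wordpress")` would return the newest job folder under ./artifacts/wordpress/, or None if no run has completed yet.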

What you get without LLM:

Workload signal detection (key-value, text search, aggregations, session stores, etc.)
Per-engine analysis with confidence scores for every query
Query-to-engine assignment with co-dependency resolution
Reality Check: engine consolidation, architectural pattern detection, cost savings analysis

Option 2: Full Pipeline with Claude Code

If you have Claude Code installed, you already have everything needed for the full pipeline, including AI-powered Schema Design and Synthesis. No AWS account required.

Use the built-in Claude Code commands for an interactive experience:

/modernize           # Run the full pipeline end-to-end
/collect             # Parse collector output and initialize a job
/triage              # Select target engines based on workload signals
/analyze             # Run analysis for all selected engines
/assign              # Assign queries to best-fit engines
/reality-check       # Consolidate engines and validate decisions
/design-schema       # Design target schemas (LLM required)
/synthesize          # Generate the final migration report

These commands are defined in .claude/commands/ and available automatically when you open the project in Claude Code.

What you get with Claude Code:

Everything from deterministic mode, plus:
Target schema designs (DynamoDB table definitions, DocumentDB collections, OpenSearch mappings, etc.)
Full migration assessment report with executive summary
TCO projections and risk analysis

Option 3: Full Pipeline with Amazon Bedrock

For production use, automation, or running without Claude Code, use Amazon Bedrock as the LLM backend:

uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress --llm-mode bedrock --all -y

AWS setup required:

Configure AWS credentials (any standard method works):

# Option A: AWS CLI profile
aws configure

# Option B: Environment variables
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1

# Option C: AWS SSO
aws sso login --profile your-profile
export AWS_PROFILE=your-profile

Enable model access in the Amazon Bedrock console:
- Enable Anthropic Claude Sonnet (used for Reality Check validation)
- Enable Anthropic Claude Opus (used for Schema Design and Synthesis)

Run with Bedrock:

uv run python scripts/run_assessment.py --file docs/examples/wordpress/wordpress-collection.json --db wordpress --llm-mode bedrock --all -y

Option 4: Analyze Your Own Database

To analyze your own database, run the collection script to extract schema and query patterns:

PostgreSQL (requires pg_stat_statements extension):

psql -U <user> -h <host> -d <database> -t -A -f scripts/collect-postgresql.sql > my-collection.json

MySQL (requires performance_schema):

mysql -N -u <user> -p -h <host> -D <database> < scripts/collect-mysql.sql > my-collection.json

Then run the pipeline against your collection:

uv run python scripts/run_assessment.py --file my-collection.json --db my_database --llm-mode bedrock --all -y

Note: The collection scripts are read-only and do not modify your database. They need SELECT access to information_schema, plus pg_stat_statements (PostgreSQL) or performance_schema (MySQL).
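Before running the pipeline on your own collection, it can be worth confirming that the redirect actually produced usable output (a failed login or a missing extension can leave an empty file or an error message behind). The sketch below assumes only that the collection file is a single JSON document, as its .json name suggests; it does not know or check the collector's schema, and `check_collection` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_collection(path):
    """Return (ok, message). Only checks that the file is non-empty, valid JSON."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return False, "file is empty (did the collection query return rows?)"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg} at line {exc.lineno}"
    return True, f"parsed top-level {type(data).__name__}"
```

Calling `check_collection("my-collection.json")` returns a pass/fail flag and a short explanation you can print before starting a longer run.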

LLM Modes Summary

Mode	Credentials needed	Phases covered	Best for
`none`	None	Collect through Reality Check	Quick evaluation, CI/CD, deterministic audits
`external`	Claude Code license	Full pipeline	Local development, interactive exploration
`bedrock`	AWS credentials + Bedrock access	Full pipeline	Production, automation, team use
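The table can be read as a simple rule: `none` stops after Reality Check, and the other two modes run everything. A minimal sketch of that rule, using the Claude Code command names as phase labels; `LlmMode` and `phases_for_mode` are illustrative names, not taken from the project's code.

```python
from enum import Enum

# Phase order as listed in "How It Works", labeled with the slash-command names.
PHASES = [
    "collect", "triage", "analyze", "assign",
    "reality-check", "design-schema", "synthesize",
]


class LlmMode(str, Enum):
    NONE = "none"
    EXTERNAL = "external"
    BEDROCK = "bedrock"


def phases_for_mode(mode: LlmMode) -> list[str]:
    """Return the phases a mode covers, per the summary table."""
    if mode is LlmMode.NONE:
        # Deterministic mode stops after Reality Check.
        return PHASES[: PHASES.index("reality-check") + 1]
    return list(PHASES)


for mode in LlmMode:
    print(f"{mode.value:>8}: {' -> '.join(phases_for_mode(mode))}")
```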

Key Design Patterns

LLM Seam Pattern

Every agent exposes three methods:

run_deterministic(): always runs, produces baseline results
prepare_llm_input(): formats context for the LLM
apply_llm_output(): merges LLM feedback into deterministic results

This allows the pipeline to run fully deterministically (--llm-mode none) or with LLM enhancement (--llm-mode external or --llm-mode bedrock).
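As an illustration of the seam, here is a minimal sketch. Only the three method names come from this README; the `SeamAgent` base class, the toy `KeyValueAgent`, the `run_agent` driver, the argument types, and the scoring rule are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class SeamAgent(ABC):
    """Hypothetical base class exposing the three seam methods."""

    @abstractmethod
    def run_deterministic(self, queries: list[str]) -> dict[str, Any]: ...

    @abstractmethod
    def prepare_llm_input(self, baseline: dict[str, Any]) -> str: ...

    @abstractmethod
    def apply_llm_output(
        self, baseline: dict[str, Any], llm_output: dict[str, Any]
    ) -> dict[str, Any]: ...


class KeyValueAgent(SeamAgent):
    """Toy agent: scores single-key lookups as key-value candidates."""

    def run_deterministic(self, queries):
        return {q: {"confidence": 0.9 if "WHERE id =" in q else 0.2} for q in queries}

    def prepare_llm_input(self, baseline):
        lines = [f"{q} -> {r['confidence']}" for q, r in baseline.items()]
        return "Review these scores:\n" + "\n".join(lines)

    def apply_llm_output(self, baseline, llm_output):
        # Copy the baseline so deterministic results are never mutated.
        merged = {q: dict(r) for q, r in baseline.items()}
        for q, note in llm_output.items():
            if q in merged:
                merged[q]["llm_note"] = note
        return merged


def run_agent(
    agent: SeamAgent,
    queries: list[str],
    llm: Optional[Callable[[str], dict[str, Any]]] = None,
) -> dict[str, Any]:
    baseline = agent.run_deterministic(queries)
    if llm is None:  # analogous to --llm-mode none
        return baseline
    return agent.apply_llm_output(baseline, llm(agent.prepare_llm_input(baseline)))
```

The point of the shape is that the deterministic baseline is always computed first and the LLM can only annotate it, so dropping the `llm` argument yields the same scores.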

Group Splitting

Large workloads (1000+ queries) can exceed LLM context windows, so LlmAdvisorBase automatically:

Splits queries into groups of 30
Filters schema to only tables referenced per group
Calls the LLM per group with retry + exponential backoff
Merges results across groups
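The four steps above can be sketched roughly as follows. This is not the LlmAdvisorBase implementation: the query shape (a dict with "id" and "tables"), the schema shape, the retry count, the base delay, and all function names are assumptions for illustration. Only the group size of 30 comes from the text.

```python
import time
from typing import Callable

GROUP_SIZE = 30  # group size stated above


def split_into_groups(queries: list[dict], size: int = GROUP_SIZE) -> list[list[dict]]:
    return [queries[i : i + size] for i in range(0, len(queries), size)]


def filter_schema(schema: dict[str, list[str]], group: list[dict]) -> dict[str, list[str]]:
    """Keep only the tables that queries in this group reference."""
    referenced = {t for q in group for t in q["tables"]}
    return {t: cols for t, cols in schema.items() if t in referenced}


def call_with_backoff(
    fn: Callable[[], dict],
    retries: int = 3,          # hypothetical default
    base_delay: float = 1.0,   # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fn, retrying on any exception with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 1")


def advise(queries, schema, llm, sleep=time.sleep) -> dict:
    """Call llm(group, sub_schema) per group and merge the per-query results."""
    merged: dict = {}
    for group in split_into_groups(queries):
        sub_schema = filter_schema(schema, group)
        merged.update(call_with_backoff(lambda: llm(group, sub_schema), sleep=sleep))
    return merged
```

Filtering the schema per group is what keeps each prompt small: a group that touches two tables sends two table definitions, not the whole catalog.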

Reality Check & Consolidation

After assignment, the referee identifies under-committed engines (few queries, low confidence) and consolidates their queries into stronger engines. An LLM validator confirms that the target engine can serve the moved queries. Any query it cannot serve is redirected to Aurora (the relational safety net).
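A rough sketch of that consolidation logic follows. The thresholds, the "aurora" label, the rule for picking the target engine (the one holding the most queries), and the `can_serve` callback standing in for the LLM validator are all assumptions; the project's actual criteria are not given in this README.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

# Hypothetical thresholds for "under-committed".
MIN_QUERIES = 3
MIN_AVG_CONFIDENCE = 0.5
FALLBACK = "aurora"  # the relational safety net


def consolidate(
    assignments: dict[str, tuple[str, float]],
    can_serve: Callable[[str, str], bool],
) -> dict[str, str]:
    """Map query id -> final engine, given query id -> (engine, confidence)."""
    by_engine: dict[str, list[float]] = defaultdict(list)
    for engine, conf in assignments.values():
        by_engine[engine].append(conf)

    # An engine is weak if it has few queries or low average confidence.
    weak = {
        e for e, confs in by_engine.items()
        if e != FALLBACK
        and (len(confs) < MIN_QUERIES or mean(confs) < MIN_AVG_CONFIDENCE)
    }
    strong = [e for e in by_engine if e not in weak]
    target = max(strong, key=lambda e: len(by_engine[e]), default=FALLBACK)

    final = {}
    for qid, (engine, _) in assignments.items():
        if engine not in weak:
            final[qid] = engine      # keep assignments on strong engines
        elif can_serve(target, qid):
            final[qid] = target      # validator accepted the move
        else:
            final[qid] = FALLBACK    # validator rejected it
    return final
```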

Contract-Driven

All agent I/O flows through Pydantic contracts (src/contracts/). This enables:

Automated contract validation in CI
Deterministic replay of any pipeline stage
Clear boundaries between pipeline phases
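To show what a contract buys you, here is a dependency-free sketch. It deliberately uses a stdlib dataclass that validates on construction as a stand-in for a Pydantic model, so it runs without installing anything; the `QueryAssignment` name, its fields, and the engine labels are hypothetical and not taken from src/contracts/.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical labels for the six target engines listed in this README.
ENGINES = {
    "dynamodb", "documentdb", "elasticache",
    "opensearch", "aurora-postgresql", "aurora-mysql",
}


@dataclass(frozen=True)
class QueryAssignment:
    """Stand-in contract: rejects malformed data at the phase boundary."""

    query_id: str
    engine: str
    confidence: float

    def __post_init__(self):
        if self.engine not in ENGINES:
            raise ValueError(f"unknown engine: {self.engine!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def dump(items: list[QueryAssignment]) -> str:
    """Serialize a phase's output so a later stage (or a replay) can load it."""
    return json.dumps([asdict(i) for i in items], sort_keys=True)


def load(text: str) -> list[QueryAssignment]:
    """Re-validate every record on the way back in."""
    return [QueryAssignment(**row) for row in json.loads(text)]
```

Because every record is validated both when it is produced and when it is reloaded, a saved artifact can be fed back into a later phase with the same guarantees as a live run.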

Development

# Install with dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest tests/ -v --cov=src

# Run specific suites
uv run pytest tests/unit/ -v
uv run pytest tests/contract/ -v
uv run pytest tests/integration/ -v

# Code quality
uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/

# Full dev setup (pre-commit hooks, cfn-nag, etc.)
./scripts/setup_dev.sh

Running Individual Phases

# Assessment only (phases 1-5, stops after reality-check):
uv run python scripts/run_assessment.py --file <collector-output.json> --db <name>

# Resume after providing LLM response (external mode):
uv run python scripts/run_assessment.py --job-id <id> --db <name> --resume-reality-check

# Full pipeline including schema design + synthesis:
uv run python scripts/run_assessment.py --file <collector-output.json> --db <name> --llm-mode bedrock --all -y

Local Web UI

Run the React web interface locally to visualize results, browse query journeys, and review schema designs:

# Start the API server
STORAGE_TYPE=local ARTIFACT_ROOT=./artifacts uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000

# Build and serve the UI (in another terminal)
cd src/ui && npm install
REACT_APP_API_URL=http://localhost:8000/api/v1/ npx react-scripts build
npx serve -s build -l 3000

Then open http://localhost:3000 to browse your modernization results.

Cloud Deployment (AWS)

Deploy the full platform on AWS with ECS Fargate, Step Functions orchestration, and Cognito authentication:

cp .env.example .env       # Edit with your domain and region
make deploy-dns            # Deploy Route 53 + ACM certificate (one-time)
# Add NS records to your parent domain, wait for cert validation
make deploy-infra          # Deploy VPC, ECR, KMS
make build                 # Build and push Docker images (requires Docker Desktop)
make deploy-services       # Deploy ECS, ALB, Cognito, S3, Step Functions
make create-test-user      # Create a Cognito login

See the Deployment Guide for full details and troubleshooting.

Project Structure

src/
  agents/           # Pipeline agents (collector, analysis, referee, schema_design)
  contracts/        # Pydantic I/O contracts between phases
  orchestrator/     # Local and Step Functions orchestrators
  storage/          # Artifact store (S3 or local filesystem)
  tools/            # Analysis tools, scoring, pattern catalogs
  api/              # FastAPI backend
  ui/               # React frontend
scripts/            # CLI entry points and collection scripts
infrastructure/     # CloudFormation templates for AWS deployment
docs/               # Architecture docs, contracts, guides
tests/              # Unit, contract, and integration tests

Documentation

Document	Description
Architecture	System architecture and decisions
Agent Contracts	Pydantic I/O specifications
API Guide	REST API reference
Implementation Guides	Development patterns

Contributing

Review agent contracts
Pick a component from implementation guides
Follow TDD: contracts → tests → implementation
Submit PR with tests and documentation

See CONTRIBUTING.md for details.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
