diff --git a/plugins/databases-on-aws/skills/dsql/SKILL.md b/plugins/databases-on-aws/skills/dsql/SKILL.md index a0a79d7c..ea19b2a5 100644 --- a/plugins/databases-on-aws/skills/dsql/SKILL.md +++ b/plugins/databases-on-aws/skills/dsql/SKILL.md @@ -1,9 +1,9 @@ --- name: dsql -description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, DDL operations, query plan explainability, and SQL compatibility validation. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow." +description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, PostgreSQL-to-DSQL schema conversion, PL/pgSQL transpilation, FK replacement code generation, OCC retry patterns, ORM migration (Django/Hibernate/Rails), DDL operations, query plan explainability, and SQL compatibility validation. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, convert to DSQL, PostgreSQL to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow, DSQL ENUM, DSQL foreign key, DSQL PL/pgSQL, DSQL trigger, DSQL OCC retry, DSQL Django, DSQL Hibernate, DSQL Rails, DSQL multi-region, DSQL JSONB, DSQL index async, DSQL GIN index." 
license: Apache-2.0 metadata: - tags: aws, aurora, dsql, distributed-sql, distributed, distributed-database, database, serverless, serverless-database, postgresql, postgres, sql, schema, migration, multi-tenant, iam-auth, aurora-dsql, mcp, orm + tags: aws, aurora, dsql, distributed-sql, distributed, distributed-database, database, serverless, serverless-database, postgresql, postgres, sql, schema, migration, multi-tenant, iam-auth, aurora-dsql, mcp, orm, plpgsql, trigger, enum, foreign-key, occ-retry, django, hibernate, rails, multi-region, schema-conversion, type-mapping --- # Amazon Aurora DSQL Skill @@ -109,6 +109,70 @@ sampled in [.mcp.json](../../.mcp.json) **When:** Load when migrating a complete MySQL table to DSQL **Contains:** End-to-end MySQL CREATE TABLE migration example with decision summary +### PostgreSQL Migration (modular): + +#### [pg-migrations/type-mapping.md](references/pg-migrations/type-mapping.md) + +**When:** MUST load when migrating PostgreSQL schemas to DSQL or answering type mapping questions +**Contains:** Complete PostgreSQL → DSQL type mapping (50+ types), COLLATE "C" rules, NUMERIC precision guidance, JSON/JSONB behavior, array alternatives, migration decision matrix + +#### [pg-migrations/plpgsql-patterns.md](references/pg-migrations/plpgsql-patterns.md) + +**When:** MUST load when converting PL/pgSQL functions or triggers to DSQL-compatible SQL +**Contains:** 10 transpilation patterns with before/after code, detection signals, app-responsibility notes, unconvertible pattern stubs + +#### [pg-migrations/fk-replacement.md](references/pg-migrations/fk-replacement.md) + +**When:** MUST load when generating FK validation functions or cascade replacement code +**Contains:** validate_fk_*() templates, cascade function templates, tenant-scoped validation, ORM integration patterns (Django/SQLAlchemy/Spring) + +#### [pg-migrations/index-conversion.md](references/pg-migrations/index-conversion.md) + +**When:** MUST load when resolving 
`dsql_lint` unfixable index diagnostics (index_using, index_partial, index_expression) +**Contains:** GIN/GiST/BRIN → btree conversion, partial index removal, expression index → computed column patterns + +#### [pg-migrations/schema-objects.md](references/pg-migrations/schema-objects.md) + +**When:** MUST load when converting ENUM types, materialized views, extensions, roles/grants, or handling multi-schema flattening +**Contains:** ENUM → CHECK, composite types → json, materialized views → views, extension alternatives, role/IAM mapping, multi-schema consolidation + +#### [pg-migrations/function-compatibility.md](references/pg-migrations/function-compatibility.md) + +**When:** Load when checking if a PostgreSQL function works in DSQL or finding replacements +**Contains:** Supported/unsupported function matrix, uuid_generate_v4() → gen_random_uuid(), lastval() → currval(), COPY → batched INSERT, maintenance command removal + +#### [pg-migrations/occ-retry-patterns.md](references/pg-migrations/occ-retry-patterns.md) + +**When:** MUST load when generating OCC retry code for any language +**Contains:** Retry strategy, Python/Node.js/Java/Go implementations, conflict mitigation, idempotent transaction design + +#### [pg-migrations/data-migration.md](references/pg-migrations/data-migration.md) + +**When:** Load when planning data migration from PostgreSQL to DSQL +**Contains:** Migration order, COPY → batched INSERT patterns, Python/Node.js loaders, sequence alignment, validation queries, pre-flight checklist + +#### [pg-migrations/multi-region.md](references/pg-migrations/multi-region.md) + +**When:** Load when user asks about multi-region DSQL, active-active, or high availability +**Contains:** Architecture, schema deployment, geographic partitioning, OCC cross-region behavior, performance considerations + +### ORM Migration Guides: + +#### [orm-guides/django.md](references/orm-guides/django.md) + +**When:** Load when migrating a Django application to DSQL 
+**Contains:** aurora-dsql-django adapter setup, model changes, migration patterns, OCC retry decorator + +#### [orm-guides/hibernate.md](references/orm-guides/hibernate.md) + +**When:** Load when migrating a Java/Spring Boot application to DSQL +**Contains:** Hibernate dialect, HikariCP config, entity changes, Spring Retry, Liquibase patterns + +#### [orm-guides/rails.md](references/orm-guides/rails.md) + +**When:** Load when migrating a Ruby on Rails application to DSQL +**Contains:** IAM token initializer, model associations without FK, async indexes, OCC retry concern + ### Query Plan Explainability (modular): **When:** MUST load all four at Workflow 8 Phase 0 — [query-plan/plan-interpretation.md](references/query-plan/plan-interpretation.md), [query-plan/catalog-queries.md](references/query-plan/catalog-queries.md), [query-plan/guc-experiments.md](references/query-plan/guc-experiments.md), [query-plan/report-format.md](references/query-plan/report-format.md) @@ -278,6 +342,54 @@ PGPASSWORD="$TOKEN" psql "host=$HOST port=5432 user=admin dbname=postgres sslmod **Safety.** Plan capture uses `readonly_query` exclusively — it rejects INSERT/UPDATE/DELETE/DDL at the MCP layer. Rewrite DML to SELECT (Phase 1) rather than asking `transact --allow-writes` to run it; write-mode `transact` bypasses all MCP safety checks. **MUST NOT** run arbitrary DDL/DML or pl/pgsql. +### Workflow 9: Full PostgreSQL → DSQL Schema Migration + +End-to-end conversion of a PostgreSQL schema to DSQL-compatible DDL with companion code generation. Complements `dsql_lint` by handling semantic conversions the linter cannot automate. + +**Phase 0 — Load reference material.** Load [pg-migrations/type-mapping.md](references/pg-migrations/type-mapping.md) and [pg-migrations/schema-objects.md](references/pg-migrations/schema-objects.md) before starting. + +**Phase 1 — Lint first.** Run `dsql_lint(sql=source_sql, fix=true)` per Workflow 7. 
This handles SERIAL, JSON, FK removal, index ASYNC, transaction splitting mechanically. + +**Phase 2 — Resolve unfixable diagnostics.** For each `unfixable` diagnostic: +- `index_using` → Load [pg-migrations/index-conversion.md](references/pg-migrations/index-conversion.md), convert GIN/GiST/BRIN to btree +- `index_partial` → Remove WHERE clause or add filter column to composite index +- `index_expression` → Add GENERATED ALWAYS AS STORED column + btree index +- `create_table_as` → CREATE TABLE with explicit columns + INSERT...SELECT +- `unsupported_alter_table_op` → Table Recreation Pattern per Workflow 6 + +**Phase 3 — Semantic conversions (beyond dsql_lint).** Apply these using skill knowledge: +- ENUM types → CHECK constraints (load [pg-migrations/schema-objects.md](references/pg-migrations/schema-objects.md)) +- PL/pgSQL functions/triggers → SQL functions (load [pg-migrations/plpgsql-patterns.md](references/pg-migrations/plpgsql-patterns.md)) +- Add COLLATE "C" to all string columns +- uuid_generate_v4() → gen_random_uuid() +- lastval() → currval('explicit_name') +- Materialized views → regular views +- Extensions → alternatives + +**Phase 4 — Generate companion code:** +- FK validation functions (load [pg-migrations/fk-replacement.md](references/pg-migrations/fk-replacement.md)) +- Cascade functions for ON DELETE CASCADE/SET NULL +- OCC retry wrapper (load [pg-migrations/occ-retry-patterns.md](references/pg-migrations/occ-retry-patterns.md)) + +**Phase 5 — Re-lint and deploy.** Run `dsql_lint(fix=true)` on the final converted SQL to verify. Deploy each DDL via `transact` (one per call). + +### Workflow 10: ORM Migration (Django/Hibernate/Rails) + +Framework-specific migration guidance. 
Load the appropriate guide: +- Django → [orm-guides/django.md](references/orm-guides/django.md) +- Hibernate/Spring Boot → [orm-guides/hibernate.md](references/orm-guides/hibernate.md) +- Rails → [orm-guides/rails.md](references/orm-guides/rails.md) + +Key steps common to all ORMs: +1. Install DSQL adapter/dialect +2. Configure IAM token authentication (no passwords) +3. Replace ForeignKey/ManyToOne with plain ID fields + application validation +4. Use UUID primary keys +5. Add OCC retry logic (SQLSTATE 40001) +6. Split migrations to one DDL per transaction +7. Use ASYNC indexes (raw SQL in migrations) +8. Set connection pool max lifetime below 1 hour + --- ## Error Scenarios diff --git a/plugins/databases-on-aws/skills/dsql/references/orm-guides/django.md b/plugins/databases-on-aws/skills/dsql/references/orm-guides/django.md new file mode 100644 index 00000000..f25f1f40 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/orm-guides/django.md @@ -0,0 +1,171 @@ +# Django ORM Migration Guide for DSQL + +How to run Django applications against Aurora DSQL. + +Sources: +- [Aurora DSQL Django Adapter](https://github.com/awslabs/aurora-dsql-orms/tree/main/python/django) +- [aurora-dsql-django on PyPI](https://pypi.org/project/aurora-dsql-django/) +- [Django Pet Clinic Example](https://github.com/awslabs/aurora-dsql-orms/tree/main/python/django/examples/pet-clinic-app) + +--- + +## 1. Installation + +```bash +pip install aurora-dsql-django boto3 +``` + +## 2. 
Database Configuration + +```python +# settings.py +DATABASES = { + 'default': { + 'ENGINE': 'aurora_dsql_django', # NOT 'django.db.backends.postgresql' + 'NAME': 'postgres', # Always 'postgres' for DSQL + 'HOST': '..dsql.amazonaws.com', + 'PORT': '5432', + 'OPTIONS': { + 'sslmode': 'require', + }, + 'CONN_MAX_AGE': 1800, # 30 min (below DSQL's 1-hour timeout) + } +} +``` + +**Key differences:** +- Engine is `aurora_dsql_django` (handles IAM token generation automatically) +- No `USER` or `PASSWORD` — IAM token via boto3 +- Database name is always `postgres` +- SSL required + +## 3. Model Changes + +### Replace ForeignKey with Plain Fields + +```python +# BAD: Django creates FK constraint (DSQL rejects) +class Ticket(models.Model): + org = models.ForeignKey(Organization, on_delete=models.CASCADE) + +# GOOD: Plain field + application-layer validation +class Ticket(models.Model): + org_id = models.BigIntegerField(db_index=True) + reporter_id = models.UUIDField(db_index=True) + + def clean(self): + if not Organization.objects.filter(id=self.org_id).exists(): + raise ValidationError({'org_id': 'Organization does not exist'}) + if not User.objects.filter(id=self.reporter_id).exists(): + raise ValidationError({'reporter_id': 'User does not exist'}) +``` + +### Use UUID Primary Keys + +```python +import uuid +from django.db import models + +class BaseModel(models.Model): + id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False) + class Meta: + abstract = True + +class Organization(BaseModel): + name = models.CharField(max_length=200, unique=True) + settings = models.JSONField(default=dict) # Stored as json in DSQL +``` + +### Field Mapping + +| Django Field | DSQL Behavior | Alternative | +|---|---|---| +| `ForeignKey` | FK constraint fails | `BigIntegerField` / `UUIDField` | +| `ArrayField` | Not a stored type | `JSONField` with list | +| `HStoreField` | Not supported | `JSONField` | +| `SearchVectorField` | No FTS | External search (OpenSearch) | +| 
`CITextField` | No citext extension | `CharField` + `lower()` queries | + +## 4. Migrations (One DDL Per Transaction) + +```python +# Split complex migrations into separate files + +# 0001_create_users.py +class Migration(migrations.Migration): + operations = [ + migrations.CreateModel(name='User', fields=[ + ('id', models.UUIDField(primary_key=True, default=uuid.uuid4)), + ('email', models.CharField(max_length=255)), + ]), + ] + +# 0002_add_users_email_index.py (SEPARATE migration) +class Migration(migrations.Migration): + operations = [ + migrations.RunSQL("CREATE UNIQUE INDEX ASYNC idx_users_email ON myapp_user (email)"), + ] +``` + +## 5. OCC Retry Decorator + +```python +import time, random +from django.db import OperationalError, transaction + +def with_occ_retry(max_retries=5): + def decorator(func): + def wrapper(*args, **kwargs): + for attempt in range(max_retries): + try: + with transaction.atomic(): + return func(*args, **kwargs) + except OperationalError as e: + if hasattr(e, '__cause__') and hasattr(e.__cause__, 'pgcode'): + if e.__cause__.pgcode == '40001' and attempt < max_retries - 1: + delay = min(0.05 * (2 ** attempt) + random.uniform(0, 0.05), 5.0) + time.sleep(delay) + continue + raise + return wrapper + return decorator + +# Usage: +@with_occ_retry() +def create_ticket(org_id, reporter_id, title): + ticket = Ticket(org_id=org_id, reporter_id=reporter_id, title=title) + ticket.full_clean() + ticket.save() + return ticket +``` + +## 6. Collation (ORDER BY) + +```python +# C collation: uppercase sorts before lowercase +# For case-insensitive ordering: +from django.db.models.functions import Lower +Organization.objects.order_by(Lower('name')) +``` + +## 7. Settings to Remove + +```python +# Remove or avoid: +# - django.contrib.postgres (ArrayField, HStoreField) +# - CONN_MAX_AGE > 3600 (DSQL timeout is 1 hour) +``` + +## 8. 
Checklist + +- [ ] Install `aurora-dsql-django` and `boto3` +- [ ] Change ENGINE to `aurora_dsql_django` +- [ ] Remove USER/PASSWORD from database config +- [ ] Replace all `ForeignKey` with plain ID fields +- [ ] Add `clean()` or signal-based FK validation +- [ ] Use `UUIDField` for primary keys +- [ ] Add OCC retry decorator +- [ ] Set `CONN_MAX_AGE` ≤ 1800 +- [ ] Split migrations to one DDL per file +- [ ] Use `RunSQL("CREATE INDEX ASYNC ...")` for indexes +- [ ] Test ORDER BY with C collation diff --git a/plugins/databases-on-aws/skills/dsql/references/orm-guides/hibernate.md b/plugins/databases-on-aws/skills/dsql/references/orm-guides/hibernate.md new file mode 100644 index 00000000..b24a337d --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/orm-guides/hibernate.md @@ -0,0 +1,146 @@ +# Hibernate / Spring Boot Migration Guide for DSQL + +How to run Java applications with Hibernate and Spring Boot against Aurora DSQL. + +Sources: +- [Aurora DSQL Hibernate Adapter](https://github.com/awslabs/aurora-dsql-orms/tree/main/java/hibernate) +- [JDBC + HikariCP Sample](https://github.com/aws-samples/aurora-dsql-samples/tree/main/java/pgjdbc) +- [Spring Boot Sample](https://github.com/aws-samples/aurora-dsql-samples/tree/main/java/spring_boot) +- [Liquibase Sample](https://github.com/aws-samples/aurora-dsql-samples/tree/main/java/liquibase) + +--- + +## 1. Dependencies + +```xml +<dependency> + <groupId>software.amazon.dsql</groupId> + <artifactId>aurora-dsql-hibernate</artifactId> + <version>LATEST</version> +</dependency> +<dependency> + <groupId>software.amazon.awssdk</groupId> + <artifactId>dsql</artifactId> +</dependency> +``` + +## 2. Configuration + +```yaml +# application.yml +spring: + datasource: + url: jdbc:postgresql://..dsql.amazonaws.com:5432/postgres?sslmode=require + hikari: + maximum-pool-size: 10 + max-lifetime: 1800000 # 30 min (below DSQL 1-hour timeout) + jpa: + database-platform: software.amazon.dsql.hibernate.AuroraDsqlDialect + hibernate: + ddl-auto: none # NEVER use auto DDL with DSQL + properties: + hibernate.jdbc.batch_size: 100 +``` + +**Critical:** `ddl-auto` MUST be `none`.
Hibernate's auto-DDL batches multiple statements. + +## 3. Entity Changes + +### Remove @ManyToOne / @OneToMany + +```java +// BAD: Hibernate creates FK constraint +@Entity +public class Ticket { + @ManyToOne + @JoinColumn(name = "org_id") + private Organization org; +} + +// GOOD: Plain column + service-layer validation +@Entity +@Table(name = "tickets") +public class Ticket { + @Id + @GeneratedValue(strategy = GenerationType.UUID) + private UUID id; + + @Column(name = "org_id", nullable = false) + private Long orgId; + + @Column(name = "reporter_id", nullable = false) + private UUID reporterId; + + @Column(name = "metadata", columnDefinition = "json") + private String metadata; +} +``` + +### Sequence Configuration + +```java +@Id +@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq") +@SequenceGenerator(name = "order_seq", sequenceName = "order_seq", allocationSize = 1) +private Long id; +// allocationSize MUST match DSQL CACHE value (1 or 65536) +``` + +## 4. OCC Retry with Spring Retry + +```java +@Retryable( + retryFor = {LockAcquisitionException.class}, + maxAttempts = 5, + backoff = @Backoff(delay = 50, multiplier = 2, maxDelay = 5000) +) +@Transactional +public Order createOrder(UUID customerId, BigDecimal total) { + if (!customerRepository.existsById(customerId)) { + throw new EntityNotFoundException("Customer not found"); + } + return orderRepository.save(new Order(customerId, total)); +} +``` + +## 5. Liquibase (One DDL Per Changeset) + +```xml + + + + + + + + + + CREATE INDEX ASYNC idx_orders_customer ON orders (customer_id) + +``` + +## 6. Batch Operations (Under 3,000 Rows) + +```java +@Transactional +public void bulkInsert(List<Order> orders) { + List<List<Order>> batches = Lists.partition(orders, 500); + for (List<Order> batch : batches) { + orderRepository.saveAll(batch); + entityManager.flush(); + entityManager.clear(); + } +} +``` + +## 7.
Checklist + +- [ ] Add `aurora-dsql-hibernate` dialect dependency +- [ ] Set `hibernate.ddl-auto = none` +- [ ] Replace `@ManyToOne` / `@OneToMany` with plain columns +- [ ] Add service-layer FK validation +- [ ] Set sequence `allocationSize` to match DSQL CACHE +- [ ] Add Spring Retry for OCC (SQLSTATE 40001) +- [ ] Set HikariCP `maxLifetime` ≤ 1800000ms +- [ ] Batch writes to ≤500 rows per transaction +- [ ] Use Liquibase with one DDL per changeset diff --git a/plugins/databases-on-aws/skills/dsql/references/orm-guides/rails.md b/plugins/databases-on-aws/skills/dsql/references/orm-guides/rails.md new file mode 100644 index 00000000..b214b4b0 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/orm-guides/rails.md @@ -0,0 +1,197 @@ +# Ruby on Rails Migration Guide for DSQL + +How to run Rails applications against Aurora DSQL. + +Sources: +- [Rails Sample](https://github.com/aws-samples/aurora-dsql-samples/tree/main/ruby/rails) +- [Ruby pg Driver Sample](https://github.com/aws-samples/aurora-dsql-samples/tree/main/ruby/ruby-pg) +- [Rails with IAM Auth](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_program-with-ruby-rails.html) + +--- + +## 1. Dependencies + +```ruby +# Gemfile +gem 'pg' +gem 'aws-sdk-dsql' +``` + +## 2. 
Database Configuration + +```yaml +# config/database.yml +default: &default + adapter: postgresql + encoding: unicode + pool: 10 + host: <%= ENV['DSQL_ENDPOINT'] %> + port: 5432 + database: postgres + sslmode: require +``` + +### IAM Token Initializer + +```ruby +# config/initializers/dsql_auth.rb +require 'aws-sdk-dsql' + +ActiveSupport.on_load(:active_record) do + ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.class_eval do + private + alias_method :original_connect, :connect + def connect + client = Aws::DSQL::Client.new(region: ENV['AWS_REGION'] || 'us-east-1') + @connection_parameters[:password] = client.generate_db_connect_admin_auth_token( + hostname: ENV['DSQL_ENDPOINT'] + ) + @connection_parameters[:user] = 'admin' + original_connect + end + end +end +``` + +## 3. Model Changes + +### UUID Primary Keys + +```ruby +# config/initializers/generators.rb +Rails.application.config.generators do |g| + g.orm :active_record, primary_key_type: :uuid +end +``` + +### Associations Without FK Constraints + +```ruby +class Ticket < ApplicationRecord + belongs_to :organization, class_name: 'Organization', foreign_key: 'org_id', optional: false + # Rails belongs_to works without DB FK — just does SELECT to load + + validate :validate_foreign_keys + + private + def validate_foreign_keys + errors.add(:org_id, 'does not exist') unless Organization.exists?(org_id) + end +end +``` + +### ENUM → String + Validation + +```ruby +class Ticket < ApplicationRecord + STATUSES = %w[open in_progress resolved closed].freeze + validates :status, inclusion: { in: STATUSES } +end +``` + +## 4. 
Migrations (One DDL Per File) + +```ruby +# BAD: Multiple DDL in one migration +class CreateUsers < ActiveRecord::Migration[7.1] + def change + create_table :users, id: :uuid do |t| + t.string :email, null: false + end + add_index :users, :email, unique: true # Second DDL — fails + end +end + +# GOOD: Separate migrations +class CreateUsers < ActiveRecord::Migration[7.1] + def change + create_table :users, id: :uuid do |t| + t.string :email, null: false + end + end +end + +class AddUsersEmailIndex < ActiveRecord::Migration[7.1] + def up + execute "CREATE UNIQUE INDEX ASYNC idx_users_email ON users (email)" + end + def down + execute "DROP INDEX IF EXISTS idx_users_email" + end +end +``` + +**Remove all `foreign_key: true` from migrations.** + +## 5. OCC Retry Concern + +```ruby +# app/models/concerns/occ_retryable.rb +module OccRetryable + extend ActiveSupport::Concern + + class_methods do + def with_occ_retry(max_retries: 5, &block) + attempt = 0 + begin + ActiveRecord::Base.transaction(&block) + rescue ActiveRecord::SerializationFailure => e + attempt += 1 + if attempt < max_retries + sleep([0.05 * (2 ** attempt) + rand(0.0..0.05), 5.0].min) + retry + else + raise + end + end + end + end +end + +# Usage: +class TicketService + include OccRetryable + def self.create_ticket(params) + with_occ_retry { Ticket.create!(params) } + end +end +``` + +## 6. Things to Avoid + +| Rails Feature | Issue | Alternative | +|---|---|---| +| `foreign_key: true` | Not supported | Model validations | +| `add_foreign_key` | Not supported | Skip | +| `dependent: :destroy` (with FK) | No DB cascade | `before_destroy` callback | +| `add_index` (standard) | Needs ASYNC | `execute "CREATE INDEX ASYNC..."` | +| `change_column` | ALTER TYPE not supported | Recreate table | +| `remove_column` | DROP COLUMN not supported | Recreate table | + +## 7. 
Cascade Deletes (Without FK) + +```ruby +class Organization < ApplicationRecord + has_many :tickets, foreign_key: 'org_id' + + before_destroy :cascade_cleanup + private + def cascade_cleanup + Ticket.where(org_id: id).update_all(status: 'cancelled') + end +end +``` + +## 8. Checklist + +- [ ] Add `aws-sdk-dsql` gem +- [ ] Configure IAM token initializer +- [ ] Set `database: postgres`, `sslmode: require` +- [ ] Use `id: :uuid` for all tables +- [ ] Remove all `foreign_key: true` from migrations +- [ ] Add model-level FK validation +- [ ] Split migrations to one DDL per file +- [ ] Use `execute "CREATE INDEX ASYNC..."` for indexes +- [ ] Add OCC retry concern +- [ ] Set `config.active_record.schema_format = :sql` +- [ ] Test `dependent: :destroy` via callbacks diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/data-migration.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/data-migration.md new file mode 100644 index 00000000..711bacf9 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/data-migration.md @@ -0,0 +1,293 @@ +# Data Migration to DSQL + +Operational guidance for migrating data from PostgreSQL to Aurora DSQL. Covers migration +order, batch INSERT patterns (COPY replacement), validation, and rollback planning. + +Sources: +- [Migration Guide](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) +- [Quotas and Limits](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/CHAP_quotas.html) + +--- + +## Migration Order + +Deploy in this order — each step in its own transaction: + +``` +1. Sequences (CREATE SEQUENCE ... CACHE 1) +2. Tables (CREATE TABLE — no FKs, with CHECK constraints) +3. Indexes (CREATE INDEX ASYNC — non-blocking) +4. Functions (CREATE FUNCTION — SQL language only) +5. Views (CREATE VIEW / CREATE OR REPLACE VIEW) +6. Roles & Grants (CREATE ROLE, GRANT) +7. 
Data (batched INSERTs — after schema is complete) +8. Sequence alignment (setval to max ID + 1) +9. Validation (row counts, spot checks, index status) +``` + +**Critical:** One DDL per transaction. Do NOT batch multiple DDL statements. + +--- + +## COPY Replacement: Batched INSERT + +DSQL does not support the `COPY` command. Use batched INSERT statements instead. + +### Export from PostgreSQL + +```bash +# Option 1: Export as INSERT statements (small tables) +pg_dump --data-only --inserts --table=users source_db > users_data.sql + +# Option 2: Export as CSV (large tables — process with script) +psql source_db -c "COPY users TO STDOUT WITH (FORMAT csv, HEADER)" > users.csv + +# Option 3: Export as TSV (simpler parsing) +psql source_db -c "COPY users TO STDOUT WITH (FORMAT text)" > users.tsv +``` + +### Batch INSERT Pattern + +```sql +-- Batch size: 500-1000 rows per transaction (well under 3,000 limit) +-- Leaves headroom for OCC retries and row-size variation + +BEGIN; +INSERT INTO users (id, email, name, created_at) VALUES + ('a1b2c3d4-...', 'alice@example.com', 'Alice', '2024-01-01 00:00:00+00'), + ('e5f6g7h8-...', 'bob@example.com', 'Bob', '2024-01-02 00:00:00+00'), + -- ... 
up to 500-1000 rows + ; +COMMIT; +-- Repeat for next batch +``` + +### Python Batch Loader + +```python +import csv +import psycopg2 +import time +import random + +BATCH_SIZE = 500 +MAX_RETRIES = 5 + +def load_csv_to_dsql(dsql_conn_params, csv_path, table_name, columns): + """Load CSV data into DSQL in batches with OCC retry.""" + with open(csv_path, 'r') as f: + reader = csv.DictReader(f) + batch = [] + total_loaded = 0 + + for row in reader: + batch.append(tuple(row[col] for col in columns)) + + if len(batch) >= BATCH_SIZE: + insert_batch(dsql_conn_params, table_name, columns, batch) + total_loaded += len(batch) + print(f"Loaded {total_loaded} rows...") + batch = [] + + # Final partial batch + if batch: + insert_batch(dsql_conn_params, table_name, columns, batch) + total_loaded += len(batch) + + print(f"Complete: {total_loaded} rows loaded into {table_name}") + +def insert_batch(conn_params, table_name, columns, rows): + """Insert a batch with OCC retry.""" + placeholders = ','.join(['%s'] * len(columns)) + col_list = ','.join(columns) + sql = f"INSERT INTO {table_name} ({col_list}) VALUES ({placeholders})" + + for attempt in range(MAX_RETRIES): + conn = psycopg2.connect(**conn_params) + try: + with conn.cursor() as cur: + for row in rows: + cur.execute(sql, row) + conn.commit() + return + except psycopg2.errors.SerializationFailure: + conn.rollback() + if attempt < MAX_RETRIES - 1: + delay = min(0.05 * (2 ** attempt) + random.uniform(0, 0.05), 5.0) + time.sleep(delay) + else: + raise + finally: + conn.close() +``` + +### Node.js Batch Loader + +```javascript +const { Pool } = require('pg'); +const fs = require('fs'); +const csv = require('csv-parse/sync'); + +const BATCH_SIZE = 500; + +async function loadCsvToDsql(pool, csvPath, tableName, columns) { + const content = fs.readFileSync(csvPath, 'utf-8'); + const records = csv.parse(content, { columns: true }); + + for (let i = 0; i < records.length; i += BATCH_SIZE) { + const batch = records.slice(i, i + 
BATCH_SIZE); + await insertBatchWithRetry(pool, tableName, columns, batch); + console.log(`Loaded ${Math.min(i + BATCH_SIZE, records.length)} / ${records.length}`); + } +} + +async function insertBatchWithRetry(pool, tableName, columns, rows, maxRetries = 5) { + const colList = columns.join(', '); + const placeholders = columns.map((_, i) => `$${i + 1}`).join(', '); + const sql = `INSERT INTO ${tableName} (${colList}) VALUES (${placeholders})`; + + for (let attempt = 0; attempt < maxRetries; attempt++) { + const client = await pool.connect(); + try { + await client.query('BEGIN'); + for (const row of rows) { + await client.query(sql, columns.map(col => row[col])); + } + await client.query('COMMIT'); + return; + } catch (err) { + await client.query('ROLLBACK'); + if (err.code === '40001' && attempt < maxRetries - 1) { + await new Promise(r => setTimeout(r, Math.min(50 * 2 ** attempt, 5000))); + } else { + throw err; + } + } finally { + client.release(); + } + } +} +``` + +--- + +## Sequence Alignment + +After data migration, align sequences to be ahead of max IDs: + +```sql +-- For each sequence, set to max existing value + 1 +SELECT setval('users_id_seq', (SELECT COALESCE(MAX(id), 0) + 1 FROM users)); +SELECT setval('orders_id_seq', (SELECT COALESCE(MAX(id), 0) + 1 FROM orders)); + +-- Verify sequences are ahead +SELECT 'users_id_seq' AS seq, last_value FROM users_id_seq +UNION ALL +SELECT 'orders_id_seq', last_value FROM orders_id_seq; +``` + +--- + +## Data Validation Queries + +Run after migration to verify correctness: + +```sql +-- 1. Compare row counts (run on both source PG and target DSQL) +SELECT 'users' AS table_name, COUNT(*) AS row_count FROM users +UNION ALL SELECT 'orders', COUNT(*) FROM orders +UNION ALL SELECT 'tickets', COUNT(*) FROM tickets; + +-- 2. Verify no NULL in NOT NULL columns +SELECT COUNT(*) AS null_emails FROM users WHERE email IS NULL; -- should be 0 + +-- 3. 
Check unique constraints hold +SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1; -- should be empty + +-- 4. Verify async indexes are ready +SELECT indexrelid::regclass AS index_name, indisvalid AS is_ready +FROM pg_index WHERE NOT indisvalid; +-- Should return 0 rows when all indexes are built + +-- 5. Spot-check random rows +SELECT * FROM users ORDER BY random() LIMIT 10; + +-- 6. Verify sequences are ahead of max IDs +SELECT last_value FROM users_id_seq; +SELECT MAX(id) FROM users; -- last_value should be > MAX(id) +``` + +--- + +## Multi-Schema Migration + +### ≤10 Schemas: Direct + +```sql +-- Deploy schemas first (one per transaction) +CREATE SCHEMA billing; +CREATE SCHEMA support; + +-- Grant access (one per transaction) +GRANT USAGE ON SCHEMA billing TO app_role; +GRANT USAGE ON SCHEMA support TO app_role; + +-- Create tables in schemas +CREATE TABLE billing.invoices (...); +CREATE TABLE support.tickets (...); +``` + +### >10 Schemas: Consolidate + +1. Pick the 9 most important schemas to keep (+ public = 10) +2. Merge overflow schemas into existing ones with table name prefixes +3. 
Update all application references + +```sql +-- Schema 'analytics' (overflow) → merge into 'public' with prefix +-- Before: analytics.reports, analytics.dashboards +-- After: public.analytics_reports, public.analytics_dashboards +``` + +--- + +## Pre-Flight Checklist + +### Before Migration + +- [ ] Schema converted and validated with `dsql_lint` +- [ ] All DDL deploys successfully (one per transaction) +- [ ] Async indexes show `indisvalid = true` +- [ ] Sequences created with correct CACHE values +- [ ] SQL functions deploy without errors +- [ ] Application code updated (FK validation, OCC retry, trigger replacements) + +### After Data Load + +- [ ] Row counts match source database (per table) +- [ ] Spot-check 10 random rows per table +- [ ] NULL counts match for nullable columns +- [ ] Unique constraints hold (no violations) +- [ ] Sequences aligned to max ID + 1 +- [ ] All CRUD operations succeed with retry logic +- [ ] ORDER BY results acceptable with C collation + +### Rollback Plan + +- [ ] Source PostgreSQL database remains available during cutover +- [ ] Connection string switch is reversible (feature flag or config) +- [ ] Data written to DSQL during validation can be discarded +- [ ] Team knows how to switch back within SLA window + +--- + +## Performance Tips + +| Tip | Impact | +|---|---| +| Use 500-1000 rows per batch (not 3,000) | Leaves headroom for OCC retries | +| Use CACHE 65536 sequences during bulk load | Reduces sequence round-trips | +| Load tables without indexes first, add indexes after | Faster bulk insert | +| Parallelize batches across tables | Reduces total migration time | +| Use ON CONFLICT DO NOTHING for idempotent re-runs | Safe to restart failed migration | +| Disable application FK validation during bulk load | Faster (validate after) | diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/fk-replacement.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/fk-replacement.md new file mode 100644 
index 00000000..17f7b3b5 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/fk-replacement.md @@ -0,0 +1,295 @@ +# Foreign Key → Validation Function Replacement + +DSQL does not support foreign key constraints. `dsql_lint` removes FK declarations but does +NOT generate replacement code. This file provides templates for generating application-layer +referential integrity enforcement. + +Sources: +- [Migration Guide](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) +- [Considerations](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/considerations.html) + +--- + +## Generated Function Templates + +### Basic FK Validation (EXISTS check) + +For each removed FK constraint, generate a validation function: + +```sql +-- Template: validate_fk_{child_table}_{fk_column} +-- Generated from: FOREIGN KEY (fk_column) REFERENCES parent_table(parent_column) + +CREATE FUNCTION validate_fk_{child_table}_{fk_column}(p_value {fk_type}) RETURNS boolean +LANGUAGE sql AS $$ + SELECT EXISTS (SELECT 1 FROM {parent_table} WHERE {parent_column} = p_value); +$$; +``` + +**Example:** +```sql +-- Original FK: tickets.org_id REFERENCES organizations(id) +CREATE FUNCTION validate_fk_tickets_org_id(p_value bigint) RETURNS boolean +LANGUAGE sql AS $$ + SELECT EXISTS (SELECT 1 FROM organizations WHERE id = p_value); +$$; + +-- Original FK: tickets.reporter_id REFERENCES users(id) +CREATE FUNCTION validate_fk_tickets_reporter_id(p_value uuid) RETURNS boolean +LANGUAGE sql AS $$ + SELECT EXISTS (SELECT 1 FROM users WHERE id = p_value); +$$; +``` + +### Tenant-Scoped FK Validation + +For multi-tenant schemas, FK validation MUST be scoped to the same tenant: + +```sql +-- Template: validate_fk_{child_table}_{fk_column}_tenant +CREATE FUNCTION validate_fk_{child_table}_{fk_column}( + p_tenant_id uuid, + p_value {fk_type} +) RETURNS boolean +LANGUAGE sql AS $$ + SELECT EXISTS ( + SELECT 1 FROM 
{parent_table} + WHERE {parent_column} = p_value AND tenant_id = p_tenant_id + ); +$$; +``` + +**Example:** +```sql +-- Tenant-scoped: orders.customer_id REFERENCES customers(id) within same tenant +CREATE FUNCTION validate_fk_orders_customer_id( + p_tenant_id uuid, + p_customer_id uuid +) RETURNS boolean +LANGUAGE sql AS $$ + SELECT EXISTS ( + SELECT 1 FROM customers WHERE id = p_customer_id AND tenant_id = p_tenant_id + ); +$$; +``` + +--- + +## Cascade Function Templates + +### ON DELETE CASCADE Replacement + +```sql +-- Template: cascade_delete_{parent_table} +-- Generated from: FOREIGN KEY ... ON DELETE CASCADE +CREATE FUNCTION cascade_delete_{parent_table}(p_parent_id {pk_type}) RETURNS void +LANGUAGE sql AS $$ + DELETE FROM {child_table_1} WHERE {fk_column_1} = p_parent_id; + DELETE FROM {child_table_2} WHERE {fk_column_2} = p_parent_id; +$$; +``` + +**Example:** +```sql +-- Original: orders.user_id REFERENCES users(id) ON DELETE CASCADE +-- sessions.user_id REFERENCES users(id) ON DELETE CASCADE +CREATE FUNCTION cascade_delete_users(p_user_id bigint) RETURNS void +LANGUAGE sql AS $$ + DELETE FROM orders WHERE user_id = p_user_id; + DELETE FROM sessions WHERE user_id = p_user_id; +$$; +``` + +### ON DELETE SET NULL Replacement + +```sql +-- Template: cascade_set_null_{parent_table} +CREATE FUNCTION cascade_set_null_{parent_table}(p_parent_id {pk_type}) RETURNS void +LANGUAGE sql AS $$ + UPDATE {child_table} SET {fk_column} = NULL WHERE {fk_column} = p_parent_id; +$$; +``` + +**Example:** +```sql +-- Original: tickets.assignee_id REFERENCES users(id) ON DELETE SET NULL +CREATE FUNCTION cascade_set_null_users_assignee(p_user_id uuid) RETURNS void +LANGUAGE sql AS $$ + UPDATE tickets SET assignee_id = NULL WHERE assignee_id = p_user_id; +$$; +``` + +### ON UPDATE CASCADE Replacement + +```sql +-- Template: cascade_update_{parent_table}_{column} +CREATE FUNCTION cascade_update_{parent_table}_{column}( + p_old_value {pk_type}, + p_new_value {pk_type} +) RETURNS 
void +LANGUAGE sql AS $$ + UPDATE {child_table} SET {fk_column} = p_new_value WHERE {fk_column} = p_old_value; +$$; +``` + +--- + +## Application Integration Patterns + +### Pattern A: Service Layer (Recommended) + +```python +# Python: validate before INSERT +# Assumes the tenant-scoped (tenant_id, value) variants of the validation functions +def create_ticket(tenant_id, org_id, reporter_id, title): + with db.cursor() as cur: + # Validate FKs + cur.execute("SELECT validate_fk_tickets_org_id(%s, %s)", [tenant_id, org_id]) + if not cur.fetchone()[0]: + raise ValueError(f"Organization {org_id} does not exist for tenant") + + cur.execute("SELECT validate_fk_tickets_reporter_id(%s, %s)", [tenant_id, reporter_id]) + if not cur.fetchone()[0]: + raise ValueError(f"Reporter {reporter_id} does not exist for tenant") + + # Insert + cur.execute( + "INSERT INTO tickets (tenant_id, org_id, reporter_id, title) VALUES (%s,%s,%s,%s)", + [tenant_id, org_id, reporter_id, title] + ) + db.commit() +``` + +### Pattern B: Database Function Wrapper + +Wrap INSERT with validation in a single SQL function: + +```sql +CREATE FUNCTION insert_ticket( + p_tenant_id uuid, + p_org_id bigint, + p_reporter_id uuid, + p_title text +) RETURNS uuid +LANGUAGE sql AS $$ + -- Validation + insert in one statement (an INSERT cannot be nested inside a CASE expression) + -- Returns NULL if FK validation fails (no row is inserted), otherwise returns new ticket ID + INSERT INTO tickets (id, tenant_id, org_id, reporter_id, title) + SELECT gen_random_uuid(), p_tenant_id, p_org_id, p_reporter_id, p_title + WHERE validate_fk_tickets_org_id(p_tenant_id, p_org_id) + AND validate_fk_tickets_reporter_id(p_tenant_id, p_reporter_id) + RETURNING id; +$$; +``` + +### Pattern C: ORM Hooks + +**Django:** +```python +from django.core.exceptions import ValidationError +from django.db.models.signals import pre_save +from django.dispatch import receiver + +@receiver(pre_save, sender=Ticket) +def validate_ticket_fks(sender, instance, **kwargs): + if not Organization.objects.filter(id=instance.org_id, tenant_id=instance.tenant_id).exists(): + raise ValidationError({'org_id':
'Organization does not exist'}) +``` + +**SQLAlchemy:** +```python +from sqlalchemy import event, text +from sqlalchemy.exc import IntegrityError + +@event.listens_for(Ticket, 'before_insert') +def validate_ticket_fks(mapper, connection, target): + result = connection.execute( + text("SELECT validate_fk_tickets_org_id(:tid, :oid)"), + {"tid": target.tenant_id, "oid": target.org_id} + ) + if not result.scalar(): + # IntegrityError takes (statement, params, orig), not a bare message + raise IntegrityError( + "validate_fk_tickets_org_id", + {"oid": target.org_id}, + ValueError("FK violation: org_id does not exist") + ) +``` + +**Spring/Hibernate:** +```java +@Service +public class TicketService { + @Transactional + public Ticket createTicket(UUID tenantId, Long orgId, UUID reporterId, String title) { + if (!organizationRepository.existsByIdAndTenantId(orgId, tenantId)) { + throw new EntityNotFoundException("Organization not found"); + } + if (!userRepository.existsByIdAndTenantId(reporterId, tenantId)) { + throw new EntityNotFoundException("Reporter not found"); + } + return ticketRepository.save(new Ticket(tenantId, orgId, reporterId, title)); + } +} +``` + +### Pattern D: Cascade on DELETE + +```python +def delete_organization(tenant_id, org_id): + with db.cursor() as cur: + # Check for dependents first (optional; alternatively just cascade) + cur.execute( + "SELECT COUNT(*) FROM tickets WHERE tenant_id = %s AND org_id = %s AND NOT resolved", + [tenant_id, org_id] + ) + active_tickets = cur.fetchone()[0] + if active_tickets > 0: + raise ValueError(f"Cannot delete: {active_tickets} active tickets exist") + + # Cascade + cur.execute("SELECT cascade_delete_organizations(%s)", [org_id]) + # Delete parent + cur.execute("DELETE FROM organizations WHERE id = %s AND tenant_id = %s", [org_id, tenant_id]) + db.commit() +``` + +--- + +## Calling Point Reference + +| Original FK Action | When to Call Replacement | Where | +|---|---|---| +| REFERENCES (basic) | Before INSERT/UPDATE of child | Service layer or DB function | +| ON DELETE CASCADE | Before DELETE of parent | Service layer | +| ON DELETE SET NULL | Before DELETE of parent | Service layer | +| ON UPDATE CASCADE | After
UPDATE of parent PK | Service layer (rare) | +| ON DELETE RESTRICT | Before DELETE of parent (check dependents) | Service layer | + +--- + +## Generation Workflow + +Given a PostgreSQL schema with FKs: + +1. **Extract all FK constraints:** + ```sql + SELECT + tc.table_name AS child_table, + kcu.column_name AS fk_column, + ccu.table_name AS parent_table, + ccu.column_name AS parent_column, + rc.delete_rule, + rc.update_rule + FROM information_schema.table_constraints tc + JOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_name + JOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_name + JOIN information_schema.referential_constraints rc ON tc.constraint_name = rc.constraint_name + WHERE tc.constraint_type = 'FOREIGN KEY'; + ``` + +2. **For each FK, generate:** + - A `validate_fk_*()` function (always) + - A `cascade_*()` function (if ON DELETE CASCADE/SET NULL) + +3. **Run `dsql_lint`** on the generated functions to verify compatibility + +4. **Deploy** each function via `transact(["CREATE FUNCTION ..."])` — one per call + +5. **Update application code** to call validation before INSERT/UPDATE and cascade before DELETE diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/function-compatibility.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/function-compatibility.md new file mode 100644 index 00000000..8d84ee75 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/function-compatibility.md @@ -0,0 +1,242 @@ +# PostgreSQL Function Compatibility in DSQL + +Reference for which built-in PostgreSQL functions work in Aurora DSQL, which need +replacements, and what alternatives exist. `dsql_lint` does not check function usage — +use this reference when migrating application code and stored functions. 
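Because `dsql_lint` does not check function usage, a small scanner can help locate calls that this reference lists as needing replacement. The following is a minimal sketch, not part of any DSQL tooling: the `REPLACEMENTS` table and the `scan_sql` helper are hypothetical names, and the regular expressions cover only a handful of the functions discussed in this file.

```python
import re

# Hypothetical pattern table; the hints mirror the replacements listed in this reference.
REPLACEMENTS = {
    r"\buuid_generate_v4\s*\(": "gen_random_uuid()",
    r"\blastval\s*\(": "currval('sequence_name')",
    r"\bpg_notify\s*\(": "SNS/SQS/EventBridge",
    r"\bpg_(try_)?advisory_lock\s*\(": "DynamoDB conditional write or Redis",
    r"\bto_tsvector\s*\(": "OpenSearch/Elasticsearch",
    r"\bSAVEPOINT\b": "separate transactions",
}


def scan_sql(text):
    """Return (line_number, line, hint) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in REPLACEMENTS.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append((lineno, line.strip(), hint))
    return findings


if __name__ == "__main__":
    sample = "INSERT INTO users (id) VALUES (uuid_generate_v4());\nSELECT lastval();"
    for lineno, line, hint in scan_sql(sample):
        print(f"line {lineno}: {line}  ->  {hint}")
```

A text scan like this only flags candidates; each hit still needs a manual review against the tables in this file.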
+ +Sources: +- [Supported SQL Features](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-supported-sql-features.html) +- [Supported Data Types — JSON Functions](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-supported-data-types.html) +- [Migration Guide](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) + +--- + +## Common Function Replacements + +These are the most frequently encountered replacements during migration: + +| PostgreSQL Function | DSQL Replacement | Notes | +|---|---|---| +| `uuid_generate_v4()` | `gen_random_uuid()` | Built-in, no extension needed | +| `lastval()` | `currval('sequence_name')` | Must use explicit sequence name | +| `pg_notify(channel, payload)` | SNS/SQS/EventBridge | Application-layer messaging | +| `pg_advisory_lock(id)` | DynamoDB conditional write | Application-layer locking | +| `to_tsvector(text)` | OpenSearch/Elasticsearch | Application-layer FTS | +| `COPY FROM/TO` | Batched INSERT | Max 3,000 rows per transaction | + +### uuid_generate_v4() → gen_random_uuid() + +```sql +-- PostgreSQL (requires uuid-ossp extension) +CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; +INSERT INTO users (id) VALUES (uuid_generate_v4()); + +-- DSQL (built-in, no extension) +INSERT INTO users (id) VALUES (gen_random_uuid()); + +-- In DEFAULT clauses: +CREATE TABLE users (id uuid PRIMARY KEY DEFAULT gen_random_uuid()); +``` + +**grep pattern:** `uuid_generate_v4` — replace all occurrences with `gen_random_uuid()` + +### lastval() → currval('sequence_name') + +```sql +-- PostgreSQL +INSERT INTO orders (customer_id) VALUES (1); +SELECT lastval(); -- returns the last sequence value from any sequence + +-- DSQL: lastval() not supported. Use explicit sequence name. 
+INSERT INTO orders (id, customer_id) VALUES (nextval('orders_id_seq'), 1); +SELECT currval('orders_id_seq'); -- explicit sequence name required +``` + +**grep pattern:** `lastval()`, replace with `currval('explicit_sequence_name')` + +### COPY → Batched INSERT + +```sql +-- PostgreSQL +COPY users (id, email, name) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER); + +-- DSQL: COPY not supported. Use batched INSERT (500-1000 rows per transaction). +BEGIN; +INSERT INTO users (id, email, name) VALUES + ('uuid1', 'a@b.com', 'Alice'), + ('uuid2', 'c@d.com', 'Bob') + -- ... additional comma-separated rows, up to 500-1000 per batch +; +COMMIT; +-- Repeat for next batch +``` + +--- + +## Fully Supported Functions + +### Aggregate Functions +`COUNT(*)`, `COUNT(col)`, `SUM`, `AVG`, `MIN`, `MAX`, `bool_and`, `bool_or`, +`string_agg`, `array_agg` (runtime), `json_agg`, `jsonb_agg`, `json_object_agg` + +### String Functions +`length`, `char_length`, `lower`, `upper`, `trim`, `ltrim`, `rtrim`, `substring`, +`position`, `replace`, `concat`, `concat_ws`, `left`, `right`, `repeat`, `reverse`, +`split_part`, `format`, `encode`, `decode`, `md5`, `regexp_replace`, `regexp_match` + +### Numeric Functions +`abs`, `ceil`, `floor`, `round`, `trunc`, `mod`, `power`, `sqrt`, `random`, +`greatest`, `least` + +### Date/Time Functions +`now()`, `current_timestamp`, `current_date`, `current_time`, `clock_timestamp()`, +`date_trunc`, `date_part`, `extract`, `age`, `make_interval`, `make_date`, +`to_char`, `to_date`, `to_timestamp` + +### JSON Functions (all functions in PostgreSQL documentation section 9.16 work) +`json_build_object`, `json_build_array`, `jsonb_build_object`, `jsonb_build_array`, +`row_to_json`, `json_extract_path`, `json_extract_path_text`, `json_each`, +`json_array_elements`, `jsonb_set`, `jsonb_strip_nulls`, `json_typeof`, +`json_array_length`, `->`, `->>`, `#>`, `#>>`, `@>`, `?` + +### Window Functions +`ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, `LAG()`, `LEAD()`, `FIRST_VALUE()`, +`LAST_VALUE()`, `NTH_VALUE()`, `NTILE()`,
`SUM/AVG/COUNT OVER (...)` + +### Conditional & Subquery +`CASE WHEN`, `COALESCE`, `NULLIF`, `GREATEST`, `LEAST`, `EXISTS`, `IN`, `ANY`, `ALL` + +### Sequence Functions +`nextval(regclass)` ✅, `currval(regclass)` ✅, `setval(regclass, bigint)` ✅ + +### Type Casting +`CAST(x AS type)` ✅, `x::type` ✅, `x::jsonb` ✅ (runtime cast) + +--- + +## Partially Supported Functions + +### generate_series() + +```sql +-- Works for integer and timestamp series +SELECT generate_series(1, 100); -- ✅ +SELECT generate_series('2024-01-01'::timestamp, '2024-12-31'::timestamp, '1 month'); -- ✅ + +-- LIMITATION: If used in INSERT, results count toward 3,000 row limit +INSERT INTO numbers SELECT generate_series(1, 5000); -- ❌ exceeds limit +INSERT INTO numbers SELECT generate_series(1, 2000); -- ✅ under limit +``` + +### array_agg() / ARRAY constructor + +```sql +-- Works at runtime (SELECT) +SELECT array_agg(name) FROM users WHERE org_id = 1; -- ✅ returns text[] + +-- Cannot store result in a column (arrays not a stored type) +-- Use json_agg() if you need to persist: +SELECT json_agg(name) FROM users WHERE org_id = 1; -- ✅ storable as json +``` + +### unnest() + +```sql +-- Works with runtime arrays +SELECT unnest(ARRAY['a','b','c']); -- ✅ +-- Works with json arrays +SELECT json_array_elements_text('["a","b","c"]'::json); -- ✅ +``` + +--- + +## Not Supported — With Alternatives + +### Full-Text Search + +| Function | Alternative | +|---|---| +| `to_tsvector(config, text)` | OpenSearch / Elasticsearch | +| `to_tsquery(config, text)` | OpenSearch / Elasticsearch | +| `ts_rank(tsvector, tsquery)` | OpenSearch relevance scoring | +| `plainto_tsquery(text)` | OpenSearch query parser | +| `@@` operator | OpenSearch query | + +### Advisory Locks + +| Function | Alternative | +|---|---| +| `pg_advisory_lock(id)` | DynamoDB conditional write or Redis SETNX | +| `pg_advisory_unlock(id)` | DynamoDB delete or Redis DEL | +| `pg_try_advisory_lock(id)` | DynamoDB conditional write (non-blocking) | + 
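The DynamoDB conditional write named in the table above could look roughly like the sketch below. It assumes a hypothetical lock table keyed on `lock_id` with `holder` and `expires_at` attributes, and a boto3 DynamoDB client passed in by the caller; the attribute names and the lease-expiry scheme are illustrative choices, not something DSQL or this guide prescribes.

```python
import time


def build_lock_request(table_name, lock_id, holder, ttl_seconds, now=None):
    """Build put_item arguments that succeed only if the lock is free or expired."""
    now = int(time.time()) if now is None else int(now)
    return {
        "TableName": table_name,
        "Item": {
            "lock_id": {"S": str(lock_id)},
            "holder": {"S": holder},
            "expires_at": {"N": str(now + ttl_seconds)},
        },
        # Write only when no item exists yet, or the previous holder's lease has expired.
        "ConditionExpression": "attribute_not_exists(lock_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


def try_acquire_lock(client, table_name, lock_id, holder, ttl_seconds=30):
    """Non-blocking acquire, similar in spirit to pg_try_advisory_lock(id)."""
    try:
        client.put_item(**build_lock_request(table_name, lock_id, holder, ttl_seconds))
        return True
    except Exception as err:
        # boto3 reports a failed condition as a ClientError carrying this error code.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
```

With boto3 installed, `client` would typically come from `boto3.client("dynamodb")`. Releasing the lock corresponds to the "DynamoDB delete" entry in the table.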
+### Notification + +| Function | Alternative | +|---|---| +| `pg_notify(channel, payload)` | Amazon SNS Publish | +| `LISTEN channel` | SQS polling or EventBridge rules | +| `NOTIFY channel` | SNS Publish | + +### System/Admin Functions + +| Function | Notes | +|---|---| +| `pg_table_size(table)` | Not available (DSQL manages storage) | +| `pg_total_relation_size(table)` | Not available | +| `pg_stat_activity` | Not available | +| `pg_cancel_backend(pid)` | Not available | +| `pg_terminate_backend(pid)` | Not available | +| `current_setting(name)` | Limited (no custom GUCs) | +| `set_config(name, val, local)` | Not available | + +### Large Object Functions + +| Function | Alternative | +|---|---| +| `lo_create(oid)` | Use S3 for large objects | +| `lo_import(path)` | Upload to S3 | +| `lo_export(oid, path)` | Download from S3 | + +--- + +## Maintenance Commands + +| Command | DSQL Behavior | +|---|---| +| VACUUM | Not needed — automatic | +| VACUUM ANALYZE | Not needed — automatic | +| ANALYZE (table) | Supported (relation name only) | +| REINDEX | Not needed — automatic | +| CLUSTER | Not applicable (PK-ordered storage) | + +**Migration action:** Remove all VACUUM, REINDEX, and CLUSTER from maintenance scripts/cron jobs. + +--- + +## Transaction Control + +| Command | DSQL Support | +|---|---| +| BEGIN / COMMIT / ROLLBACK | ✅ | +| SAVEPOINT | ❌ Not supported | +| RELEASE SAVEPOINT | ❌ Not supported | +| ROLLBACK TO SAVEPOINT | ❌ Not supported | +| SET TRANSACTION ISOLATION LEVEL | Only REPEATABLE READ accepted | + +**Migration action:** Restructure any code using savepoints into separate transactions. + +--- + +## Migration Checklist for Function Usage + +1. **grep for `uuid_generate_v4`** → replace with `gen_random_uuid()` +2. **grep for `lastval()`** → replace with `currval('sequence_name')` +3. **grep for `COPY FROM` / `COPY TO`** → replace with batched INSERT +4. **grep for `pg_notify` / `LISTEN` / `NOTIFY`** → replace with SNS/SQS/EventBridge +5. 
**grep for `pg_advisory_lock`** → replace with DynamoDB/Redis +6. **grep for `to_tsvector` / `@@`** → replace with OpenSearch +7. **grep for `VACUUM` / `REINDEX` / `CLUSTER`** → remove from scripts +8. **grep for `SAVEPOINT`** → restructure into separate transactions +9. **grep for `lo_create` / `lo_import`** → replace with S3 +10. **Test ORDER BY** results with C collation — may differ from locale-aware PostgreSQL diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/index-conversion.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/index-conversion.md new file mode 100644 index 00000000..c9ed4ce5 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/index-conversion.md @@ -0,0 +1,296 @@ +# Index Conversion for DSQL + +DSQL only supports btree indexes created with `CREATE INDEX ASYNC`. `dsql_lint` flags +non-btree indexes (`index_using`), partial indexes (`index_partial`), and expression indexes +(`index_expression`) as **unfixable**. This file provides the actual conversion patterns. 
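To triage a large schema before applying the patterns in this file, a rough classifier along the following lines may help. It is only a sketch: `classify_index` is a hypothetical helper, the regular expressions are deliberately simple, and expression indexes are not detected at all.

```python
import re


def classify_index(ddl):
    """Return a list of conversion notes for one PostgreSQL CREATE INDEX statement."""
    stmt = " ".join(ddl.split())  # collapse newlines and repeated spaces
    notes = []

    method = re.search(r"\bUSING\s+(\w+)", stmt, flags=re.IGNORECASE)
    if method and method.group(1).lower() != "btree":
        notes.append(f"USING {method.group(1).lower()}: convert to btree or redesign")

    # A WHERE that follows the closing parenthesis of the column list marks a partial index.
    if re.search(r"\)\s*WHERE\b", stmt, flags=re.IGNORECASE):
        notes.append("partial index: remove WHERE, filter at query time")

    if re.search(r"\bCONCURRENTLY\b", stmt, flags=re.IGNORECASE):
        notes.append("CONCURRENTLY: use ASYNC instead")

    return notes


if __name__ == "__main__":
    print(classify_index("CREATE INDEX idx_users_prefs ON users USING gin (preferences);"))
```

An empty list means the statement looks like a plain btree index that only needs the `ASYNC` keyword.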
+ +Sources: +- [Asynchronous Indexes](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-indexes.html) +- [DSQL SQL Dialect Blog](https://aws.amazon.com/blogs/database/dsql-sql-dialect-how-amazon-aurora-dsql-differs-from-single-instance-postgresql/) + +--- + +## Conversion Rules Summary + +| PostgreSQL Index Feature | DSQL Conversion | Notes | +|---|---|---| +| CREATE INDEX | CREATE INDEX ASYNC | `dsql_lint` handles | +| CREATE UNIQUE INDEX | CREATE UNIQUE INDEX ASYNC | Uniqueness preserved | +| USING btree | USING btree (default) | Direct | +| USING gin | → btree | See GIN conversion below | +| USING gist | → btree | See GiST conversion below | +| USING brin | → btree | See BRIN conversion below | +| USING hash | → btree | Hash not supported | +| WHERE clause (partial) | Removed | Filter at query time | +| INCLUDE (columns) | Preserved | DSQL supports covering indexes | +| DESC/ASC | Preserved | Sort order supported | +| NULLS FIRST/LAST | Preserved | Null ordering supported | +| CONCURRENTLY | Removed | Use ASYNC instead | +| Expression indexes | → computed column + index | See below | +| IF NOT EXISTS | Preserved | Supported | +| Operator class (text_pattern_ops) | Removed | Not needed with C collation | + +--- + +## GIN Index Conversion + +GIN indexes are used for full-text search, JSONB containment, and array operations. +DSQL does not support GIN — convert to btree where possible. + +### JSONB GIN → No Index (Query-Time Cast) + +```sql +-- PostgreSQL: GIN index on JSONB column +CREATE INDEX idx_users_prefs ON users USING gin (preferences); +-- Used for: preferences @> '{"theme":"dark"}' + +-- DSQL: No equivalent index. JSONB operators work at runtime without index. 
+-- The query still works, just without index acceleration: +SELECT * FROM users WHERE preferences::jsonb @> '{"theme":"dark"}'; + +-- If you need indexed lookup on a specific JSON key, extract to a column: +ALTER TABLE users ADD COLUMN pref_theme text COLLATE "C"; +-- Backfill: UPDATE users SET pref_theme = preferences::jsonb->>'theme'; +CREATE INDEX ASYNC idx_users_pref_theme ON users (pref_theme); +-- Query: SELECT * FROM users WHERE pref_theme = 'dark'; +``` + +### Array GIN → JSON + Extracted Column + +```sql +-- PostgreSQL: GIN index on array column +CREATE INDEX idx_posts_tags ON posts USING gin (tags); +-- Used for: tags @> ARRAY['database'] + +-- DSQL: Store tags as json, extract to separate table for indexed lookup +CREATE TABLE post_tags ( + post_id uuid NOT NULL, + tag text COLLATE "C" NOT NULL +); +CREATE INDEX ASYNC idx_post_tags_tag ON post_tags (tag); +CREATE INDEX ASYNC idx_post_tags_post ON post_tags (post_id); +-- Query: SELECT DISTINCT post_id FROM post_tags WHERE tag = 'database'; +``` + +### Full-Text Search GIN → External Service + +```sql +-- PostgreSQL: GIN index for full-text search +CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body)); + +-- DSQL: No equivalent. Use OpenSearch/Elasticsearch for full-text search. +-- Store the text in DSQL, index in OpenSearch, query OpenSearch for IDs, then fetch from DSQL. +-- Remove the index entirely from the DSQL schema. +``` + +### Trigram GIN (pg_trgm) → Application Layer + +```sql +-- PostgreSQL: Trigram index for LIKE '%pattern%' +CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops); + +-- DSQL: No equivalent. Options: +-- 1. Use prefix matching (LIKE 'pattern%') with a btree index +CREATE INDEX ASYNC idx_users_name ON users (name); +-- 2. Use OpenSearch for fuzzy/substring matching +-- 3. 
Accept full scan for infrequent LIKE '%pattern%' queries +``` + +--- + +## GiST Index Conversion + +GiST indexes are used for geometric data, range types, and exclusion constraints. + +### Geometric GiST → No Index + +```sql +-- PostgreSQL: GiST index on point column +CREATE INDEX idx_locations_coords ON locations USING gist (coords); + +-- DSQL: Geometric types stored as text. No spatial indexing. +-- Option 1: Store lat/lng as separate numeric columns, index those +ALTER TABLE locations ADD COLUMN lat double precision; +ALTER TABLE locations ADD COLUMN lng double precision; +CREATE INDEX ASYNC idx_locations_lat ON locations (lat); +CREATE INDEX ASYNC idx_locations_lng ON locations (lng); +-- Bounding box queries: WHERE lat BETWEEN x1 AND x2 AND lng BETWEEN y1 AND y2 + +-- Option 2: Use a geohash text column for proximity queries +ALTER TABLE locations ADD COLUMN geohash text COLLATE "C"; +CREATE INDEX ASYNC idx_locations_geohash ON locations (geohash); +-- Prefix matching: WHERE geohash LIKE 'dr5ru%' +``` + +### Range GiST → Separate Columns + +```sql +-- PostgreSQL: GiST index on range type +CREATE INDEX idx_events_during ON events USING gist (during); +-- Used for: during && '[2024-01-01, 2024-02-01)' + +-- DSQL: Store range as two columns +CREATE TABLE events ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + start_time timestamptz NOT NULL, + end_time timestamptz NOT NULL +); +CREATE INDEX ASYNC idx_events_start ON events (start_time); +CREATE INDEX ASYNC idx_events_end ON events (end_time); +-- Overlap query: WHERE start_time < '2024-02-01' AND end_time > '2024-01-01' +``` + +--- + +## BRIN Index Conversion + +BRIN indexes are used for large, naturally-ordered tables (time-series data). + +```sql +-- PostgreSQL: BRIN index on timestamp column +CREATE INDEX idx_logs_created ON logs USING brin (created_at); + +-- DSQL: Use btree. DSQL's PK-ordered storage provides similar benefits +-- if created_at correlates with PK order. 
+CREATE INDEX ASYNC idx_logs_created ON logs (created_at); + +-- If the table is very large and you need to limit index size, +-- use a composite index with the most selective column first: +CREATE INDEX ASYNC idx_logs_tenant_created ON logs (tenant_id, created_at DESC); +``` + +--- + +## Partial Index Conversion + +`dsql_lint` flags partial indexes (`index_partial`) as unfixable. The conversion is to +remove the WHERE clause and create a full index. + +```sql +-- PostgreSQL: Partial index +CREATE INDEX idx_orders_pending ON orders (customer_id, created_at) + WHERE status = 'pending'; + +-- DSQL: Full index (WHERE removed). Filter at query time. +CREATE INDEX ASYNC idx_orders_pending ON orders (customer_id, created_at); +-- The query still works, just scans more index entries: +-- SELECT * FROM orders WHERE customer_id = $1 AND status = 'pending' ORDER BY created_at; + +-- Better alternative: Include status in the index for filtering +CREATE INDEX ASYNC idx_orders_customer_status ON orders (customer_id, status, created_at DESC); +-- Query: WHERE customer_id = $1 AND status = 'pending' ORDER BY created_at DESC +``` + +**Trade-off:** Full indexes are larger than partial indexes. If the partial condition is very +selective (e.g., only 1% of rows match), the full index will be significantly larger. Consider +whether the query pattern justifies the index at all, or if a composite index with the filter +column is better. + +--- + +## Expression Index Conversion + +`dsql_lint` flags expression indexes (`index_expression`) as unfixable. The conversion is to +create a computed column (GENERATED ALWAYS AS STORED) and index that column. 
+ +```sql +-- PostgreSQL: Expression index +CREATE INDEX idx_users_email_lower ON users (lower(email)); + +-- DSQL: Computed column + index +ALTER TABLE users ADD COLUMN email_lower text COLLATE "C" + GENERATED ALWAYS AS (lower(email)) STORED; +CREATE INDEX ASYNC idx_users_email_lower ON users (email_lower); +-- Query: WHERE email_lower = lower($1) +``` + +```sql +-- PostgreSQL: Expression index on date extraction +CREATE INDEX idx_orders_year ON orders (extract(year FROM created_at)); + +-- DSQL: Computed column + index +ALTER TABLE orders ADD COLUMN created_year integer + GENERATED ALWAYS AS (extract(year FROM created_at)::integer) STORED; +CREATE INDEX ASYNC idx_orders_year ON orders (created_year); +-- Query: WHERE created_year = 2024 +``` + +```sql +-- PostgreSQL: Expression index on JSON field +CREATE INDEX idx_users_city ON users ((preferences->>'city')); + +-- DSQL: Computed column + index +ALTER TABLE users ADD COLUMN pref_city text COLLATE "C" + GENERATED ALWAYS AS (preferences::jsonb->>'city') STORED; +CREATE INDEX ASYNC idx_users_city ON users (pref_city); +-- Query: WHERE pref_city = 'Seattle' +``` + +**Note:** DSQL supports `GENERATED ALWAYS AS (expr) STORED` — this is the correct approach +for expression indexes. The computed column is automatically maintained by the database. + +--- + +## Index Limits + +| Limit | Value | +|---|---| +| Max indexes per table | 24 | +| Max columns per index | 8 | +| Max PK/index key size | 1 KiB | + +**Strategy when approaching 24 index limit:** +- Use composite indexes instead of multiple single-column indexes +- Use INCLUDE columns for covering indexes (avoids storage round-trips) +- Remove indexes for rarely-used query patterns +- Consider if the query can use an existing composite index with a prefix match + +--- + +## Monitoring Async Index Status + +Indexes created with ASYNC are not immediately usable. 
Monitor: + +```sql +-- Check for indexes still being built +SELECT indexrelid::regclass AS index_name, indisvalid AS is_ready +FROM pg_index +WHERE NOT indisvalid; + +-- If this returns rows, those indexes are still building. +-- Queries work but won't use the index until indisvalid = true. +``` + +**Do NOT rely on index performance until `indisvalid = true`.** + +--- + +## Conversion Decision Flowchart + +``` +Is it a btree index? +├── Yes → CREATE INDEX ASYNC (preserve columns, INCLUDE, sort order) +│ +├── Is it GIN? +│ ├── For JSONB containment → extract key to column + btree +│ ├── For array ops → normalize to join table + btree +│ ├── For FTS → remove (use OpenSearch) +│ └── For trigram → remove or use prefix btree +│ +├── Is it GiST? +│ ├── For geometry → separate lat/lng columns + btree +│ ├── For ranges → separate start/end columns + btree +│ └── For exclusion → remove (enforce in application) +│ +├── Is it BRIN? +│ └── Convert to btree (DSQL PK-order gives similar benefit) +│ +├── Is it a partial index (WHERE)? +│ └── Remove WHERE, create full index (or add filter column to index) +│ +├── Is it an expression index? +│ └── Add GENERATED ALWAYS AS STORED column + btree index on it +│ +└── Is it CONCURRENTLY? + └── Remove CONCURRENTLY, use ASYNC +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/multi-region.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/multi-region.md new file mode 100644 index 00000000..ff564873 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/multi-region.md @@ -0,0 +1,97 @@ +# Multi-Region DSQL Design + +Aurora DSQL's key differentiator: active-active multi-region with strong consistency. +This file covers architecture, schema implications, and application design patterns. 
+ +Sources: +- [What is Aurora DSQL](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/what-is-aurora-dsql.html) +- [Multi-Region Clusters](https://awslabs.github.io/aurora-dsql-starter-kit/multi-region-clusters.html) +- [Multi-Region Endpoint Routing](https://aws.amazon.com/blogs/database/implement-multi-region-endpoint-routing-for-amazon-aurora-dsql/) + +--- + +## Overview + +| Configuration | Availability | Regions | Use Case | +|---|---|---|---| +| Single-Region | 99.99% | 1 | Standard workloads | +| Multi-Region | 99.999% | 2 + witness | Global apps, DR, compliance | + +**Key properties:** +- Active-active: both regions handle reads AND writes +- Strongly consistent: all reads/writes to any endpoint are consistent +- Synchronous replication (not eventual) +- Same schema automatically in both regions — deploy DDL once +- Zero data loss failover + +--- + +## Schema Implications + +### Deploy DDL Once + +Schema DDL only needs to execute against ONE region — it propagates automatically: + +```sql +-- Connect to Region 1 endpoint +CREATE TABLE orders (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), ...); +-- Table is immediately available in Region 2 as well +``` + +### Performance + +| Operation | Single-Region | Multi-Region | +|---|---|---| +| Read (local) | ~2-5ms | ~2-5ms | +| Write (local) | ~10-20ms | ~50-100ms (cross-region sync) | +| Read-only transaction commit | 0ms | 0ms | + +--- + +## Application Design Patterns + +### Geographic Partitioning (Minimize Cross-Region Conflicts) + +```sql +-- Design PK to include region affinity +CREATE TABLE user_sessions ( + region varchar(20) COLLATE "C", + session_id uuid DEFAULT gen_random_uuid(), + user_id uuid NOT NULL, + PRIMARY KEY (region, session_id) +); +-- Region 1 writes with region='us-east-1' +-- Region 2 writes with region='us-east-2' +-- No cross-region conflicts on same rows +``` + +### Connection Routing + +- **Latency-based (Route 53):** Route to nearest region +- **Failover:** Primary/secondary 
with health checks +- **Application-level:** Connection string per region + +### OCC in Multi-Region + +Cross-region write conflicts use the same SQLSTATE 40001 mechanism. The transaction with +the earlier commit timestamp wins. Design for low contention across regions. + +--- + +## Optimization Tips + +1. Partition data by geography to minimize cross-region conflicts +2. Use UUIDs for PKs (random distribution in both regions) +3. Keep transactions short (less conflict window) +4. Read-only transactions have zero commit latency +5. Use covering indexes to avoid cross-region storage fetches + +--- + +## Quotas + +| Quota | Value | +|---|---| +| Multi-region clusters per account | 5 (increasable) | +| Regions per cluster | 2 + 1 witness | +| Storage per cluster | 10 TiB (up to 256 TiB) | diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/occ-retry-patterns.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/occ-retry-patterns.md new file mode 100644 index 00000000..675c1af1 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/occ-retry-patterns.md @@ -0,0 +1,381 @@ +# OCC Retry Patterns for DSQL + +DSQL uses Optimistic Concurrency Control (OCC). Any write transaction can fail with +`SQLSTATE 40001` (serialization failure). This is normal and expected — not a bug. +Every application connecting to DSQL MUST implement retry logic. + +`dsql_lint` does not generate retry code. Use these patterns in your application layer. 
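As a worked illustration of the backoff used by the retry patterns in this file (50 ms base, exponential growth, jitter of up to one base interval, 5 s cap), the sketch below computes the delay window for each attempt. The helper names `backoff_delay` and `delay_bounds` are hypothetical and exist only for this example.

```python
import random


def backoff_delay(attempt, base=0.05, max_delay=5.0):
    """Delay in seconds: min(base * 2^attempt + random(0, base), max_delay)."""
    return min(base * (2 ** attempt) + random.uniform(0, base), max_delay)


def delay_bounds(attempt, base=0.05, max_delay=5.0):
    """Smallest and largest delay the formula can produce for a given attempt."""
    low = min(base * (2 ** attempt), max_delay)
    high = min(base * (2 ** attempt) + base, max_delay)
    return low, high


if __name__ == "__main__":
    for attempt in range(5):
        low, high = delay_bounds(attempt)
        print(f"attempt {attempt}: {low * 1000:.0f}-{high * 1000:.0f} ms")
```

With five attempts the worst-case total wait stays under two seconds, so the 5 s cap only matters if the retry count is raised.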
+ +Sources: +- [Concurrency Control](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-concurrency-control.html) +- [Migration Guide](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) + +--- + +## Retry Strategy + +``` +Max retries: 5 +Base delay: 50ms +Backoff: exponential with jitter +Formula: delay = min(base * 2^attempt + random(0, base), max_delay) +Max delay: 5000ms +Retryable: SQLSTATE 40001 only +Non-retryable: all other errors (raise immediately) +``` + +--- + +## Python (psycopg2) + +```python +import time +import random +import psycopg2 +from psycopg2 import errors + +def execute_with_retry(conn_params, operation, max_retries=5): + """Execute a database operation with OCC retry. + + Args: + conn_params: dict with host, port, dbname, user, password, sslmode + operation: callable(cursor) that performs the database work + max_retries: maximum retry attempts (default 5) + """ + for attempt in range(max_retries): + conn = psycopg2.connect(**conn_params) + conn.autocommit = False + try: + with conn.cursor() as cur: + operation(cur) + conn.commit() + return + except errors.SerializationFailure: + conn.rollback() + if attempt < max_retries - 1: + delay = min(0.05 * (2 ** attempt) + random.uniform(0, 0.05), 5.0) + time.sleep(delay) + else: + raise + except Exception: + conn.rollback() + raise + finally: + conn.close() + +# Usage: +def create_order(cur): + cur.execute( + "INSERT INTO orders (id, customer_id, total) VALUES (gen_random_uuid(), %s, %s)", + [customer_id, total] + ) + +execute_with_retry(conn_params, create_order) +``` + +## Python (asyncpg) + +```python +import asyncio +import random +import asyncpg + +async def execute_with_retry(pool, operation, max_retries=5): + """Execute with OCC retry using asyncpg connection pool.""" + for attempt in range(max_retries): + async with pool.acquire() as conn: + try: + async with conn.transaction(): + await operation(conn) + 
return + except asyncpg.SerializationError: + if attempt < max_retries - 1: + delay = min(0.05 * (2 ** attempt) + random.uniform(0, 0.05), 5.0) + await asyncio.sleep(delay) + else: + raise + +# Usage: +async def create_order(conn): + await conn.execute( + "INSERT INTO orders (id, customer_id, total) VALUES (gen_random_uuid(), $1, $2)", + customer_id, total + ) + +await execute_with_retry(pool, create_order) +``` + +--- + +## Node.js (pg) + +```javascript +const { Pool } = require('pg'); + +async function executeWithRetry(pool, operation, maxRetries = 5) { + for (let attempt = 0; attempt < maxRetries; attempt++) { + const client = await pool.connect(); + try { + await client.query('BEGIN'); + await operation(client); + await client.query('COMMIT'); + return; + } catch (err) { + await client.query('ROLLBACK'); + if (err.code === '40001' && attempt < maxRetries - 1) { + const delay = Math.min(50 * Math.pow(2, attempt) + Math.random() * 50, 5000); + await new Promise(resolve => setTimeout(resolve, delay)); + } else { + throw err; + } + } finally { + client.release(); + } + } +} + +// Usage: +await executeWithRetry(pool, async (client) => { + await client.query( + 'INSERT INTO orders (id, customer_id, total) VALUES (gen_random_uuid(), $1, $2)', + [customerId, total] + ); +}); +``` + +## TypeScript (with typed errors) + +```typescript +import { Pool, PoolClient, DatabaseError } from 'pg'; + +async function executeWithRetry<T>( + pool: Pool, + operation: (client: PoolClient) => Promise<T>, + maxRetries = 5 +): Promise<T> { + for (let attempt = 0; attempt < maxRetries; attempt++) { + const client = await pool.connect(); + try { + await client.query('BEGIN'); + const result = await operation(client); + await client.query('COMMIT'); + return result; + } catch (err) { + await client.query('ROLLBACK'); + if (err instanceof DatabaseError && err.code === '40001' && attempt < maxRetries - 1) { + const delay = Math.min(50 * 2 ** attempt + Math.random() * 50, 5000); + await new Promise(r =>
setTimeout(r, delay)); + } else { + throw err; + } + } finally { + client.release(); + } + } + throw new Error('Max retries exceeded'); +} +``` + +--- + +## Java (Spring Boot with Spring Retry) + +```java +import org.springframework.retry.annotation.Retryable; +import org.springframework.retry.annotation.Backoff; +import org.springframework.transaction.annotation.Transactional; +import org.hibernate.exception.LockAcquisitionException; + +@Service +public class OrderService { + + @Retryable( + retryFor = {LockAcquisitionException.class}, + maxAttempts = 5, + backoff = @Backoff(delay = 50, multiplier = 2, maxDelay = 5000) + ) + @Transactional + public Order createOrder(UUID customerId, BigDecimal total) { + Order order = new Order(); + order.setCustomerId(customerId); + order.setTotal(total); + return orderRepository.save(order); + } +} +``` + +## Java (Manual Retry without Spring Retry) + +```java +import java.sql.*; +import java.util.concurrent.ThreadLocalRandom; +import javax.sql.DataSource; + +public class DsqlRetry { + + public static <T> T executeWithRetry( + DataSource ds, RetryableOperation<T> operation, int maxRetries + ) throws Exception { + for (int attempt = 0; attempt < maxRetries; attempt++) { + try (Connection conn = ds.getConnection()) { + conn.setAutoCommit(false); + try { + T result = operation.execute(conn); + conn.commit(); + return result; + } catch (SQLException e) { + conn.rollback(); + if ("40001".equals(e.getSQLState()) && attempt < maxRetries - 1) { + long delay = Math.min( + 50 * (long) Math.pow(2, attempt) + ThreadLocalRandom.current().nextLong(50), + 5000 + ); + Thread.sleep(delay); + } else { + throw e; + } + } + } + } + throw new RuntimeException("Max retries exceeded"); + } + + @FunctionalInterface + public interface RetryableOperation<T> { + T execute(Connection conn) throws SQLException; + } +} +``` + +--- + +## Go + +```go +package dsql + +import ( + "context" + "database/sql" + "fmt" + "math" + "math/rand" + "time" + + "github.com/lib/pq" +) + +func ExecuteWithRetry(ctx
context.Context, db *sql.DB, operation func(tx *sql.Tx) error, maxRetries int) error { + for attempt := 0; attempt < maxRetries; attempt++ { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + + err = operation(tx) + if err != nil { + tx.Rollback() + if isOCCConflict(err) && attempt < maxRetries-1 { + delay := math.Min( + float64(50)*math.Pow(2, float64(attempt))+rand.Float64()*50, + 5000, + ) + time.Sleep(time.Duration(delay) * time.Millisecond) + continue + } + return err + } + + err = tx.Commit() + if err != nil { + if isOCCConflict(err) && attempt < maxRetries-1 { + delay := math.Min( + float64(50)*math.Pow(2, float64(attempt))+rand.Float64()*50, + 5000, + ) + time.Sleep(time.Duration(delay) * time.Millisecond) + continue + } + return err + } + return nil + } + return fmt.Errorf("max retries exceeded") +} + +func isOCCConflict(err error) bool { + if pqErr, ok := err.(*pq.Error); ok { + return pqErr.Code == "40001" + } + return false +} +``` + +--- + +## When OCC Conflicts Are Likely + +| Scenario | Conflict Risk | Mitigation | +|---|---|---| +| Counter/balance updates | High | Shard counters, use CACHE 65536 sequences | +| Status field updates (same row) | High | Keep transactions short | +| Batch updates overlapping rows | Medium | Smaller batches, randomize order | +| Long-running transactions | Medium | Break into smaller units (<5 min) | +| Cross-region writes to same rows | High | Geographic partitioning | +| INSERT-only workloads | Low | UUID PKs distribute writes | +| Read-heavy with rare writes | Low | Minimal concern | + +## Mitigation Strategies + +1. **Keep transactions short** — fewer rows, less time = less conflict window +2. **Use UUID primary keys** — random distribution avoids hot spots +3. **Design idempotent operations** — safe to retry without side effects +4. **Avoid hot rows** — shard counters, don't update the same row from many threads +5. **Batch writes in small groups** — 100-500 rows per transaction (not 3,000) +6. 
**Use CACHE 65536 for high-throughput sequences** — reduces round-trips +7. **Geographic partitioning** (multi-region) — route writes to local region + +--- + +## Idempotent Transaction Design + +For OCC retry safety, transactions SHOULD be idempotent: + +```sql +-- GOOD: Idempotent (safe to retry) +INSERT INTO orders (id, customer_id, total) +VALUES ($1, $2, $3) +ON CONFLICT (id) DO NOTHING; + +-- GOOD: Idempotent update +UPDATE orders SET status = 'shipped' WHERE id = $1 AND status = 'processing'; + +-- BAD: Not idempotent (double-charges on retry) +UPDATE accounts SET balance = balance - 100 WHERE id = $1; + +-- GOOD: Idempotent version (use expected value) +UPDATE accounts SET balance = $2 WHERE id = $1 AND balance = $3; +-- Where $3 is the balance you read before the transaction +``` + +--- + +## Testing OCC Retry + +To verify your retry logic works: + +```python +# Simulate OCC conflict in tests +import unittest +from unittest.mock import patch, MagicMock +from psycopg2 import errors + +class TestOCCRetry(unittest.TestCase): + def test_retries_on_serialization_failure(self): + mock_cursor = MagicMock() + # First call raises 40001, second succeeds + mock_cursor.execute.side_effect = [ + errors.SerializationFailure("OCC conflict"), + None + ] + # Verify retry logic calls execute twice + # Verify delay was applied between attempts +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/plpgsql-patterns.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/plpgsql-patterns.md new file mode 100644 index 00000000..06fa583b --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/plpgsql-patterns.md @@ -0,0 +1,440 @@ +# PL/pgSQL → SQL Transpilation Patterns + +DSQL does not support PL/pgSQL. This file provides 10 recognized patterns for converting +PL/pgSQL trigger functions and procedures to pure SQL functions (`LANGUAGE sql`) that work +in Aurora DSQL. 
+ +`dsql_lint` does NOT handle PL/pgSQL conversion — it will flag PL/pgSQL as unsupported but +cannot generate the replacement. Use these patterns to produce the converted output. + +Sources: +- [Migration Guide — Application-level logic](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) +- [Supported SQL Features — CREATE FUNCTION](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-supported-sql-features.html) + +--- + +## Pattern Detection Quick Reference + +| # | Pattern | Detection Signal | Output | +|---|---|---|---| +| 1 | SET_COLUMN | `NEW. = ; RETURN NEW;` | SQL UPDATE function | +| 2 | VALIDATION | `IF NEW. ... RAISE EXCEPTION` | CHECK constraint | +| 3 | AUDIT_INSERT | `INSERT INTO audit_log ... TG_OP` | SQL INSERT function | +| 4 | CASCADE_DML | `UPDATE/DELETE ... WHERE ... OLD.id` | SQL DML function | +| 5 | FOR_LOOP | `FOR r IN SELECT ... LOOP UPDATE` | Set-based UPDATE...FROM | +| 6 | IF_ELSE | `IF cond THEN RETURN x ELSE RETURN y` | CASE WHEN expression | +| 7 | UPSERT | `EXCEPTION WHEN unique_violation` | ON CONFLICT clause | +| 8 | DYNAMIC_SQL | `EXECUTE format(...)` | One function per table | +| 9 | CURSOR | `DECLARE cur CURSOR ... LOOP` | INSERT...SELECT | +| 10 | COALESCE | `EXCEPTION WHEN no_data_found` | COALESCE(subquery, NULL) | + +--- + +## Pattern 1: SET_COLUMN + +**Intent:** Set a column value on INSERT/UPDATE (e.g., updated_at timestamp) + +**Detection:** Function body contains `NEW. 
= ; RETURN NEW;` + +**Before (PL/pgSQL trigger):** +```sql +CREATE FUNCTION set_updated_at() RETURNS TRIGGER AS $$ +BEGIN + NEW.updated_at = now(); + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER trg_users_updated BEFORE UPDATE ON users + FOR EACH ROW EXECUTE FUNCTION set_updated_at(); +``` + +**After (SQL function — call from application):** +```sql +CREATE FUNCTION apply_set_updated_at_users(p_id bigint) RETURNS void +LANGUAGE sql AS $$ + UPDATE users SET updated_at = now() WHERE id = p_id; +$$; +``` + +**App responsibility:** Call `SELECT apply_set_updated_at_users(id)` after every UPDATE on the table. Alternatively, include `updated_at = now()` directly in your UPDATE statement. + +**Simpler alternative:** Skip the function entirely — just add `updated_at = now()` to every UPDATE in your application code. + +--- + +## Pattern 2: VALIDATION → CHECK Constraint + +**Intent:** Reject invalid data on INSERT/UPDATE + +**Detection:** Function body contains `IF NEW. THEN RAISE EXCEPTION` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION validate_price() RETURNS TRIGGER AS $$ +BEGIN + IF NEW.price < 0 THEN + RAISE EXCEPTION 'price must be non-negative'; + END IF; + IF NEW.quantity > 10000 THEN + RAISE EXCEPTION 'quantity exceeds maximum'; + END IF; + RETURN NEW; +END; +$$ LANGUAGE plpgsql; +``` + +**After (CHECK constraints — automatic enforcement):** +```sql +-- Add at CREATE TABLE time (cannot ALTER TABLE ADD CHECK in DSQL) +CREATE TABLE products ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + price numeric(10,2) CHECK (price >= 0), + quantity integer CHECK (quantity <= 10000) +); +``` + +**App responsibility:** None — CHECK is enforced automatically by DSQL. + +**Important:** DSQL does not support `ALTER TABLE ADD CHECK`. The constraint MUST be defined at CREATE TABLE time. If the table already exists, use the Table Recreation Pattern. 
+ +--- + +## Pattern 3: AUDIT_INSERT + +**Intent:** Log changes to an audit table after DML + +**Detection:** Function body contains `INSERT INTO audit_log` with `TG_OP`, `TG_TABLE_NAME`, `row_to_json` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION log_change() RETURNS TRIGGER AS $$ +BEGIN + INSERT INTO audit_log (table_name, action, old_data, new_data, changed_at) + VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD)::text, row_to_json(NEW)::text, now()); + RETURN NEW; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL function):** +```sql +CREATE FUNCTION audit_log_orders( + p_action text, + p_old_data text, + p_new_data text +) RETURNS void +LANGUAGE sql AS $$ + INSERT INTO audit_log (table_name, action, old_data, new_data, changed_at) + VALUES ('orders', p_action, p_old_data, p_new_data, now()); +$$; +``` + +**App responsibility:** Call after INSERT/UPDATE/DELETE: +```python +# Python example +old_json = json.dumps(old_row) if old_row else None +new_json = json.dumps(new_row) if new_row else None +cursor.execute("SELECT audit_log_orders(%s, %s, %s)", ['UPDATE', old_json, new_json]) +``` + +--- + +## Pattern 4: CASCADE_DML + +**Intent:** Update/delete related rows when a parent changes (ON DELETE CASCADE replacement) + +**Detection:** Function body contains `UPDATE/DELETE ... 
WHERE = OLD.id` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION cascade_delete_user() RETURNS TRIGGER AS $$ +BEGIN + UPDATE orders SET status = 'cancelled' WHERE user_id = OLD.id; + DELETE FROM sessions WHERE user_id = OLD.id; + DELETE FROM preferences WHERE user_id = OLD.id; + RETURN OLD; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL function):** +```sql +CREATE FUNCTION cascade_delete_user(p_user_id bigint) RETURNS void +LANGUAGE sql AS $$ + UPDATE orders SET status = 'cancelled' WHERE user_id = p_user_id; + DELETE FROM sessions WHERE user_id = p_user_id; + DELETE FROM preferences WHERE user_id = p_user_id; +$$; +``` + +**App responsibility:** Call BEFORE deleting the parent row: +```python +cursor.execute("SELECT cascade_delete_user(%s)", [user_id]) +cursor.execute("DELETE FROM users WHERE id = %s", [user_id]) +``` + +--- + +## Pattern 5: FOR_LOOP → Set-Based + +**Intent:** Process rows one at a time (batch update pattern) + +**Detection:** Function body contains `FOR r IN SELECT ... LOOP ... UPDATE ... END LOOP` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION expire_old_tickets() RETURNS void AS $$ +DECLARE r RECORD; +BEGIN + FOR r IN SELECT id FROM tickets WHERE due_date < CURRENT_DATE AND NOT resolved + LOOP + UPDATE tickets SET resolved = TRUE, resolved_at = now() WHERE id = r.id; + END LOOP; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL — single set-based statement):** +```sql +CREATE FUNCTION expire_old_tickets() RETURNS void +LANGUAGE sql AS $$ + UPDATE tickets SET resolved = TRUE, resolved_at = now() + FROM (SELECT id FROM tickets WHERE due_date < CURRENT_DATE AND NOT resolved) AS _src + WHERE tickets.id = _src.id; +$$; +``` + +**App responsibility:** None — call the function directly. Much faster than row-by-row. + +**Note:** If the UPDATE affects >3,000 rows, batch it in the application layer. 
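
The application-layer batching mentioned in the note can be sketched roughly as follows. This is a minimal illustration, not a DSQL or `dsql_lint` API: it assumes a psycopg2-style DB-API connection, assumes the caller has already collected the ids of the tickets to expire, and the helper names (`chunked`, `expire_tickets_in_batches`) and the batch size of 500 are hypothetical choices.

```python
from typing import Iterator, List, Sequence

# Assumption: 500 rows per transaction, comfortably below the 3,000-row
# threshold mentioned in the note above.
BATCH_SIZE = 500


def chunked(ids: Sequence, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def expire_tickets_in_batches(conn, ticket_ids: Sequence, size: int = BATCH_SIZE) -> int:
    """Resolve the given tickets with one transaction per batch.

    `conn` is assumed to be a DB-API connection whose cursors work as
    context managers (as psycopg2 cursors do). Returns the number of
    rows the driver reports as updated.
    """
    updated = 0
    for batch in chunked(ticket_ids, size):
        # One %s placeholder per id keeps the statement plain, portable SQL.
        placeholders = ", ".join(["%s"] * len(batch))
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE tickets SET resolved = TRUE, resolved_at = now() "
                f"WHERE id IN ({placeholders}) AND NOT resolved",
                batch,
            )
            updated += cur.rowcount
        conn.commit()  # each batch commits as its own short transaction
    return updated
```

Because each batch is a separate write transaction, any of them may still fail with SQLSTATE 40001; in practice the per-batch body would run inside whatever OCC retry helper the application already uses.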
+ +--- + +## Pattern 6: IF_ELSE → CASE WHEN + +**Intent:** Return different values based on conditions + +**Detection:** Function body contains `IF cond THEN RETURN x; ELSE RETURN y; END IF;` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION get_priority_label(sev integer) RETURNS text AS $$ +BEGIN + IF sev = 1 THEN RETURN 'critical'; + ELSIF sev = 2 THEN RETURN 'high'; + ELSIF sev = 3 THEN RETURN 'medium'; + ELSE RETURN 'low'; + END IF; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL with CASE WHEN):** +```sql +CREATE FUNCTION get_priority_label(sev integer) RETURNS text +LANGUAGE sql AS $$ + SELECT CASE + WHEN sev = 1 THEN 'critical' + WHEN sev = 2 THEN 'high' + WHEN sev = 3 THEN 'medium' + ELSE 'low' + END; +$$; +``` + +**App responsibility:** None — pure SQL function. + +--- + +## Pattern 7: EXCEPTION unique_violation → ON CONFLICT + +**Intent:** Insert or update (upsert) + +**Detection:** Function body contains `EXCEPTION WHEN unique_violation THEN UPDATE` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION upsert_setting(p_user_id uuid, p_key text, p_value text) RETURNS void AS $$ +BEGIN + INSERT INTO user_settings (user_id, key, value) VALUES (p_user_id, p_key, p_value); +EXCEPTION WHEN unique_violation THEN + UPDATE user_settings SET value = p_value WHERE user_id = p_user_id AND key = p_key; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL with ON CONFLICT):** +```sql +CREATE FUNCTION upsert_setting(p_user_id uuid, p_key text, p_value text) RETURNS void +LANGUAGE sql AS $$ + INSERT INTO user_settings (user_id, key, value) + VALUES (p_user_id, p_key, p_value) + ON CONFLICT (user_id, key) DO UPDATE SET value = EXCLUDED.value; +$$; +``` + +**App responsibility:** None — ON CONFLICT is handled by DSQL natively. + +**Note:** Requires a UNIQUE constraint or index on the conflict columns. 
+ +--- + +## Pattern 8: Dynamic SQL → Expanded Per-Table + +**Intent:** Run same DML on different tables passed as parameter + +**Detection:** Function body contains `EXECUTE format('... %I ...', table_name)` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION cleanup_old_records(tbl text, days integer) RETURNS void AS $$ +BEGIN + EXECUTE format('DELETE FROM %I WHERE created_at < now() - interval ''%s days''', tbl, days); +END; +$$ LANGUAGE plpgsql; +``` + +**After (one concrete function per table):** +```sql +CREATE FUNCTION cleanup_orders(p_days integer) RETURNS void +LANGUAGE sql AS $$ + DELETE FROM orders WHERE created_at < now() - make_interval(days => p_days); +$$; + +CREATE FUNCTION cleanup_logs(p_days integer) RETURNS void +LANGUAGE sql AS $$ + DELETE FROM logs WHERE created_at < now() - make_interval(days => p_days); +$$; + +CREATE FUNCTION cleanup_sessions(p_days integer) RETURNS void +LANGUAGE sql AS $$ + DELETE FROM sessions WHERE created_at < now() - make_interval(days => p_days); +$$; +``` + +**App responsibility:** Call the table-specific function instead of the generic one. + +--- + +## Pattern 9: CURSOR → Set-Based + +**Intent:** Process query results row by row with a cursor + +**Detection:** Function body contains `DECLARE cur CURSOR FOR ... 
FOR rec IN cur LOOP` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION notify_inactive_users() RETURNS void AS $$ +DECLARE + cur CURSOR FOR SELECT id, email FROM users + WHERE last_login < now() - interval '90 days' AND active = true; + rec RECORD; +BEGIN + FOR rec IN cur LOOP + INSERT INTO notifications (user_id, message, created_at) + VALUES (rec.id, 'Your account will be deactivated soon', now()); + END LOOP; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL — INSERT...SELECT):** +```sql +CREATE FUNCTION notify_inactive_users() RETURNS void +LANGUAGE sql AS $$ + INSERT INTO notifications (user_id, message, created_at) + SELECT id, 'Your account will be deactivated soon', now() + FROM users + WHERE last_login < now() - interval '90 days' AND active = true; +$$; +``` + +**App responsibility:** None — single statement, much faster. + +**Note:** If the INSERT affects >3,000 rows, batch in the application layer with LIMIT/OFFSET. + +--- + +## Pattern 10: EXCEPTION no_data_found → COALESCE + +**Intent:** Return NULL instead of raising error when no rows found + +**Detection:** Function body contains `EXCEPTION WHEN no_data_found THEN RETURN NULL` + +**Before (PL/pgSQL):** +```sql +CREATE FUNCTION safe_get_org_name(p_id integer) RETURNS text AS $$ +DECLARE result text; +BEGIN + SELECT name INTO STRICT result FROM organizations WHERE id = p_id; + RETURN result; +EXCEPTION WHEN no_data_found THEN + RETURN NULL; +END; +$$ LANGUAGE plpgsql; +``` + +**After (SQL with COALESCE or plain subquery):** +```sql +CREATE FUNCTION safe_get_org_name(p_id integer) RETURNS text +LANGUAGE sql AS $$ + SELECT name FROM organizations WHERE id = p_id; +$$; +-- Returns NULL naturally when no rows match (no STRICT = no exception) +``` + +**App responsibility:** None — SQL functions return NULL for empty result sets by default. + +--- + +## Unconvertible Patterns (Generate Stubs) + +These cannot be automatically converted. 
Generate a stub with TODO comments: + +| Pattern | Why | Resolution | +|---|---|---| +| PERFORM | Calls function for side effects, discards result. No SQL equivalent. | Move logic to application code or AWS Lambda | +| Complex ELSIF (3+ branches with side effects) | Multiple DML statements in branches — too complex for CASE WHEN | Rewrite as multiple SQL functions or move to application | +| RAISE NOTICE/LOG | Diagnostic output — no SQL equivalent | Use application logging | +| Dynamic table/column names (complex) | Cannot expand all combinations | Move to application code | +| LOOP with EXIT WHEN | Iterative logic with break conditions | Rewrite as recursive CTE or application loop | + +**Stub template:** +```sql +-- TODO: Manual conversion required +-- Original function: +-- Pattern: PERFORM / Complex ELSIF / Dynamic SQL +-- Original body: +-- +-- +-- Suggested approach: +-- Move this logic to application code (Python/Node/Java) +-- or implement as an AWS Lambda function called via application layer. +``` + +--- + +## Conversion Workflow + +1. **Identify all PL/pgSQL functions:** `SELECT proname, prosrc FROM pg_proc WHERE prolang = (SELECT oid FROM pg_language WHERE lanname = 'plpgsql');` +2. **Identify all triggers:** `SELECT tgname, tgrelid::regclass, proname FROM pg_trigger JOIN pg_proc ON tgfoid = pg_proc.oid WHERE NOT tgisinternal;` +3. **Match each function to a pattern** (use detection signals above) +4. **Generate SQL replacement** using the templates +5. **Drop the trigger** (triggers are not supported in DSQL) +6. **Create the SQL function** via `transact` +7. **Update application code** to call the function where the trigger used to fire +8. 
**Run `dsql_lint`** on the generated SQL to verify compatibility + +--- + +## App Integration Cheat Sheet + +| Original Trigger Timing | Replacement Call Point | +|---|---| +| BEFORE INSERT | Call validation function before INSERT | +| BEFORE UPDATE | Call SET_COLUMN function after UPDATE (or inline in UPDATE SET) | +| AFTER INSERT | Call audit/notification function after INSERT | +| AFTER UPDATE | Call audit function after UPDATE | +| BEFORE DELETE | Call cascade function before DELETE | +| AFTER DELETE | Call audit function after DELETE | diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/schema-objects.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/schema-objects.md new file mode 100644 index 00000000..f19059b5 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/schema-objects.md @@ -0,0 +1,372 @@ +# Schema Object Conversion for DSQL + +Conversion patterns for PostgreSQL schema objects that `dsql_lint` either doesn't handle +or flags as unfixable. Covers ENUM types, materialized views, extensions, roles/grants, +multi-schema flattening, and other structural conversions. + +Sources: +- [Supported SQL Features](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-supported-sql-features.html) +- [Migration Guide](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-migration-guide.html) +- [Database Roles and IAM](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/using-database-and-iam-roles.html) + +--- + +## ENUM Types → CHECK Constraints + +PostgreSQL ENUM types are not supported in DSQL. Convert to varchar + CHECK constraint. 
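
When many ENUM columns need converting, the varchar + CHECK column definition can be generated rather than hand-written. The helper below is a hypothetical sketch (the name `enum_to_check_column` and its parameters are assumptions, not part of `dsql_lint`); it sizes the `varchar` to the longest label exactly, so widen it if you expect to add longer values later, and it assumes the column name is a plain identifier that needs no quoting.

```python
from typing import Optional, Sequence


def _quote(label: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + label.replace("'", "''") + "'"


def enum_to_check_column(
    column: str,
    labels: Sequence[str],
    default: Optional[str] = None,
    not_null: bool = True,
) -> str:
    """Build a varchar + CHECK column definition from a list of ENUM labels."""
    if not labels:
        raise ValueError("an ENUM needs at least one label")
    if default is not None and default not in labels:
        raise ValueError("default must be one of the labels")

    width = max(len(label) for label in labels)  # N fits the longest value
    parts = [column, f'varchar({width}) COLLATE "C"']
    if not_null:
        parts.append("NOT NULL")
    if default is not None:
        parts.append(f"DEFAULT {_quote(default)}")
    values = ", ".join(_quote(label) for label in labels)
    parts.append(f"CHECK ({column} IN ({values}))")
    return " ".join(parts)


print(enum_to_check_column(
    "status", ["open", "in_progress", "resolved", "closed"], default="open"
))
# status varchar(11) COLLATE "C" NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'in_progress', 'resolved', 'closed'))
```

The returned string is meant to be pasted inline into the `CREATE TABLE` statement, since the CHECK has to be present when the table is created.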
+ +**Before (PostgreSQL):** +```sql +CREATE TYPE ticket_status AS ENUM ('open', 'in_progress', 'resolved', 'closed'); +CREATE TYPE priority_level AS ENUM ('low', 'medium', 'high', 'critical'); + +CREATE TABLE tickets ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + status ticket_status NOT NULL DEFAULT 'open', + priority priority_level NOT NULL DEFAULT 'medium' +); +``` + +**After (DSQL):** +```sql +CREATE TABLE tickets ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + status varchar(20) COLLATE "C" NOT NULL DEFAULT 'open' + CHECK (status IN ('open', 'in_progress', 'resolved', 'closed')), + priority varchar(20) COLLATE "C" NOT NULL DEFAULT 'medium' + CHECK (priority IN ('low', 'medium', 'high', 'critical')) +); +``` + +**Important:** CHECK constraints MUST be defined at CREATE TABLE time in DSQL. You cannot +`ALTER TABLE ADD CHECK` after creation. + +**Conversion steps:** +1. Find all `CREATE TYPE ... AS ENUM` statements +2. Find all columns using those types +3. Replace the column type with `varchar(N) COLLATE "C"` where N fits the longest value +4. Add `CHECK (column IN ('val1', 'val2', ...))` inline in CREATE TABLE +5. 
Drop the `CREATE TYPE` statement entirely + +--- + +## Composite Types → JSON or Separate Columns + +```sql +-- PostgreSQL +CREATE TYPE address AS (street text, city text, state text, zip text); +CREATE TABLE customers (id uuid PRIMARY KEY, home_address address); + +-- DSQL Option 1: JSON column (flexible) +CREATE TABLE customers ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + home_address json -- {"street":"...","city":"...","state":"...","zip":"..."} +); +-- Query: SELECT home_address::jsonb->>'city' FROM customers; + +-- DSQL Option 2: Separate columns (indexable) +CREATE TABLE customers ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + home_street text COLLATE "C", + home_city text COLLATE "C", + home_state text COLLATE "C", + home_zip text COLLATE "C" +); +CREATE INDEX ASYNC idx_customers_city ON customers (home_city); +``` + +**Decision:** Use JSON if you rarely query individual fields. Use separate columns if you +need to index or filter on specific fields. + +--- + +## Materialized Views → Regular Views + +```sql +-- PostgreSQL +CREATE MATERIALIZED VIEW monthly_stats AS + SELECT date_trunc('month', created_at) AS month, COUNT(*) AS total + FROM orders GROUP BY 1; +-- Refreshed with: REFRESH MATERIALIZED VIEW monthly_stats; + +-- DSQL: Regular view (always up-to-date, no refresh needed) +CREATE VIEW monthly_stats AS + SELECT date_trunc('month', created_at) AS month, COUNT(*) AS total + FROM orders GROUP BY 1; +``` + +**Trade-off:** Regular views compute on every query (no caching). 
For expensive aggregations: +- Use application-layer caching (Redis, ElastiCache) +- Pre-compute into a summary table updated by application logic +- Accept the query cost if the dataset is small + +--- + +## Temporary Tables → Regular Tables or CTEs + +```sql +-- PostgreSQL +CREATE TEMP TABLE staging_data (id serial, payload json); +INSERT INTO staging_data SELECT ...; +-- Used within a session, auto-dropped on disconnect + +-- DSQL Option 1: CTE (for single-query use) +WITH staging_data AS ( + SELECT id, payload FROM source_table WHERE ... +) +SELECT * FROM staging_data WHERE ...; + +-- DSQL Option 2: Regular table with prefix (for multi-statement use) +CREATE TABLE _tmp_staging_data ( + id bigint GENERATED BY DEFAULT AS IDENTITY (CACHE 1), + session_id uuid NOT NULL, -- track which session owns the data + payload json +); +-- Clean up: DELETE FROM _tmp_staging_data WHERE session_id = $1; +``` + +--- + +## Partitioned Tables → Flat Tables + +```sql +-- PostgreSQL +CREATE TABLE events ( + id uuid, tenant_id uuid, created_at timestamptz, data json +) PARTITION BY RANGE (created_at); +CREATE TABLE events_2024_q1 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); + +-- DSQL: Flat table (DSQL handles distribution internally via PK-ordered storage) +CREATE TABLE events ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id uuid NOT NULL, + created_at timestamptz NOT NULL DEFAULT now(), + data json +); +CREATE INDEX ASYNC idx_events_tenant_created ON events (tenant_id, created_at DESC); +``` + +**Note:** DSQL's PK-ordered storage and distributed architecture handle data distribution +automatically. Manual partitioning is not needed and not supported. 
+ +--- + +## Inherited Tables → Flat (Columns Merged) + +```sql +-- PostgreSQL +CREATE TABLE base_entity (id uuid PRIMARY KEY, created_at timestamptz, updated_at timestamptz); +CREATE TABLE users (email text, name text) INHERITS (base_entity); +CREATE TABLE products (sku text, price numeric) INHERITS (base_entity); + +-- DSQL: Merge inherited columns into each child table +CREATE TABLE users ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + created_at timestamptz DEFAULT now(), + updated_at timestamptz DEFAULT now(), + email text COLLATE "C", + name text COLLATE "C" +); + +CREATE TABLE products ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + created_at timestamptz DEFAULT now(), + updated_at timestamptz DEFAULT now(), + sku text COLLATE "C", + price numeric(10,2) +); +``` + +--- + +## Extensions → Alternatives + +| PostgreSQL Extension | DSQL Alternative | Notes | +|---|---|---| +| uuid-ossp | `gen_random_uuid()` | Built-in, no extension needed | +| pgcrypto | `gen_random_uuid()` | For other crypto, use application layer | +| pg_trgm | None | Use OpenSearch for fuzzy search | +| postgis | None | Store coords as numeric columns or geohash text | +| hstore | `json` type | Use json column instead | +| citext | `varchar` + `lower()` | Case-insensitive via application queries | +| pg_stat_statements | None | DSQL has own monitoring | +| btree_gin / btree_gist | None | Use btree indexes directly | +| tablefunc (crosstab) | None | Pivot in application layer | +| ltree | `text` + application logic | Hierarchical queries in app | + +**Conversion:** +```sql +-- PostgreSQL +CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; +SELECT uuid_generate_v4(); + +-- DSQL: Remove extension, replace function +-- DROP the CREATE EXTENSION statement +SELECT gen_random_uuid(); -- built-in replacement +``` + +--- + +## Roles/GRANT → IAM Mapping + +DSQL supports `CREATE ROLE` and `GRANT/REVOKE` but they're linked to IAM. 
+ +```sql +-- PostgreSQL +CREATE ROLE app_reader WITH LOGIN PASSWORD 'secret'; +GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_reader; + +-- DSQL: Role creation works, but auth is IAM-based (no passwords) +CREATE ROLE app_reader; +GRANT USAGE ON SCHEMA public TO app_reader; +GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_reader; +-- Authentication: IAM role mapped to database role via dsql:DbConnect policy +``` + +**Key differences:** +- No `WITH LOGIN PASSWORD` — authentication is always IAM token-based +- No `ALTER DEFAULT PRIVILEGES` — use explicit GRANT per object +- No Row-Level Security (RLS) — implement in application layer +- No `SECURITY DEFINER` functions — remove from function definitions +- Admin role is predefined and cannot be modified + +**IAM mapping:** +```json +{ + "Effect": "Allow", + "Action": "dsql:DbConnect", + "Resource": "arn:aws:dsql:us-east-1:123456789012:cluster/cluster-id", + "Condition": { + "StringEquals": { + "dsql:DbUser": "app_reader" + } + } +} +``` + +--- + +## Multi-Schema Handling + +DSQL supports up to 10 schemas per database. 
+
+### ≤10 Schemas: Direct Migration
+
+```sql
+-- PostgreSQL schemas migrate directly
+CREATE SCHEMA billing;
+GRANT USAGE ON SCHEMA billing TO app_role;
+CREATE TABLE billing.invoices (id uuid PRIMARY KEY, amount numeric(10,2));
+
+CREATE SCHEMA support;
+GRANT USAGE ON SCHEMA support TO app_role;
+CREATE TABLE support.tickets (id uuid PRIMARY KEY, title text COLLATE "C");
+```
+
+### >10 Schemas: Consolidate with Prefixes
+
+```sql
+-- PostgreSQL has 15 schemas — must consolidate to ≤10
+-- Strategy: merge least-used schemas into 'public' with table name prefixes
+
+-- Schema 'analytics' (overflow) → prefix tables
+CREATE TABLE public.analytics_reports (id uuid PRIMARY KEY, ...);
+CREATE TABLE public.analytics_dashboards (id uuid PRIMARY KEY, ...);
+
+-- Update all references in application code:
+-- FROM: analytics.reports → TO: public.analytics_reports
+```
+
+### search_path Behavior
+
+```sql
+-- DSQL supports search_path
+SET search_path TO billing, public;
+SELECT * FROM invoices; -- resolves to billing.invoices
+
+-- NOTE: After schema DDL, refresh connection for immediate visibility
+```
+
+---
+
+## UNLOGGED Tables → Regular Tables
+
+```sql
+-- PostgreSQL: UNLOGGED for performance (data lost on crash)
+CREATE UNLOGGED TABLE session_cache (key text PRIMARY KEY, value json);
+
+-- DSQL: All tables are durable. Remove UNLOGGED keyword.
+CREATE TABLE session_cache (
+  key text COLLATE "C" PRIMARY KEY,
+  value json
+);
+-- If you need non-durable caching, use ElastiCache/Redis instead.
+```
+
+---
+
+## CREATE DOMAIN → Preserved
+
+DSQL supports CREATE DOMAIN:
+
+```sql
+-- PostgreSQL
+CREATE DOMAIN email_address AS varchar(255) CHECK (VALUE ~ '^[^@]+@[^@]+\.[^@]+$');
+
+-- DSQL: Works as-is (DOMAIN is supported)
+CREATE DOMAIN email_address AS varchar(255) COLLATE "C"
+  CHECK (VALUE ~ '^[^@]+@[^@]+\.[^@]+$');
+```
+
+---
+
+## GENERATED ALWAYS AS STORED → Preserved
+
+DSQL supports computed columns:
+
+```sql
+-- PostgreSQL
+CREATE TABLE products (
+  price numeric(10,2),
+  tax_rate numeric(4,2),
+  total numeric(10,2) GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
+);
+
+-- DSQL: Works as-is
+CREATE TABLE products (
+  price numeric(10,2),
+  tax_rate numeric(4,2),
+  total numeric(10,2) GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
+);
+```
+
+---
+
+## WITH (storage parameters) → Removed
+
+```sql
+-- PostgreSQL
+CREATE TABLE hot_data (id uuid PRIMARY KEY, data json) WITH (fillfactor = 70);
+ALTER TABLE hot_data SET (autovacuum_vacuum_threshold = 100);
+
+-- DSQL: Remove all storage parameters. DSQL manages storage automatically.
+CREATE TABLE hot_data (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), data json);
+-- No VACUUM needed — DSQL handles automatically.
+```
+
+---
+
+## Conversion Checklist
+
+- [ ] Find all `CREATE TYPE ... AS ENUM` → convert to CHECK constraints
+- [ ] Find all `CREATE TYPE ... AS (composite)` → convert to json or separate columns
+- [ ] Find all `CREATE MATERIALIZED VIEW` → convert to regular VIEW
+- [ ] Find all `CREATE TEMP TABLE` → convert to CTE or regular table with _tmp_ prefix
+- [ ] Find all `PARTITION BY` → remove (DSQL handles distribution)
+- [ ] Find all `INHERITS` → merge columns into child tables
+- [ ] Find all `CREATE EXTENSION` → remove and use alternatives
+- [ ] Find all `UNLOGGED` → remove keyword
+- [ ] Find all `WITH (fillfactor=...)` → remove storage parameters
+- [ ] Audit roles/grants → remove passwords, map to IAM
+- [ ] Count schemas → consolidate if >10
+- [ ] Add `COLLATE "C"` to all string columns
diff --git a/plugins/databases-on-aws/skills/dsql/references/pg-migrations/type-mapping.md b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/type-mapping.md
new file mode 100644
index 00000000..d02c9d38
--- /dev/null
+++ b/plugins/databases-on-aws/skills/dsql/references/pg-migrations/type-mapping.md
@@ -0,0 +1,197 @@
+# PostgreSQL → DSQL Type Mapping
+
+Complete type conversion reference for migrating PostgreSQL schemas to Aurora DSQL.
+Complements `dsql_lint` — the linter handles SERIAL/JSON/array detection; this file provides
+the full mapping table, COLLATE rules, and migration decision guidance.
+
+Sources:
+- [DSQL Supported Data Types](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/working-with-postgresql-compatibility-supported-data-types.html)
+- [PostgreSQL 16 Data Types](https://www.postgresql.org/docs/16/datatype.html)
+- [DSQL SQL Dialect Blog](https://aws.amazon.com/blogs/database/dsql-sql-dialect-how-amazon-aurora-dsql-differs-from-single-instance-postgresql/)
+
+---
+
+## Numeric Types
+
+| PostgreSQL Type | Aliases | DSQL Type | Indexable | Notes |
+|---|---|---|---|---|
+| SMALLINT | INT2 | smallint | ✅ | Direct |
+| INTEGER | INT, INT4 | integer | ✅ | Direct |
+| BIGINT | INT8 | bigint | ✅ | Direct |
+| REAL | FLOAT4 | real | ✅ | 6 decimal digits |
+| DOUBLE PRECISION | FLOAT8 | double precision | ✅ | 15 decimal digits |
+| FLOAT(1-24) | — | real | ✅ | Maps to 4-byte |
+| FLOAT(25-53) | — | double precision | ✅ | Maps to 8-byte |
+| FLOAT (no precision) | — | double precision | ✅ | Default 8-byte |
+| NUMERIC(p,s) | DECIMAL(p,s) | numeric(p,s) | ✅ | Max precision 38, scale 37 |
+| NUMERIC (no precision) | DECIMAL | numeric | ✅ | ⚠️ DSQL defaults to (18,6) — specify explicitly |
+| MONEY | — | numeric(19,4) | ✅ | Converted to numeric |
+| SERIAL | SERIAL4 | integer | ✅ | `dsql_lint` handles → IDENTITY |
+| BIGSERIAL | SERIAL8 | bigint | ✅ | `dsql_lint` handles → IDENTITY |
+| SMALLSERIAL | SERIAL2 | smallint | ✅ | `dsql_lint` handles → IDENTITY |
+
+## String Types
+
+**CRITICAL:** DSQL uses C collation exclusively. All string columns MUST have `COLLATE "C"`.
+
+| PostgreSQL Type | Aliases | DSQL Type | Indexable | Max Size |
+|---|---|---|---|---|
+| CHAR(n) | CHARACTER(n) | char(n) COLLATE "C" | ✅ | 4,096 bytes |
+| VARCHAR(n) | CHARACTER VARYING(n) | varchar(n) COLLATE "C" | ✅ | 65,535 bytes |
+| VARCHAR (no length) | CHARACTER VARYING | varchar COLLATE "C" | ✅ | 65,535 bytes |
+| BPCHAR | — | bpchar COLLATE "C" | ✅ | 4,096 bytes |
+| TEXT | — | text COLLATE "C" | ✅ | 1 MiB |
+
+### COLLATE "C" Impact
+
+```sql
+-- PostgreSQL (en_US.UTF-8): apple, Banana, cherry
+-- DSQL (C collation): Banana, apple, cherry (uppercase before lowercase)
+SELECT name FROM items ORDER BY name;
+
+-- Workaround for case-insensitive sort:
+SELECT name FROM items ORDER BY lower(name);
+
+-- LIKE patterns work correctly for ASCII
+-- Non-ASCII: ä sorts after z (not after a)
+```
+
+**Migration action:** Add `COLLATE "C"` to all string column definitions. Warn users that ORDER BY behavior will change for mixed-case or non-ASCII data.
+
+## Date/Time Types
+
+| PostgreSQL Type | Aliases | DSQL Type | Indexable |
+|---|---|---|---|
+| DATE | — | date | ✅ |
+| TIME(p) | TIME WITHOUT TIME ZONE | time(p) | ✅ |
+| TIMETZ(p) | TIME WITH TIME ZONE | time(p) with time zone | ❌ |
+| TIMESTAMP(p) | TIMESTAMP WITHOUT TIME ZONE | timestamp(p) | ✅ |
+| TIMESTAMPTZ(p) | TIMESTAMP WITH TIME ZONE | timestamptz(p) | ✅ |
+| INTERVAL(p) | — | interval(p) | ❌ |
+
+## Boolean, Binary, UUID
+
+| PostgreSQL Type | DSQL Type | Indexable |
+|---|---|---|
+| BOOLEAN / BOOL | boolean | ✅ |
+| BYTEA | bytea | ❌ |
+| UUID | uuid | ✅ |
+
+## JSON Types
+
+| PostgreSQL Type | DSQL Type | Indexable | Notes |
+|---|---|---|---|
+| JSON | json | ❌ | Native support, all operators work |
+| JSONB | json | ❌ | Stored as json; cast `::jsonb` in queries for operators |
+
+```sql
+-- Store as json, query with jsonb operators
+CREATE TABLE config (id uuid PRIMARY KEY, data json);
+SELECT data::jsonb -> 'key' FROM config; -- ✅ extract
+SELECT data::jsonb @> '{"a":1}' FROM config; -- ✅ containment
+SELECT data::jsonb ? 'key' FROM config; -- ✅ key exists
+```
+
+## Types Mapped to TEXT (no native DSQL equivalent)
+
+| PostgreSQL Type | DSQL Type | Reason |
+|---|---|---|
+| TEXT[] / INTEGER[] / UUID[] / *any*[] | text COLLATE "C" | Arrays are runtime-only |
+| INET | text COLLATE "C" | Runtime-only (cast `::inet` in queries) |
+| CIDR | text COLLATE "C" | No native equivalent |
+| MACADDR / MACADDR8 | text COLLATE "C" | No native equivalent |
+| TSVECTOR | text COLLATE "C" | No full-text search |
+| TSQUERY | text COLLATE "C" | No full-text search |
+| XML | text COLLATE "C" | No native XML |
+| BIT(n) / VARBIT(n) | text COLLATE "C" | No native bit string |
+| POINT / LINE / LSEG / BOX / PATH / POLYGON / CIRCLE | text COLLATE "C" | No geometric types |
+| OID / REGCLASS / REGTYPE / PG_LSN | text COLLATE "C" | System types |
+
+### Array Storage Alternatives
+
+```sql
+-- Option 1: Store as json array (preferred for structured data)
+CREATE TABLE projects (id uuid PRIMARY KEY, tags json);
+INSERT INTO projects VALUES (gen_random_uuid(), '["backend","api","database"]');
+SELECT json_array_elements_text(tags) FROM projects; -- unnest
+
+-- Option 2: Store as comma-separated text (simple cases)
+CREATE TABLE projects (id uuid PRIMARY KEY, tags text COLLATE "C");
+INSERT INTO projects VALUES (gen_random_uuid(), 'backend,api,database');
+SELECT string_to_array(tags, ',') FROM projects; -- runtime array
+```
+
+---
+
+## NUMERIC Precision Warning
+
+PostgreSQL allows unbounded NUMERIC. DSQL enforces a maximum precision of 38 and a maximum scale of 37, and an unqualified NUMERIC defaults to (18,6):
+
+```sql
+-- PostgreSQL: stores any precision
+CREATE TABLE t (val NUMERIC);
+INSERT INTO t VALUES (12345678901234567890.1234567890); -- works
+
+-- DSQL: if no precision is specified, the column defaults to numeric(18,6) (see Numeric Types table)
+-- Migration action: always specify explicit (p,s) for NUMERIC columns
+CREATE TABLE t (val NUMERIC(30,10)); -- explicit (p,s): 20 integer digits + 10 fractional digits fit the value above
+```
+
+---
+
+## Migration Decision Matrix
+
+| PostgreSQL Column Type | DSQL Column Type | Data Loss Risk | Action Required |
+|---|---|---|---|
+| SMALLINT/INTEGER/BIGINT | Same | None | None |
+| REAL/DOUBLE PRECISION | Same | None | None |
+| NUMERIC(p,s) where p≤38 | numeric(p,s) | None | None |
+| NUMERIC (unbounded) | numeric(18,6) default | Possible truncation | Add explicit (p,s) |
+| SERIAL/BIGSERIAL | integer/bigint + IDENTITY | None | `dsql_lint` handles |
+| CHAR/VARCHAR/TEXT | Same + COLLATE "C" | None (data) | Sort order changes |
+| DATE/TIME/TIMESTAMP/TIMESTAMPTZ | Same | None | None |
+| INTERVAL | interval | None | Cannot index |
+| BOOLEAN | boolean | None | None |
+| BYTEA | bytea | None | Cannot index |
+| UUID | uuid | None | None |
+| JSON | json | None | None |
+| JSONB | json | None (operators work via cast) | Use `::jsonb` in queries |
+| TEXT[] / INT[] | text | Semantic loss | Use json array or join table |
+| INET/CIDR/MACADDR | text | None (string repr) | Cast `::inet` at query time |
+| TSVECTOR/TSQUERY | text | FTS capability lost | Use OpenSearch/Elasticsearch |
+| Geometric types | text | Spatial capability lost | Use external GIS |
+| XML | text | XML functions lost | Parse in application |
+| MONEY | numeric(19,4) | None | Explicit precision |
+| BIT/VARBIT | text | Bit operations lost | Use application logic |
+
+---
+
+## Quick Conversion Template
+
+```sql
+-- Before (PostgreSQL)
+CREATE TABLE users (
+  id SERIAL PRIMARY KEY,
+  email VARCHAR(255) NOT NULL,
+  name TEXT,
+  balance MONEY,
+  preferences JSONB,
+  tags TEXT[],
+  ip_address INET,
+  search_vector
TSVECTOR,
+  created_at TIMESTAMPTZ DEFAULT now()
+);
+
+-- After (DSQL)
+CREATE TABLE users (
+  id bigint GENERATED BY DEFAULT AS IDENTITY (CACHE 1) PRIMARY KEY,
+  email varchar(255) COLLATE "C" NOT NULL,
+  name text COLLATE "C",
+  balance numeric(19,4),
+  preferences json, -- query with ::jsonb operators
+  tags json, -- store as '["tag1","tag2"]'
+  ip_address text COLLATE "C", -- cast ::inet at query time
+  search_vector text COLLATE "C", -- use OpenSearch for FTS
+  created_at timestamptz DEFAULT now()
+);
+```
diff --git a/tools/evals/databases-on-aws/dsql/pg_migration_evals.json b/tools/evals/databases-on-aws/dsql/pg_migration_evals.json
new file mode 100644
index 00000000..c74e2ed6
--- /dev/null
+++ b/tools/evals/databases-on-aws/dsql/pg_migration_evals.json
@@ -0,0 +1,206 @@
+{
+  "skill_name": "dsql",
+  "focus": "PostgreSQL schema conversion — does the agent use skill knowledge to handle conversions beyond dsql_lint?",
+  "evals": [
+    {
+      "id": 200,
+      "name": "enum-to-check",
+      "prompt": "Convert this PostgreSQL schema to DSQL:\n\nCREATE TYPE priority_level AS ENUM ('low', 'medium', 'high', 'critical');\n\nCREATE TABLE tickets (\n id UUID DEFAULT gen_random_uuid() PRIMARY KEY,\n title TEXT NOT NULL,\n priority priority_level DEFAULT 'medium'\n);",
+      "expected_output": "Converts ENUM type to CHECK constraint inline in CREATE TABLE, drops the CREATE TYPE statement",
+      "files": [],
+      "expectations": [
+        "Drops the CREATE TYPE ... AS ENUM statement",
+        "Converts the priority column to varchar or text with a CHECK constraint listing the enum values",
+        "CHECK constraint contains all original enum values: low, medium, high, critical",
+        "Adds COLLATE \"C\" to string columns (title and priority)",
+        "Preserves the DEFAULT 'medium'"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 201,
+      "name": "plpgsql-trigger-conversion",
+      "prompt": "I need to migrate this PostgreSQL trigger to DSQL:\n\nCREATE FUNCTION set_updated_at() RETURNS TRIGGER AS $$\nBEGIN\n NEW.updated_at = now();\n RETURN NEW;\nEND;\n$$ LANGUAGE plpgsql;\n\nCREATE TRIGGER trg_users_updated BEFORE UPDATE ON users\n FOR EACH ROW EXECUTE FUNCTION set_updated_at();",
+      "expected_output": "Converts PL/pgSQL trigger to a SQL function that the application calls after UPDATE, or suggests inlining updated_at = now() in the UPDATE statement",
+      "files": [],
+      "expectations": [
+        "Identifies this as a SET_COLUMN pattern (Pattern 1)",
+        "Generates a replacement SQL function using LANGUAGE sql (not plpgsql)",
+        "Drops the CREATE TRIGGER statement",
+        "Explains that the application must call the function after UPDATE (or inline the logic)",
+        "Does NOT claim triggers work in DSQL"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 202,
+      "name": "fk-validation-generation",
+      "prompt": "My PostgreSQL schema has:\n\nCREATE TABLE orders (\n id UUID PRIMARY KEY,\n customer_id UUID REFERENCES customers(id) ON DELETE CASCADE,\n product_id UUID REFERENCES products(id)\n);\n\nConvert this for DSQL and generate the FK replacement code.",
+      "expected_output": "Removes FK constraints, generates validate_fk functions for both FKs, generates a cascade_delete function for the ON DELETE CASCADE",
+      "files": [],
+      "expectations": [
+        "Removes both FOREIGN KEY constraints from the CREATE TABLE",
+        "Generates a validate_fk_orders_customer_id() function that checks customers table",
+        "Generates a validate_fk_orders_product_id() function that checks products table",
+        "Generates a cascade function for the ON DELETE CASCADE (deletes orders when customer is deleted)",
+        "Explains when to call each function (validate before INSERT, cascade before DELETE)"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 203,
+      "name": "gin-index-conversion",
+      "prompt": "How do I convert this PostgreSQL GIN index to work in DSQL?\n\nCREATE INDEX idx_users_prefs ON users USING gin (preferences jsonb_path_ops);\n\nThe index is used for queries like: SELECT * FROM users WHERE preferences @> '{\"theme\":\"dark\"}'",
+      "expected_output": "Explains GIN is not supported, suggests extracting the queried JSON key to a computed column with a btree index, or accepting unindexed runtime jsonb queries",
+      "files": [],
+      "expectations": [
+        "States that GIN indexes are not supported in DSQL",
+        "Does NOT just say 'use btree' without explaining the semantic difference",
+        "Suggests extracting frequently-queried JSON keys to a separate column (computed or regular)",
+        "Shows CREATE INDEX ASYNC with btree on the extracted column",
+        "Mentions that jsonb containment (@>) still works at runtime without an index"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 204,
+      "name": "occ-retry-generation",
+      "prompt": "Generate OCC retry logic for my Python application connecting to DSQL. I'm using psycopg2.",
+      "expected_output": "Generates a retry wrapper function with exponential backoff that catches SQLSTATE 40001",
+      "files": [],
+      "expectations": [
+        "Generates a Python function with retry logic",
+        "Catches psycopg2 SerializationFailure or checks for SQLSTATE/pgcode 40001",
+        "Uses exponential backoff with jitter",
+        "Has a max retry limit (typically 5)",
+        "Includes BEGIN/COMMIT/ROLLBACK transaction management"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 205,
+      "name": "uuid-generate-replacement",
+      "prompt": "My PostgreSQL schema uses uuid_generate_v4() and lastval() extensively. What do I need to change for DSQL?\n\nCREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";\nCREATE TABLE events (id UUID DEFAULT uuid_generate_v4() PRIMARY KEY);\n-- Later in app code: SELECT lastval();",
+      "expected_output": "Replaces uuid_generate_v4() with gen_random_uuid(), removes CREATE EXTENSION, replaces lastval() with currval('explicit_sequence_name')",
+      "files": [],
+      "expectations": [
+        "Removes the CREATE EXTENSION statement",
+        "Replaces uuid_generate_v4() with gen_random_uuid()",
+        "Explains that gen_random_uuid() is built-in (no extension needed)",
+        "Replaces lastval() with currval() using an explicit sequence name",
+        "Explains why lastval() is not supported (must reference specific sequence)"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 206,
+      "name": "django-migration-guidance",
+      "prompt": "I'm migrating my Django app to DSQL. What adapter do I use and what model changes are needed? I currently have ForeignKey fields and ArrayField.",
+      "expected_output": "Recommends aurora-dsql-django adapter, explains ForeignKey replacement with plain fields + validation, ArrayField replacement with JSONField",
+      "files": [],
+      "expectations": [
+        "Recommends aurora-dsql-django as the database ENGINE",
+        "Explains that ForeignKey fields must be replaced with BigIntegerField or UUIDField",
+        "Suggests adding clean() or signal-based FK validation",
+        "Explains that ArrayField is not supported and should use JSONField",
+        "Mentions OCC retry logic is needed (SQLSTATE 40001)"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 208,
+      "name": "expression-index-to-computed-column",
+      "prompt": "I have this PostgreSQL expression index:\n\nCREATE INDEX idx_users_email_lower ON users (lower(email));\n\nIt's used for case-insensitive email lookups: SELECT * FROM users WHERE lower(email) = lower('User@Example.com');\n\nHow do I convert this for DSQL?",
+      "expected_output": "Creates a GENERATED ALWAYS AS STORED computed column with lower(email), then creates a btree ASYNC index on that column",
+      "files": [],
+      "expectations": [
+        "States that expression indexes are not supported in DSQL",
+        "Suggests creating a computed column using GENERATED ALWAYS AS (lower(email)) STORED",
+        "Creates a btree index with CREATE INDEX ASYNC on the computed column",
+        "Shows how to rewrite the query to use the new computed column",
+        "Does NOT suggest just removing the index without a replacement"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 209,
+      "name": "materialized-view-to-regular-view",
+      "prompt": "I have this PostgreSQL materialized view that I refresh every hour:\n\nCREATE MATERIALIZED VIEW monthly_revenue AS\n SELECT date_trunc('month', created_at) AS month,\n tenant_id,\n SUM(amount) AS total_revenue,\n COUNT(*) AS order_count\n FROM orders\n GROUP BY 1, 2;\n\nCREATE UNIQUE INDEX idx_monthly_rev ON monthly_revenue (month, tenant_id);\n\nHow do I migrate this to DSQL?",
+      "expected_output": "Converts to a regular VIEW (no materialization), removes REFRESH, suggests application-layer caching if performance is a concern",
+      "files": [],
+      "expectations": [
+        "Converts CREATE MATERIALIZED VIEW to CREATE VIEW (regular view)",
+        "Explains that materialized views are not supported in DSQL",
+        "Removes or addresses the UNIQUE INDEX on the materialized view (indexes on views not applicable)",
+        "Suggests application-layer caching (Redis/ElastiCache) or a summary table if query performance is a concern",
+        "Does NOT suggest REFRESH MATERIALIZED VIEW (not applicable to regular views)"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 210,
+      "name": "multi-schema-flattening",
+      "prompt": "My PostgreSQL database has 14 schemas:\n- public, billing, support, analytics, reporting, notifications, auth, payments, inventory, shipping, marketing, hr, compliance, audit\n\nEach schema has 20-50 tables. How do I handle this in DSQL which only supports 10 schemas?",
+      "expected_output": "Explains the 10 schema limit, recommends keeping the most important schemas and consolidating overflow schemas into existing ones with table name prefixes",
+      "files": [],
+      "expectations": [
+        "States that DSQL has a limit of 10 schemas per database",
+        "Recommends keeping the most critical/largest schemas as-is",
+        "Suggests consolidating overflow schemas using table name prefixes (e.g., hr_employees instead of hr.employees)",
+        "Mentions that application code references need to be updated for consolidated tables",
+        "Does NOT suggest creating multiple DSQL clusters as the primary solution (though may mention it as an alternative)"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 211,
+      "name": "roles-grant-iam-mapping",
+      "prompt": "My PostgreSQL database has these roles and grants:\n\nCREATE ROLE app_readonly WITH LOGIN PASSWORD 'secret123';\nCREATE ROLE app_writer WITH LOGIN PASSWORD 'writer456';\nGRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;\nGRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_writer;\n\nHow do I migrate this to DSQL?",
+      "expected_output": "Removes passwords (IAM-based auth), keeps CREATE ROLE and GRANT statements, explains IAM role mapping with dsql:DbConnect",
+      "files": [],
+      "expectations": [
+        "Removes WITH LOGIN PASSWORD from CREATE ROLE (DSQL uses IAM tokens, not passwords)",
+        "Preserves the CREATE ROLE statements (roles are supported in DSQL)",
+        "Preserves the GRANT statements (GRANT/REVOKE are supported)",
+        "Explains that authentication uses IAM tokens (not passwords)",
+        "Mentions dsql:DbConnect IAM action or IAM-to-database-role mapping"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 212,
+      "name": "copy-to-batched-insert",
+      "prompt": "I currently load data into PostgreSQL using COPY:\n\nCOPY users (id, email, name, created_at) FROM '/data/users.csv' WITH (FORMAT csv, HEADER);\n\nThe file has about 50,000 rows. How do I do this in DSQL?",
+      "expected_output": "Explains COPY is not supported, recommends batched INSERT with 500-1000 rows per transaction, provides a script or pattern for loading CSV data",
+      "files": [],
+      "expectations": [
+        "States that COPY command is not supported in DSQL",
+        "Recommends batched INSERT statements as the replacement",
+        "Specifies a batch size under 3,000 rows (typically 500-1000)",
+        "Mentions that each batch should be its own transaction",
+        "Provides or describes a script/pattern for reading CSV and inserting in batches"
+      ],
+      "llm_judge": true
+    },
+    {
+      "id": 207,
+      "name": "full-schema-conversion",
+      "prompt": "Convert this complete PostgreSQL schema to DSQL-compatible DDL:\n\nCREATE TYPE status AS ENUM ('active','inactive','suspended');\n\nCREATE TABLE organizations (\n id SERIAL PRIMARY KEY,\n name VARCHAR(200) UNIQUE\n);\n\nCREATE TABLE users (\n id SERIAL PRIMARY KEY,\n org_id INT REFERENCES organizations(id) ON DELETE CASCADE,\n email VARCHAR(255) NOT NULL,\n status status DEFAULT 'active',\n preferences JSONB,\n search_tokens TSVECTOR\n);\n\nCREATE INDEX idx_users_search ON users USING gin(search_tokens);\nCREATE INDEX idx_users_active ON users (email) WHERE status = 'active';\n\nCREATE FUNCTION update_search() RETURNS TRIGGER AS $$\nBEGIN\n NEW.search_tokens = to_tsvector(NEW.email);\n RETURN NEW;\nEND;\n$$ LANGUAGE plpgsql;",
+      "expected_output": "Comprehensive conversion: SERIAL→IDENTITY, ENUM→CHECK, FK→removed+validation function, JSONB→json, TSVECTOR→text, GIN→removed, partial index→full or composite, PL/pgSQL→removed with guidance",
+      "files": [],
+      "expectations": [
+        "Converts SERIAL to IDENTITY or sequence-based approach",
+        "Converts ENUM to CHECK constraint",
+        "Removes the FOREIGN KEY and generates a validate_fk function or cascade function",
+        "Converts JSONB to json with cast guidance",
+        "Converts TSVECTOR to text and notes FTS must move to external service",
+        "Removes or converts the GIN index (not supported)",
+        "Handles the partial index (WHERE status = 'active') — removes WHERE or adds status to composite index",
+        "Converts the PL/pgSQL trigger function (or explains it must move to app layer)",
+        "Adds COLLATE \"C\" to string columns",
+        "Uses CREATE INDEX ASYNC for all indexes"
+      ],
+      "llm_judge": true
+    }
+  ]
+}