This repository contains infrastructure as code to set up a database migration service that loads full tables from an Oracle RDS database and writes them to S3 in Parquet format. Ongoing changes after the full load are captured and written as well. This demo illustrates the use of CDK and the setup of an Oracle database for DMS. For use in a production environment where performance, security and monitoring are key, see the DMS Best Practices white paper and adjust accordingly.
The database stack is deployed using CDK and requires a VPC id. Set values for the environment variables CDK_DEPLOY_ACCOUNT, CDK_DEPLOY_REGION and VPC_ID, then deploy with

```sh
STACK=OracleStack make deploy
```
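For example, a complete invocation could look like the following; the account id, region and VPC id are placeholder values to replace with your own.

```sh
CDK_DEPLOY_ACCOUNT=123456789012 \
CDK_DEPLOY_REGION=eu-west-1 \
VPC_ID=vpc-0123456789abcdef0 \
STACK=OracleStack make deploy
```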
In addition to the database, ingress rules are applied for your personal IP and for a security group that the DMS replication instance will later run in. If you want to run the database in private subnets, you can do so here.
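If you need to look up your public IP yourself, the snippet below is one quick way to do so; whether the stack reads it from an environment variable or determines it automatically is defined in the stack code.

```sh
# Print your current public IPv4 address, e.g. for a /32 ingress rule.
curl -s https://checkip.amazonaws.com
```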
We provide some migrations to populate the database. From AWS Secrets Manager, retrieve the secret accompanying the database created in the previous step. Reuse the deployment region.
```sh
CDK_DEPLOY_REGION=... SECRET_ID=... make db-upgrade
```

You can now double-check the presence of the users table in the ADMIN schema.
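One way to find the secret id and inspect its contents is via the AWS CLI; the query expressions below are an illustration and assume the admin credentials are stored as JSON in the secret created by the database stack.

```sh
# List secret names to find the one created by the database stack.
aws secretsmanager list-secrets --region "$CDK_DEPLOY_REGION" \
  --query 'SecretList[].Name' --output text

# Inspect the secret value (the admin credentials as JSON).
aws secretsmanager get-secret-value --region "$CDK_DEPLOY_REGION" \
  --secret-id "$SECRET_ID" --query SecretString --output text
```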
```sh
USER=admin HOST=... make login
```

The HOST environment variable is the endpoint address of the Oracle RDS instance. If you deployed in private subnets and have a bastion server available, use an SSH tunnel instead.
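Such a tunnel could look like the following; the key file, bastion host and user are placeholders for your own setup.

```sh
# Forward local port 1521 to the RDS endpoint through the bastion.
ssh -i ~/.ssh/my-key.pem -N -L 1521:<rds-endpoint>:1521 ec2-user@<bastion-host>

# In a second terminal, log in through the tunnel.
USER=admin HOST=localhost make login
```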
Follow the steps to enable supplemental logging and create the directories for the Binary Reader as explained here. In addition, create the DMS user with the necessary privileges.
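As a rough sketch, run from a SQL client as the admin user, the RDS-specific calls look like the following. The user name, password and grants are placeholder illustrations only; follow the linked instructions and the DMS documentation for the authoritative list of privileges.

```sh
# Run the RDS-specific setup as the admin user; replace connection details.
sqlplus "admin/<password>@//<rds-endpoint>:1521/<db-name>" <<'SQL'
-- Enable database-level supplemental logging (RDS-managed procedure).
exec rdsadmin.rdsadmin_util.alter_supplemental_logging(p_action => 'ADD');

-- Create the redo log directories used by the DMS Binary Reader.
exec rdsadmin.rdsadmin_master_util.create_onlinelog_dir;
exec rdsadmin.rdsadmin_master_util.create_archivelog_dir;

-- Create a DMS user; the grants below are a minimal illustration only.
CREATE USER dms_user IDENTIFIED BY "<a-strong-password>";
GRANT CREATE SESSION TO dms_user;
GRANT SELECT ANY TABLE TO dms_user;
SQL
```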
Now deploy the DMS stack using
```sh
STACK=ReplicationStack make deploy
```

Set values for the environment variables CDK_DEPLOY_ACCOUNT, CDK_DEPLOY_REGION and VPC_ID. In addition, use the values you set in the previous step for DMS_USER and DMS_PASSWORD.
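As before, a full invocation could look like this, with placeholder values for illustration.

```sh
CDK_DEPLOY_ACCOUNT=123456789012 \
CDK_DEPLOY_REGION=eu-west-1 \
VPC_ID=vpc-0123456789abcdef0 \
DMS_USER=dms_user \
DMS_PASSWORD='<a-strong-password>' \
STACK=ReplicationStack make deploy
```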
If you wish to deploy in private subnets, you can change that here as well.
If everything deployed correctly, the connection tests for all endpoints will pass and you can kick off the database migration task in DMS. The Parquet files will appear in S3; their content and column types can be checked with a tool such as parq. Running a Glue crawler on the bucket populates a Glue database with the tables, where you can verify the data types.
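To inspect one of the files, you could download it and print its schema; the bucket name and key below are placeholders, and the LOAD00000001 file name reflects DMS's usual full-load naming.

```sh
# Copy one full-load Parquet file locally; replace bucket and key.
aws s3 cp s3://<bucket>/<schema>/<table>/LOAD00000001.parquet .

# Print the Parquet schema and the first rows (pip install parq).
parq LOAD00000001.parquet --schema
parq LOAD00000001.parquet --head 5
```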
Changes made to the existing tables will be captured by the CDC process of DMS. Unlike the full-load files, the CDC files include an additional column called Op, which indicates whether a record is the result of an insert (I), update (U) or delete (D). A Glue database and crawler are made available for these as well.
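To see this in action, you could insert a row and watch a CDC file appear in the bucket; the table and column names below are assumptions for illustration, so adjust them to the schema created by the migrations.

```sh
# Insert a row as the admin user; column names are illustrative only.
sqlplus "admin/<password>@//<rds-endpoint>:1521/<db-name>" <<'SQL'
INSERT INTO admin.users (id, name) VALUES (42, 'test-user');
COMMIT;
SQL

# A new CDC file should appear in the bucket shortly afterwards.
aws s3 ls s3://<bucket>/<schema>/<table>/ --recursive
```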