A comprehensive Apache Flink CDC (Change Data Capture) environment with SQL Server and PostgreSQL integration using Docker Compose.
Change Data Capture (CDC) is a technology that monitors and captures data changes in a database as they occur. Instead of periodically checking for changes, CDC provides real-time notifications when data is inserted, updated, or deleted.
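On SQL Server, CDC is enabled per database and then per table via system stored procedures. A minimal sketch (the schema and table name here are illustrative):

```sql
-- Enable CDC at the database level (requires sysadmin membership)
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for one table; SQL Server Agent must be running
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'customers',
    @role_name     = NULL;  -- NULL: no gating role required to read change data
```

Once enabled, SQL Server Agent capture jobs copy every insert, update, and delete from the transaction log into change tables that downstream consumers such as Flink can read.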
Apache Flink is a stream processing framework that processes data in real-time as it flows through the system. It can handle millions of events per second with low latency and high reliability.
Together: CDC captures database changes instantly, and Flink processes these changes in real-time to transform and route data to other systems.
This project demonstrates real-time data streaming and CDC capabilities using:
- Apache Flink 1.20.0 - Stream processing engine
- SQL Server 2022 - Source database with CDC enabled
- PostgreSQL 15 - Target database for processed data
- CloudBeaver - Web-based database administration tool
- JobManager: Coordinates job execution and resource management
- TaskManager1 & TaskManager2: Execute streaming tasks (16 slots each)
- Job Submitter: Automatically uploads and runs JAR files
- SQL Server: Source database with CDC enabled for the `customers` and `sellers` tables
- PostgreSQL: Target database for transformed data storage
- Data Generator Job (`mssql_job`): Generates synthetic data for testing
- CDC Processing Job (`cdc_job`): Processes CDC events and loads them into PostgreSQL
- Docker and Docker Compose
- Java 11+ (for building JAR files)
- Maven 3.6+ (for building projects)
- Clone and navigate to the project:

```bash
git clone <repository-url>
cd cdc_docker
```

- Start the complete environment:

```bash
docker compose up
```

That's it! The Docker Compose configuration includes everything needed:
- Pre-built Flink jobs are available in the `flink_jobs_jars/` directory
- All services (Flink cluster, SQL Server, PostgreSQL, CloudBeaver) start automatically
- Flink jobs are automatically submitted and executed
- The complete MSSQL → PostgreSQL data integration pipeline starts immediately
- Verify the services:
  - Flink Web UI: http://localhost:8081
  - CloudBeaver: http://localhost:8978
Optional - Rebuild jobs (only if you modify the source code):

```bash
# Build the data generator
cd mssql_job
mvn clean package
cp target/mssql-data-generator-1.0-SNAPSHOT.jar ../flink_jobs_jars/

# Build the CDC job
cd ../cdc_job
mvn clean package
cp target/cdc_job-1.jar ../flink_jobs_jars/cdc_job-1.jar
cd ..
```

The environment automatically:
- Enables CDC on the database
- Starts SQL Server Agent (required for CDC)
- Creates the `customers` and `sellers` tables
- Configures CDC for both tables
- Sets up CDC capture jobs for real-time change tracking
- Creates target tables with CDC metadata columns
- Configures connection for Flink jobs
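As a sketch of what a target table with CDC metadata columns might look like, here is hypothetical PostgreSQL DDL. The `operation` and `cdc_timestamp` columns match the metadata columns used in the verification queries; the business columns are assumptions:

```sql
-- Illustrative target table; only operation/cdc_timestamp are
-- confirmed by the project's queries, the rest is a placeholder.
CREATE TABLE customers (
    id            INTEGER,
    name          TEXT,
    operation     VARCHAR(10),             -- INSERT / UPDATE / DELETE
    cdc_timestamp TIMESTAMP DEFAULT now()  -- when the change was captured
);
```

Keeping the operation type and capture time alongside each row lets you replay or audit the change history instead of only seeing the latest state.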
The system automatically downloads required JAR files:
- PostgreSQL JDBC Driver
- SQL Server JDBC Driver
- Flink SQL Gateway API
- Flink Dashboard: Monitor job status, metrics, and logs
- CloudBeaver: Query both source and target databases
- Docker Logs: View container logs for troubleshooting
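To confirm that CDC is actually active on the source, SQL Server exposes catalog flags you can query from CloudBeaver:

```sql
-- Is CDC enabled for the current database?
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();

-- Which tables are tracked by CDC?
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1;
```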
- Data Generator creates INSERT/UPDATE/DELETE operations in SQL Server
- CDC captures changes in real-time
- Flink processes and transforms the data
- Processed data is loaded into PostgreSQL with operation metadata
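The packaged CDC job is roughly equivalent to declaring a `sqlserver-cdc` source table in Flink SQL. The sketch below uses option names from the Flink CDC SQL Server connector; the hostname, credentials, database, and columns are placeholders, not this project's actual configuration:

```sql
-- Sketch only: connection values are illustrative placeholders
CREATE TABLE customers_src (
    id   INT,
    name STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector'     = 'sqlserver-cdc',
    'hostname'      = 'mssql',
    'port'          = '1433',
    'username'      = 'sa',
    'password'      = '<password>',
    'database-name' = 'cdc_db',
    'table-name'    = 'dbo.customers'
);
```

Each row emitted by such a source carries its change type (insert, update, delete), which the CDC job writes into the target's operation metadata column.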
Check CDC data in PostgreSQL:
```sql
-- View customer changes
SELECT * FROM customers ORDER BY cdc_timestamp DESC LIMIT 10;

-- View seller changes
SELECT * FROM sellers ORDER BY cdc_timestamp DESC LIMIT 10;

-- Count operations by type
SELECT operation, COUNT(*) FROM customers GROUP BY operation;
```

- Port conflicts: Ensure ports 8081, 1433, 5432, and 8978 are available
- Memory issues: Increase Docker memory allocation if needed
- Job failures: Check Flink logs in the web UI
```bash
# View specific service logs
docker compose logs jobmanager
docker compose logs taskmanager1
docker compose logs mssql
docker compose logs postgresql

# Follow logs in real time
docker compose logs -f flink-job-submitter

# Restart a specific service
docker compose restart jobmanager

# Restart all services
docker compose down && docker compose up -d

# Clean restart (removes volumes)
docker compose down -v && docker compose up -d
```

- Create a JAR file and place it in `flink_jobs_jars/`
- Restart the `flink-job-submitter` service
- Monitor job execution in the Flink UI
- Update `init.sql` for SQL Server changes
- Update `init_postgres.sql` for PostgreSQL changes
- Rebuild and restart the containers
- TaskManager slots: 16 per instance (32 total)
- Parallelism: Configurable per job
- Checkpointing: Disabled by default
- SQL Server: CDC cleanup configured
- PostgreSQL: Indexes on frequently queried columns
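For the PostgreSQL indexing point, a sketch of the kind of index that serves the timestamp-ordered verification queries (the index name is an assumption):

```sql
-- Speeds up ORDER BY cdc_timestamp DESC LIMIT n lookups on the target table
CREATE INDEX idx_customers_cdc_timestamp ON customers (cdc_timestamp DESC);
```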

