Flink CDC Docker Environment

A comprehensive Apache Flink CDC (Change Data Capture) environment with SQL Server and PostgreSQL integration using Docker Compose.

What is CDC and Flink?

Change Data Capture (CDC) is a technology that monitors and captures data changes in a database as they occur. Instead of periodically checking for changes, CDC provides real-time notifications when data is inserted, updated, or deleted.

Apache Flink is a stream processing framework that processes data in real-time as it flows through the system. It can handle millions of events per second with low latency and high reliability.

Together: CDC captures database changes instantly, and Flink processes these changes in real-time to transform and route data to other systems.
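To make this concrete, here is a minimal Flink SQL sketch of a CDC source table using the SQL Server CDC connector. The hostname, credentials, database, and column names below are illustrative (option names can also vary between connector versions), and the connector JAR must be on Flink's classpath:

```sql
-- Illustrative only: declares SQL Server's customers table as a streaming
-- CDC source in Flink SQL (connection details are placeholders).
CREATE TABLE customers_src (
    id   INT,
    name STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector'     = 'sqlserver-cdc',
    'hostname'      = 'mssql',
    'port'          = '1433',
    'username'      = 'sa',
    'password'      = '<password>',
    'database-name' = 'cdc_demo',
    'table-name'    = 'dbo.customers'
);

-- Every INSERT/UPDATE/DELETE on the source table now arrives as a changelog row:
SELECT * FROM customers_src;
```

With a source like this, downstream operators see each database change as a row with a change kind (insert, update-before/after, delete) rather than a periodic snapshot.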

Architecture

(architecture diagram)

This project demonstrates real-time data streaming and CDC capabilities using:

  • Apache Flink 1.20.0 - Stream processing engine
  • SQL Server 2022 - Source database with CDC enabled
  • PostgreSQL 15 - Target database for processed data
  • CloudBeaver - Web-based database administration tool

Components

Flink Cluster

  • JobManager: Coordinates job execution and resource management
  • TaskManager1 & TaskManager2: Execute streaming tasks (16 slots each)
  • Job Submitter: Automatically uploads and runs JAR files

Databases

  • SQL Server: Source database with CDC enabled for customers and sellers tables
  • PostgreSQL: Target database for transformed data storage

Flink Jobs

  1. Data Generator Job (mssql_job): Generates synthetic data for testing
  2. CDC Processing Job (cdc_job): Processes CDC events and loads to PostgreSQL

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Java 11+ (for building JAR files)
  • Maven 3.6+ (for building projects)

Setup

  1. Clone and navigate to the project:
git clone <repository-url>
cd cdc_docker
  2. Start the complete environment:
docker compose up

That's it! The Docker Compose configuration includes everything needed:

  • Pre-built Flink jobs are available in flink_jobs_jars/ directory
  • All services (Flink cluster, SQL Server, PostgreSQL, CloudBeaver) start automatically
  • Flink jobs are automatically submitted and executed
  • The complete MSSQL → PostgreSQL data integration pipeline starts immediately
  3. Verify services: open the Flink dashboard at http://localhost:8081 and CloudBeaver at http://localhost:8978.
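The reachability of both web UIs can also be checked from the command line. This sketch assumes the default ports from this compose file; it prints OK/FAIL per endpoint rather than aborting, so it is safe to run before the stack is fully up:

```shell
# Quick reachability check for the web UIs (default ports assumed).
for endpoint in \
    "Flink dashboard|http://localhost:8081/overview" \
    "CloudBeaver|http://localhost:8978"; do
  name=${endpoint%%|*}
  url=${endpoint#*|}
  # -f: treat HTTP errors as failures, -m 5: give up after 5 seconds
  if curl -fsS -m 5 -o /dev/null "$url" 2>/dev/null; then
    echo "OK   $name ($url)"
  else
    echo "FAIL $name ($url)"
  fi
done
```

/overview is a standard Flink REST endpoint, so an OK there means the JobManager is up, not just that the port is bound.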

Optional - Rebuild jobs (only if you modify the source code):

# Build data generator
cd mssql_job
mvn clean package
cp target/mssql-data-generator-1.0-SNAPSHOT.jar ../flink_jobs_jars/

# Build CDC job
cd ../cdc_job
mvn clean package
cp target/cdc_job-1.jar ../flink_jobs_jars/cdc_job-1.jar
cd ..

Configuration

SQL Server CDC Setup

The environment automatically:

  • Enables CDC on the database
  • Starts SQL Server Agent (required for CDC)
  • Creates customers and sellers tables
  • Configures CDC for both tables
  • Sets up CDC capture jobs for real-time change tracking
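Under the hood, enabling CDC in SQL Server comes down to two system stored procedures. A sketch of what the init script effectively runs (the database context is an assumption; see init.sql for the actual statements):

```sql
-- Run inside the source database; requires SQL Server Agent to be running.
EXEC sys.sp_cdc_enable_db;

-- Enable CDC per table (repeat for sellers).
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'customers',
    @role_name     = NULL;  -- NULL: no gating role required to read change data
```

Once enabled, SQL Server maintains change tables under the cdc schema, which is what the Flink connector reads from.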

PostgreSQL Setup

  • Creates target tables with CDC metadata columns
  • Configures connection for Flink jobs
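The target tables carry the source columns plus CDC metadata. A sketch of the shape (column names for the metadata follow the sample queries later in this README; see init_postgres.sql for the actual schema):

```sql
-- Illustrative target table: source columns plus CDC metadata columns.
CREATE TABLE customers (
    id            INT,
    name          TEXT,
    operation     TEXT,       -- INSERT / UPDATE / DELETE
    cdc_timestamp TIMESTAMP   -- when the change was captured
);
```

Keeping the operation type and capture timestamp on every row makes it possible to audit the change history rather than only the latest state.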

Automatic Dependencies

The system automatically downloads required JAR files:

  • PostgreSQL JDBC Driver
  • SQL Server JDBC Driver
  • Flink SQL Gateway API

Usage

Monitoring

  • Flink Dashboard (http://localhost:8081): Monitor job status, metrics, and logs
  • CloudBeaver (http://localhost:8978): Query both source and target databases
  • Docker Logs: View container logs for troubleshooting

Data Flow

  1. Data Generator creates INSERT/UPDATE/DELETE operations in SQL Server
  2. CDC captures changes in real-time
  3. Flink processes and transforms the data
  4. Processed data is loaded into PostgreSQL with operation metadata

Sample Queries

Check CDC data in PostgreSQL:

-- View customer changes
SELECT * FROM customers ORDER BY cdc_timestamp DESC LIMIT 10;

-- View seller changes  
SELECT * FROM sellers ORDER BY cdc_timestamp DESC LIMIT 10;

-- Count operations by type
SELECT operation, COUNT(*) FROM customers GROUP BY operation;

Troubleshooting

Common Issues

  1. Port conflicts: Ensure ports 8081 (Flink UI), 1433 (SQL Server), 5432 (PostgreSQL), and 8978 (CloudBeaver) are available
  2. Memory issues: Increase Docker memory allocation if needed
  3. Job failures: Check Flink logs in the web UI
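A quick way to spot port conflicts before starting the stack (uses ss, available on most Linux systems; reports "unknown" when it is not installed):

```shell
# Report whether each required port is already bound on this host.
for p in 8081 1433 5432 8978; do
  if command -v ss >/dev/null 2>&1; then
    if ss -ltn 2>/dev/null | grep -q ":$p "; then
      echo "port $p in use"
    else
      echo "port $p free"
    fi
  else
    echo "port $p unknown (ss not installed)"
  fi
done
```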

Logs

# View specific service logs
docker compose logs jobmanager
docker compose logs taskmanager1
docker compose logs mssql
docker compose logs postgresql

# Follow logs in real-time
docker compose logs -f flink-job-submitter

Restart Services

# Restart specific service
docker compose restart jobmanager

# Restart all services
docker compose down && docker compose up -d

# Clean restart (removes volumes)
docker compose down -v && docker compose up -d

Development

Adding New Jobs

  1. Create JAR file and place in flink_jobs_jars/
  2. Restart flink-job-submitter service
  3. Monitor job execution in Flink UI

Modifying Database Schema

  1. Update init.sql for SQL Server changes
  2. Update init_postgres.sql for PostgreSQL changes
  3. Rebuild and restart containers

Performance Tuning

Flink Configuration

  • TaskManager slots: 16 per instance (32 total)
  • Parallelism: Configurable per job
  • Checkpointing: Disabled by default
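Since checkpointing is off by default, jobs restart from scratch on failure. If recovery guarantees are needed, it can be turned on per session from the Flink SQL client; the 10s interval below is purely illustrative:

```sql
-- Flink SQL client: enable periodic checkpoints for this session.
SET 'execution.checkpointing.interval' = '10s';
```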

Database Optimization

  • SQL Server: CDC cleanup configured
  • PostgreSQL: Indexes on frequently queried columns
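For example, the sample queries earlier sort by cdc_timestamp, so an index like the following (name illustrative) keeps those lookups fast as the change history grows:

```sql
-- Speeds up ORDER BY cdc_timestamp DESC queries on the target table.
CREATE INDEX idx_customers_cdc_timestamp ON customers (cdc_timestamp);
```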