Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta Lake, Postgres + Debezium CDC, MySQL, Airflow, Kafka Strimzi, Datahub, OpenMetadata, Zeppelin, Jupyter, JFrog Container Registry

rogeriomm/labtools-k8s

Latest commit: cc092a5 · Jan 21, 2025

This is a work in progress...

Pipeline architecture

flowchart TD
    Postgres(Postgres Database) -->|CDC| Kafka(Kafka Strimzi)
    SQLServer(SQL Server Database) -->|CDC| Kafka
    Kafka -->|AVRO Data Stream| ConsumerMinio(Minio S3)
    ConsumerMinio -->|AVRO Data Stream| ConsumerSpark(Apache Spark)
    ConsumerSpark --> |CDC Replication using Scala Engine - TODO| ConsumerDelta(Delta Lake)
    ConsumerSpark --> |Data catalog, lineage| ConsumerDatahub(Datahub)
    ConsumerSpark --> HiveMetastore(Hive metastore)
    Kafka -->|Schema Management| SchemaRegistry(Confluent Schema Registry)
    Kafka --> RedpandaConsole(Redpanda Console)
    SchemaRegistry -->|Schema Use - API| ConsumerSpark
    ConsumerDelta -->|Data Query| Trino(Trino)
    click ConsumerDelta href "https://github.com/rogeriomm/debezium-cdc-replication-delta" "Visit GitHub repository"
    Airflow(Apache Airflow) -->|Orchestrate| ConsumerSpark
    Trino --> Zeppelin(Zeppelin)
    Trino --> Jupyter(Jupyter)
    Trino --> Metabase(Metabase)
    
    class Postgres,SQLServer database;
    class Kafka,SchemaRegistry kafka;
    class ConsumerMinio,ConsumerSpark,ConsumerDelta consumers;
    class ConsumerDatahub datahub;
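
The diagram marks the CDC replication step into Delta Lake as a TODO in a Scala engine. As a rough illustration of what that step has to do, here is a minimal Python sketch that replays Debezium-style change events onto an in-memory table. It is not the repository's implementation. It assumes the standard Debezium envelope fields `op`, `before`, and `after`; the primary key column `id`, the sample rows, and the helper names `apply_change` and `replay` are hypothetical. In the real pipeline the same upsert/delete logic would presumably run as a merge into a Delta table rather than a Python dict.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]
Table = Dict[Tuple[Any, ...], Row]


def apply_change(table: Table, event: Dict[str, Any],
                 key_cols: Tuple[str, ...] = ("id",)) -> None:
    """Apply one Debezium-style change event to a table keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the new row image
        after = event["after"]
        table[tuple(after[c] for c in key_cols)] = dict(after)
    elif op == "d":
        # delete: remove the row identified by the old row image
        before = event["before"]
        table.pop(tuple(before[c] for c in key_cols), None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


def replay(events: Iterable[Dict[str, Any]],
           key_cols: Tuple[str, ...] = ("id",)) -> Table:
    """Replay events in order and return the resulting table state."""
    table: Table = {}
    for event in events:
        apply_change(table, event, key_cols)
    return table


# Hypothetical sample events (column names and values are made up).
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"},
     "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]
state = replay(events)
print(state)  # {(1,): {'id': 1, 'name': 'alicia'}}
```

Events must be applied in order per key, which is why the sketch replays a single ordered stream; a Spark-based version would need to deduplicate to the latest event per key within each batch before merging.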

Kafka Strimzi, Debezium CDC AVRO, Confluent Schema Registry, Postgres/SQL Server

Postgres

drawing

Microsoft SQL Server CDC

Zeppelin/Jupyter

drawing

Spark

drawing

Metabase

drawing

Datahub

drawing

OpenMetadata

drawing

Airflow

drawing

Minio

drawing

Argo CD

drawing

Kubernetes

drawing

Local web

| Local URL | Description | User | Password |
|---|---|---|---|
| https://dashboard.worldl.xpt/ | K8S dashboard | | |
| https://argocd.worldl.xpt | ArgoCD | admin | Notebook |
| https://zeppelin.worldl.xpt | Zeppelin | | |
| https://jupyter.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust | | |
| https://jupyter-commander.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust - K8S Admin Service Account | | |
| https://minio-console.worldl.xpt | MinIO operator instance minio-tenant-1 | minio | awesomes3 |
| https://console.minio-operator.svc.cluster2.xpt:9090 | MinIO operator | | |
| https://airflow.worldl.xpt/flower/ | Airflow flower | admin | admin |
| https://airflow.worldl.xpt/airflow | Airflow | | |
| https://jupyter-glue2.worldl.xpt/ | AWS Glue version 2.0 - Jupyter | | |
| https://webui-glue2.worldl.xpt/ | AWS Glue version 2.0 - WebUI | | |
| https://history-glue2.worldl.xpt/ | AWS Glue version 2.0 - History | | |
| https://jupyter-glue3.worldl.xpt/ | AWS Glue version 3.0 - Jupyter | | |
| https://webui-glue3.worldl.xpt/ | AWS Glue version 3.0 - WebUI | | |
| https://history-glue3.worldl.xpt/ | AWS Glue version 3.0 - History | | |
| https://jupyter-glue4.worldl.xpt/ | AWS Glue version 4.0 - Jupyter | | |
| https://webui-glue4.worldl.xpt/ | AWS Glue version 4.0 - WebUI | | |
| https://history-glue4.worldl.xpt/ | AWS Glue version 4.0 - History | | |
| http://datahub.worldl.xpt/ | Datahub | datahub | manualPassword |
| https://openmetadata.worldl.xpt/ | OpenMetadata | admin | admin |
| https://kafkaui.worldl.xpt/ | Kafka UI | | |
| https://redpanda-console.worldl.xpt/ | Redpanda Console | | |
| https://metabase.worldl.xpt/ | Metabase | | |
| http://trino.trino.svc:8080 | Trino | | |
| https://jfrog.worldl.xpt | JFrog | admin | password |
| https://harbor.worldl.xpt | Harbor | admin | notebook |
| https://nexus.worldl.xpt/ | Nexus Free trial | admin | admin123 |
| https://nexus.admin.worldl.xpt/ | Nexus Free trial | | |
| https://keycloack.worldl.xpt | Keycloak | user | notebook |
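
With this many local endpoints, a quick reachability check can help after bringing the cluster up. The following is a minimal sketch using only the Python standard library, not something shipped with the repository. The `ENDPOINTS` subset and the helper names `host_of` and `probe` are hypothetical, and disabling certificate verification is an assumption that the local ingress serves self-signed certificates.

```python
import ssl
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# A few entries copied from the local URL list; extend as needed.
ENDPOINTS = {
    "ArgoCD": "https://argocd.worldl.xpt",
    "Zeppelin": "https://zeppelin.worldl.xpt",
    "Airflow": "https://airflow.worldl.xpt/airflow",
    "Trino": "http://trino.trino.svc:8080",
}


def host_of(url: str) -> str:
    """Return the hostname part of a URL (useful for DNS or hosts-file checks)."""
    return urlsplit(url).hostname or ""


def probe(url: str, timeout: float = 3.0) -> str:
    """Return 'up (<status>)' if the server answers, else 'down (<reason>)'."""
    # Assumption: local ingress uses self-signed certificates,
    # so TLS verification is switched off for this check only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"up ({resp.status})"
    except urllib.error.HTTPError as exc:
        # The server responded, even if with 401/403/404.
        return f"up ({exc.code})"
    except (urllib.error.URLError, OSError) as exc:
        return f"down ({exc})"
```

Looping over `ENDPOINTS.items()` and printing `probe(url)` for each entry gives a one-line status per service. Cluster-internal names such as `trino.trino.svc` will only resolve from inside the cluster or through whatever DNS setup the host uses.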

Internet Web (Protected by Firewall)

| Public URL | Description |
|---|---|
| https://world-zeppelin.duckdns.org | Zeppelin |
| https://world-jupyter.duckdns.org/jupyter | Jupyter notebook: Python, Scala, Rust |
