This repository provides a development environment for data engineering labs. The following tools are required:
- Python 3.12 or higher
- k3d (for local Kubernetes cluster)
- kubectl (for Kubernetes cluster management)
- uv (Python package installer)
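A quick way to confirm the tools listed above are installed is to look each one up on `PATH`. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper, and the interpreter name `python3` is an assumption (it may be `python` on some systems). It does not check the Python version.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names follow the prerequisites list; "python3" is an assumed binary name.
missing=$(missing_tools python3 k3d kubectl uv)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

The script only reports what is missing and does not exit with an error, so it is safe to run before anything is installed.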
- Clone the repository:
git clone <repository-url>
cd de-labs
- Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate  # On Unix/macOS
# OR
.venv\Scripts\activate  # On Windows
uv pip install .
- Create a local Kubernetes cluster:
make create-k3d-cluster
This will create a k3d cluster with 1 server and 2 agent nodes.
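To confirm the cluster came up with the node counts described above, you can count the nodes that kubectl reports. This is a sketch under stated assumptions: the cluster name `de-labs` is inferred from the `k3d-de-labs` context used elsewhere in this README, the `k3d cluster create` line is only a guess at what the make target runs, and `count_nodes` is a hypothetical helper.

```shell
#!/bin/sh
# Assumed values: cluster name inferred from the k3d-de-labs context,
# node counts taken from the description of the make target.
CLUSTER_NAME="de-labs"
SERVERS=1
AGENTS=2
EXPECTED_NODES=$((SERVERS + AGENTS))

# Hypothetical helper: count the nodes kubectl can see.
# Prints 0 when kubectl is not installed or the cluster is unreachable.
count_nodes() {
    if command -v kubectl >/dev/null 2>&1; then
        kubectl get nodes --no-headers --request-timeout=5s 2>/dev/null | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# The make target may wrap a command like this one (assumption, not confirmed).
echo "possible k3d equivalent: k3d cluster create $CLUSTER_NAME --servers $SERVERS --agents $AGENTS"
echo "expected nodes: $EXPECTED_NODES, found: $(count_nodes)"
```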
- Initialize the namespace:
make init-namespace
- Deploy services:
# Deploy both MinIO and Spark
make deploy svc=minio,spark
# Deploy MinIO only
make deploy svc=minio
# Deploy Spark only
make deploy svc=spark
- Undeploy services:
# Undeploy both MinIO and Spark
make undeploy svc=minio,spark
# Undeploy MinIO only
make undeploy svc=minio
# Undeploy Spark only
make undeploy svc=spark
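The `svc` argument in the commands above takes a comma-separated list of service names. How the Makefile parses it is not shown here; the sketch below is one plausible way to split such a value in shell, with `split_services` as a hypothetical helper rather than anything defined in this repository.

```shell
#!/bin/sh
# Hypothetical helper: turn "minio,spark" into one service name per line.
split_services() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Example: walk over each service named in svc=minio,spark.
for service in $(split_services "minio,spark"); do
    echo "would process: $service"
done
```

A single name such as `minio` passes through unchanged, which matches the single-service forms of the commands.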
- Remove all resources from the current namespace:
make clean-all
- Delete the k3d cluster:
make delete-k3d-cluster
- If you encounter issues with the cluster context, ensure you're using the correct kubectl context:
kubectl config use-context k3d-de-labs
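Before switching, it can help to see which context kubectl is currently using. k3d names its contexts `k3d-<cluster-name>`, so the context above corresponds to a cluster called `de-labs`. The following sketch compares the current context with the expected one; `k3d_context_name` is a hypothetical helper, and the check is skipped when kubectl is not installed.

```shell
#!/bin/sh
# Hypothetical helper: build the kubectl context name k3d uses for a cluster.
k3d_context_name() {
    printf 'k3d-%s\n' "$1"
}

expected=$(k3d_context_name de-labs)

# Only query kubectl when it is installed.
if command -v kubectl >/dev/null 2>&1; then
    # current-context fails when no context is set; treat that as empty.
    current=$(kubectl config current-context 2>/dev/null || true)
    if [ "$current" = "$expected" ]; then
        echo "kubectl is using $expected"
    else
        echo "current context is '${current:-<none>}', expected $expected"
    fi
fi
```

`kubectl config get-contexts` lists every available context if the expected one does not appear.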
- To verify the cluster status:
make check-k3d-cluster