A setup for testing Apache Spark applications locally on a Kubernetes (K8S) cluster backed by an S3-based Hadoop file system.
This repo is a WIP.
This setup has the following components:
- A Docker image with a Spark-Hadoop setup. The image is kept lightweight by using several build stages and copying only the necessary files into the final stage.
- The Kubernetes cluster is based on Minikube and runs in VirtualBox (see the Makefile).
- The S3 buckets are emulated by Localstack and initialized in the repo's docker-compose file.
- To monitor the Spark applications, you can start a Spark History Server by applying the repo's K8S manifest.
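Spark reaches S3-compatible storage such as Localstack through Hadoop's S3A connector. The following is a minimal Python sketch, not the repo's own tooling, that assembles a `spark-submit` command for Kubernetes with S3A pointed at Localstack. It assumes the standard `spark.hadoop.fs.s3a.*` property names. The helper names, the endpoint, the `test` credentials, the image tag and the application path are hypothetical placeholders. The endpoint must be reachable from the pods running inside Minikube, not only from the host.

```python
# Hypothetical helpers: endpoint, credentials, image and paths are placeholders,
# not values taken from this repo.
from typing import Dict, List


def localstack_s3a_conf(endpoint: str, access_key: str = "test",
                        secret_key: str = "test") -> Dict[str, str]:
    """Spark properties that point Hadoop's S3A connector at Localstack."""
    return {
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # Localstack serves buckets under the endpoint path, not as subdomains.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Enable TLS only when the endpoint is actually served over HTTPS.
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            str(endpoint.startswith("https://")).lower(),
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def spark_submit_args(master: str, image: str, app: str,
                      conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list for cluster mode on Kubernetes."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster",
            "--conf", f"spark.kubernetes.container.image={image}"]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args
```

The Kubernetes master URL has the form `k8s://https://<host>:<port>`. With Minikube, `minikube ip` prints the VM address, and the API server usually listens on port 8443.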
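A History Server can only show applications that wrote event logs to a location it also reads. A minimal Python sketch of the matching properties follows, assuming the event logs live in an S3 bucket reached through S3A. The bucket path and helper names are hypothetical. In practice the History Server also needs the same S3A connection properties as the applications.

```python
# Hypothetical helpers: the s3a:// path used with them is a placeholder.
from typing import Dict


def event_log_conf(log_dir: str) -> Dict[str, str]:
    """Properties that make applications and the History Server share one log dir."""
    if not log_dir.startswith("s3a://"):
        raise ValueError("expected an s3a:// URI for the event-log directory")
    return {
        # Set on every Spark application so the driver writes event logs.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        # Read by the History Server; must point at the same location.
        "spark.history.fs.logDirectory": log_dir,
    }


def history_server_opts(conf: Dict[str, str]) -> str:
    """Render properties as -D flags, e.g. for the SPARK_HISTORY_OPTS variable."""
    return " ".join(f"-D{key}={value}" for key, value in sorted(conf.items()))
```

Keeping both directories in one helper avoids the common mistake of the applications writing to one path while the History Server watches another.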