- provisioning on remote hosts
- Test spark in standalone mode vs yarn
- A script to install custom packages across the cluster
- Set spark to run in standalone mode
- fix work directory
  - `23/06/06 12:27:25 INFO Worker: Running Spark version 3.3.2`
  - `23/06/06 12:27:25 INFO Worker: Spark home: /home/vagrant/spark`
  - `23/06/06 12:27:25 ERROR Utils: Failed to create directory /home/vagrant/spark/work`
  - `java.nio.file.AccessDeniedException: /home/vagrant/spark/work`
- Factor spark configs that are likely to be tweaked
- Successfully run a spark-submit job in the cluster
- Figure out starting services via ansible
- using nohup seemed to help
- Configure spark for the cluster
- create hdfs logs directory (needs hdfs running)
- Download and install spark binaries
- A mapping from host directory to master node when using vagrant
- Directory at $HOME/data-playground-project will be synced to master node VM
- Can run yarn
- Can run hdfs
- Can format hdfs
- Can configure hadoop
- Can set up all the required environment variables
- Fix first run check
- dist-upgrade only runs if hdfs is not set up
- `vagrant up --provision` should provision the VMs when all of them are running, instead of serially
- Distribute the hadoop installation between nodes
  - https://dlcdn.apache.org/hadoop/common/
  - https://stackoverflow.com/q/25505146/1382495
  - steps:
    - download, verify and sync
    - unarchive
- Install hadoop dependencies
- openjdk-11
- Generate and distribute the master's ssh pub key to workers
  - https://docs.ansible.com/ansible/latest/collections/community/crypto/openssh_keypair_module.html
  - https://docs.ansible.com/ansible/latest/collections/ansible/builtin/fetch_module.html
  - https://docs.ansible.com/ansible/latest/collections/ansible/posix/authorized_key_module.html
- Configure hostname and /etc/hosts for each server
- Check that apt packages are up to date
- Do a full upgrade on first run, and a safe upgrade otherwise
- Use ansible insecure key for local ssh access
- Can run `vagrant up` to bring up a 3-node cluster
  - use ansible version 2.10 (debian 11 default)
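
For the "fix work directory" item, one possible approach is an Ansible task that creates the Spark work directory with the right owner before the worker starts. This is a sketch, not the project's actual task: it assumes the worker runs as the `vagrant` user (inferred from the paths in the log), and the `0755` mode is an assumption.

```yaml
# Sketch only: owner, group and mode are assumptions, not values from this project.
- name: Ensure the Spark work directory exists and is writable by the worker user
  ansible.builtin.file:
    path: /home/vagrant/spark/work   # path taken from the AccessDeniedException above
    state: directory
    owner: vagrant                   # assumed: the worker process runs as vagrant
    group: vagrant
    mode: "0755"
  become: true                       # root is needed if the parent directory is owned by another user
```

If the parent `/home/vagrant/spark` directory was unarchived as root, the same ownership fix may be needed there as well.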
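
The key distribution item can be sketched with the three modules linked in the list: `openssh_keypair` generates the key on the master, `fetch` copies the public half to the control machine, and `authorized_key` installs it on each worker. The inventory group names `master` and `workers`, the key path, the key type and size, and the `fetched/` directory are all hypothetical and would need to match the real inventory.

```yaml
# Sketch only: group names, key path and the fetched/ directory are assumptions.
- name: Generate the master's key pair and fetch the public key
  hosts: master                      # hypothetical inventory group
  tasks:
    - name: Create an SSH key pair for the vagrant user
      community.crypto.openssh_keypair:
        path: /home/vagrant/.ssh/id_rsa
        type: rsa
        size: 4096

    - name: Copy the public key to the control machine
      ansible.builtin.fetch:
        src: /home/vagrant/.ssh/id_rsa.pub
        dest: "{{ playbook_dir }}/fetched/master_id_rsa.pub"
        flat: true                   # store at dest as given, without a per-host subdirectory

- name: Authorize the master's public key on the workers
  hosts: workers                     # hypothetical inventory group
  tasks:
    - name: Add the master's public key to authorized_keys
      ansible.posix.authorized_key:
        user: vagrant
        key: "{{ lookup('file', playbook_dir + '/fetched/master_id_rsa.pub') }}"
        state: present
```

Both plays use `playbook_dir` so that the path written by `fetch` and the path read by the `file` lookup resolve to the same location regardless of the directory `ansible-playbook` is run from.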