The course is divided into four major modules.
- The language of Datasets
- The language of DataFrames
- Spark on AWS
- Spark optmizations & good practices
Join the chat room at: https://gitter.im/tfgs-functionalprogramming-urjc/spark-tfgs#share
The course is run on Jupyter notebooks. We recommend installing docker to run jupyter with Spark. Please, follow these steps to set up everything you need.
-
Install docker on your machine
1.1 Linux
1.2 Windows (if your windows is not 10 pro, install this one)
1.3 MAC
-
Clone this repository
2.1 Open a terminal
2.2 Go to the folder where you want to download this repo
2.3 Run this command:
git clone https://github.com/jserranohidalgo/spark-intro.git [<nombre del proyecto>]
2.4 Move into your local repository
cd <nombre del proyecto>
-
Run the docker image for Spark
3.1 Make sure that docker daemon is running
3.2 Run the following command:
-
Linux / MAC
docker run -it --rm -p 8888:8888 -p 4040:4040 -m 4g -v "$PWD":/home/jovyan/work almondsh/almond:0.9.1 -
Windows
docker run -it --rm -p 8888:8888 -p 4040:4040 -m 4g -v {c:/path/to/downloaded/folder}:/home/jovyan/work almondsh/almond:0.9.1
It can take a while to download the image for the first time.
-
-
Enter Jupyter
4.1 Copy the token that is shown in the terminal
Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://(824918044eeb or 127.0.0.1):8888/?token=57c1a89124a898d1c6d4ca404445c9b54c9c6d6cfc558f9f4.2 Go to localhost, paste the token and log in.

-
Test Notebook
5.1 In Jupyter open with a click the work folder
5.2 Open with a click the
Test notebook5.3 Press the run button, if yo don't see any errors and at the end you see
it works!you have all up and running. -
Close the jupyter container
Atention!Make sure that you save everything before closing Jupyter. The docker parameter
--rm- deletes the container (the image is still in your machine), so as to get a clean enviroment each time you open it, but also discards all non saved changes in the notebooks.6.1 Go to the shell that is running the container
6.2 Control + c to send a kill signal
6.3 Type 'y' and enter to finish.
You can install jupyter and the Scala kernel yourself. Please follow the following instructions:
This course owes a lot to the staff of Habla Computing. Particular thanks to Mikel San Vicente and Alfonso Roa.
