Harald Schilly edited this page Nov 30, 2018 · 14 revisions

Notes about running a small Dask/Distributed cluster in your CoCalc project for prototyping and educational use.

Start cluster

  1. Create a terminal, e.g. dask.term

  2. Use the split buttons at the top right to split it horizontally and vertically into 4 panels.

  3. Start the Scheduler in the first terminal panel

    dask-scheduler
    
  4. Start three workers in the other 3 panels, connecting to localhost (not the IP address, which changes between project restarts!)

    dask-worker tcp://localhost:8786 --nthreads 1 --nprocs 1 --memory-limit 256M
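
The four panel commands above can also be generated from one place, which keeps the worker settings consistent when you change them. This is only a minimal sketch and not part of the original notes: the helper names `scheduler_command`, `worker_command` and `cluster_commands` are hypothetical, and the flags are simply the ones used in step 4.

```python
import shlex

SCHEDULER_ADDRESS = "tcp://localhost:8786"  # address used in step 4 above


def scheduler_command():
    """Return the argv list that starts the scheduler (step 3)."""
    return ["dask-scheduler"]


def worker_command(memory_limit="256M", nthreads=1, nprocs=1,
                   address=SCHEDULER_ADDRESS):
    """Return the argv list that starts one worker (step 4)."""
    return [
        "dask-worker", address,
        "--nthreads", str(nthreads),
        "--nprocs", str(nprocs),
        "--memory-limit", memory_limit,
    ]


def cluster_commands(n_workers=3, memory_limit="256M"):
    """Return one shell command line per terminal panel."""
    commands = [scheduler_command()]
    commands += [worker_command(memory_limit) for _ in range(n_workers)]
    return [shlex.join(argv) for argv in commands]


# Print the scheduler line followed by one line per worker.
for line in cluster_commands():
    print(line)
```

Calling `cluster_commands(n_workers=2, memory_limit="512M")` prints a two-worker variant with a larger memory limit. Recent releases of distributed use `--nworkers` in place of `--nprocs`, so the flag may need adjusting for your installed version.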
    

Tip: each panel of the terminal has an icon with a "rocket". Click it to open the startup initialization script of that very panel, and paste that panel's command right there. The next time you start your project and open up that terminal (just keep that tab open), these 4 commands will be run. That way, your little cluster is always spun up when you work in your project. If there is an issue, press Ctrl-c and then Ctrl-d to interrupt and exit the running instance. The panel will respawn and run its init command again...

Note: If you run into memory-limit issues, switch to running two workers with a 512M memory limit each. You can also get memory upgrades to be able to allocate more ... With only two workers, you can start htop in the 4th panel instead, to keep an eye on all the running processes in your project.

Connect with a client

  1. Create a Jupyter Notebook, e.g. dask.ipynb.
  2. Check that dask imports fine and set the temporary directory to be in your project's files (the /tmp directory is virtual and in memory)

    import dask
    import dask.distributed
    import os
    dask.config.set({'temporary_directory': os.path.expanduser('~/tmp')})

  3. Create your client. It should return general status information (how many workers, memory, etc.)

    from dask.distributed import Client
    client = Client('127.0.0.1:8786')
    client

If that worked, congratulations! You can start submitting tasks to your little cluster ...
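
As a first task to submit, here is a minimal sketch that squares a few numbers on the workers and sums the results on the cluster as well. The function names `square` and `sum_of_squares` are made up for illustration; the sketch assumes `client` is the connected `Client` object created in the notebook above and uses only its `map` and `submit` methods.

```python
def square(x):
    """Toy task that is executed on a worker."""
    return x * x


def sum_of_squares(client, n=10):
    """Square 0..n-1 on the cluster and add up the results there."""
    # One task per input value; returns a list of futures immediately.
    futures = client.map(square, range(n))
    # A task that depends on all the futures above.
    total = client.submit(sum, futures)
    # Block until the cluster has computed the final value.
    return total.result()
```

In the notebook, `sum_of_squares(client)` returns 285 for the default `n=10` once all three workers have finished their share.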

From here, you can also check the actual configuration:

    dask.config.config
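
The configuration is a nested dictionary, which can be hard to scan. The following is a small sketch, not something dask provides under this name, that flattens such a dictionary into dotted keys. The helper `flatten` and the `sample` values are hypothetical stand-ins so the block runs without a cluster.

```python
def flatten(config, prefix=""):
    """Yield (dotted_key, value) pairs from a nested dictionary."""
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into non-empty sub-dictionaries.
            yield from flatten(value, dotted)
        else:
            yield dotted, value


# Stand-in for dask.config.config; these values are made up for illustration.
sample = {
    "temporary_directory": "/home/user/tmp",
    "distributed": {"worker": {"memory": {"target": 0.6}}},
}

for dotted_key, value in flatten(sample):
    print(dotted_key, "=", value)
```

In the notebook, `dict(flatten(dask.config.config))` gives the same kind of view of the real settings, and a dotted key can be passed to `dask.config.get` to read a single value.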

Dashboard

In theory, it should be possible to open the dashboard URL directly to see it, but for unknown reasons the websocket connection fails to work on CoCalc.

Alternatively, create an X11 session (e.g. dask.x11) and start chrome (google-chrome) or firefox (firefox) in the terminal. Then open the dashboard URL. If everything loads up fine, you'll see it here:
