Dask
Notes about running a small Dask/Distributed cluster in your CoCalc project for prototyping and educational use.
-
Create a terminal, e.g.
dask.term
-
Use the split buttons at the top right to split it horizontally and vertically into 4 panels.
-
Start the scheduler in the first terminal panel:
dask-scheduler
-
Start three workers in the other three panels with the dask-worker command given below, using these settings:
-
connecting to localhost (not the IP address, which changes between project restarts!)
-
set the local directory to
~/dask-worker
in your home directory (all worker data will be stored there; after shutting down the cluster, you can get rid of all temporary data at once by removing that directory)
-
threads: just one
-
processes: also just one
-
and a conservative memory limit per worker. Note: If you run into memory-limit issues for individual workers, switch to running two workers with a 512M memory limit each. You can also get memory upgrades to be able to allocate more. If you only run two workers, you can start
htop
in the 4th panel to keep an eye on all running processes in your project.
dask-worker tcp://localhost:8786 --local-directory ~/dask-worker --nthreads 1 --nprocs 1 --memory-limit 256M
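The worker settings above can also be assembled in a few lines of Python, which makes it easy to compare the total memory budget of the two layouts mentioned (three workers at 256M versus two at 512M). This is only a sketch: the helper names worker_command and total_memory_mb are hypothetical, and the flags are exactly the ones from the command above.

```python
def worker_command(memory_limit="256M",
                   scheduler="tcp://localhost:8786",
                   local_directory="~/dask-worker"):
    """Build the dask-worker command line used in each worker panel."""
    args = [
        "dask-worker", scheduler,
        "--local-directory", local_directory,  # left unquoted so the shell expands ~
        "--nthreads", "1",
        "--nprocs", "1",
        "--memory-limit", memory_limit,
    ]
    return " ".join(args)


def total_memory_mb(n_workers, memory_limit):
    """Total memory budget in MB; this sketch only parses limits like '256M'."""
    if not memory_limit.endswith("M"):
        raise ValueError("expected a limit such as '256M'")
    return n_workers * int(memory_limit[:-1])


print(worker_command())             # the command to paste into a worker panel
print(total_memory_mb(3, "256M"))   # three small workers
print(total_memory_mb(2, "512M"))   # two larger workers
```

Two workers at 512M need more memory in total than three at 256M, so the switch only helps if your project has that headroom.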
-
Tip: in each panel of the terminal, there is an icon with a "rocket". Click it to open the startup initialization script of that panel. Paste these commands right there, and the next time you start your project and open that terminal (just keep that tab open), all init commands for each panel will run. That way, your little cluster is always spun up when you work in your project. If there is an issue, press Ctrl-c
and then Ctrl-d
to interrupt and exit the running instance. The panel will respawn and run that init command again...
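Once the panels are running, it can help to confirm that something is actually listening on the scheduler port before moving on. The following is a minimal sketch using only the standard library; the function name scheduler_reachable is hypothetical, and port 8786 is taken from the worker command above. It only tests that the TCP port accepts connections, not that the process behind it is a healthy Dask scheduler.

```python
import socket


def scheduler_reachable(host="localhost", port=8786, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if scheduler_reachable():
    print("scheduler port 8786 is open")
else:
    print("nothing is listening on port 8786 yet")
```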
-
Create a Jupyter Notebook, e.g.
dask.ipynb
-
Check if
dask
imports fine and set the temporary directory to be in your project's files (the /tmp
directory is virtual and in memory). The config below also enables work stealing, because we're on the same node anyway ...
import dask
import dask.distributed
import os
# Scheduler options live under the 'distributed.' namespace of the config.
dask.config.set({
    'temporary-directory': os.path.expanduser('~/tmp'),
    'distributed.scheduler.work-stealing': True
})
-
Create your client; it should display general status information (how many workers, memory, etc.)
from dask.distributed import Client
client = Client('127.0.0.1:8786')
client
If that worked, congratulations! You can start submitting tasks to your little cluster ...
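As a first test, you could submit a single task and a small batch of tasks. The sketch below assumes that client is the dask.distributed Client created above; the function names inc and run_demo are made up for illustration, while submit, map, gather and result are the standard Client and Future methods.

```python
def inc(x):
    """A tiny task that runs on one of the workers."""
    return x + 1


def run_demo(client):
    """Submit one task and a batch of tasks, then collect the results."""
    future = client.submit(inc, 10)       # schedule inc(10) on the cluster
    futures = client.map(inc, range(5))   # one task per input value
    return future.result(), client.gather(futures)
```

In the notebook, calling run_demo(client) should return (11, [1, 2, 3, 4, 5]) once the workers have processed the tasks.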
From here, you can also check the actual configuration:
dask.config.config
In theory, it should be possible to open this URL to see the dashboard, but for unknown reasons the websocket connection fails on CoCalc.
Alternatively, create an X11 session (e.g. dask.x11) and start Chrome (google-chrome) or Firefox (firefox) in the terminal. Then open the dashboard URL. If everything loads fine, you'll see the dashboard there.