This repository was archived by the owner on Feb 20, 2024. It is now read-only.

Commit 61606c8

Merge pull request #79 from nginyc/add_gpu_aware_placement
[V0.0.8] Add GPU-aware placement (+ model knobs and logging API changes)
2 parents f313623 + 700b8dc commit 61606c8

37 files changed, +1041 -849 lines changed

.env.sh

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Core configuration for Rafiki
 export DOCKER_NETWORK=rafiki
-export RAFIKI_VERSION=0.0.7
+export RAFIKI_VERSION=0.0.8
 export RAFIKI_IP_ADDRESS=127.0.0.1
 export ADMIN_EXT_PORT=3000
 export ADMIN_WEB_EXT_PORT=3001

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -25,4 +25,8 @@ data/*
 # Logs
 *.log
 logs/*
-!logs/.gitkeep
+!logs/.gitkeep
+
+# IPython notebooks
+.ipynb_checkpoints/*
+*.ipynb

dockerfiles/advisor.Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ ENV PYTHONPATH $DOCKER_WORKDIR_PATH
 # Install python dependencies
 COPY rafiki/utils/requirements.txt utils/requirements.txt
 RUN pip install -r utils/requirements.txt
+COPY rafiki/model/requirements.txt model/requirements.txt
+RUN pip install -r model/requirements.txt
 COPY rafiki/advisor/requirements.txt advisor/requirements.txt
 RUN pip install -r advisor/requirements.txt

docs/src/dev/setup.rst

Lines changed: 13 additions & 1 deletion
@@ -36,7 +36,19 @@ Adding Nodes to Rafiki

 Rafiki has with its dynamic stack (e.g. train workers, inference workes, predictors)
 running as `Docker Swarm Services <https://docs.docker.com/engine/swarm/services/>`_.
-Horizontal scaling can be done by `adding more nodes to the swarm <https://docs.docker.com/engine/swarm/join-nodes/>`_.
+
+Horizontal scaling can be done by adding more nodes to the swarm.
+
+Perform the following for *each* worker node to be added:
+
+1. Connect the node to the same network as the master, so that the node can `join the master's Docker Swarm <https://docs.docker.com/engine/swarm/join-nodes/>`_.
+
+2. Configure the node with the script:
+
+    .. code-block:: shell
+
+        bash scripts/setup_node.sh
+

 Exposing Rafiki Publicly
 --------------------------------------------------------------------

docs/src/python/rafiki.model.rst

Lines changed: 34 additions & 4 deletions
@@ -1,17 +1,47 @@
 rafiki.model
 ====================================================================

+.. contents:: Table of Contents
+
+Core Classes
+--------------------------------------------------------------------
+
 .. autoclass:: rafiki.model.BaseModel
     :members:

+.. autoclass:: rafiki.model.BaseKnob
+    :members:
+
+
+.. _`knob-types`:
+
+Knob Classes
+--------------------------------------------------------------------
+
+.. autoclass:: rafiki.model.CategoricalKnob
+    :members:
+
+.. autoclass:: rafiki.model.IntegerKnob
+    :members:
+
+.. autoclass:: rafiki.model.FloatKnob
+    :members:
+
+.. autoclass:: rafiki.model.FixedKnob
+    :members:
+
+
+Utility Classes & Methods
+--------------------------------------------------------------------
+
 .. automethod:: rafiki.model.test_model_class

-.. autoclass:: rafiki.model.log.ModelLogUtils
+.. autoclass:: rafiki.model.ModelLogger
     :members:

-.. autoclass:: rafiki.model.dataset.ModelDatasetUtils
+.. autoclass:: rafiki.model.ModelDatasetUtils
     :members:

-.. autoclass:: rafiki.model.dataset.ImageFilesDataset
+.. autoclass:: rafiki.model.ImageFilesDataset

-.. autoclass:: rafiki.model.dataset.CorpusDataset
+.. autoclass:: rafiki.model.CorpusDataset

docs/src/user/client-create-models.include.rst

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@

-To create a model, you will need to submit a model class that extends :class:`rafiki.model.BaseModel` in a single Python file,
-where the model's implementation conforms to a specific task (see :ref:`tasks`).
+To create a model, you will need to submit a model class that conforms to the specification
+by :class:`rafiki.model.BaseModel`, written in a `single` Python file.
+The model's implementation should conform to a specific task (see :ref:`tasks`).

 Refer to the parameters of :meth:`rafiki.client.Client.create_model` for configuring how your model runs on Rafiki,
 and refer to :ref:`creating-models` to understand more about how to write & test models for Rafiki.

docs/src/user/creating-models.rst

Lines changed: 35 additions & 14 deletions
@@ -6,9 +6,41 @@ Creating Models

 .. contents:: Table of Contents

+To create a model, you will need to submit a model class that conforms to the specification
+by :class:`rafiki.model.BaseModel`, written in a `single` Python file.
+The model's implementation should conform to a specific task (see :ref:`tasks`).
+To submit the model to Rafiki, use the :meth:`rafiki.client.Client.create_model` method.

-To create a model on Rafiki, use the :meth:`rafiki.client.Client.create_model` method.
+Implementing Models
+--------------------------------------------------------------------
+
+Full details on how to implement a model are located in the documentation of :class:`rafiki.model.BaseModel`,
+and sample model implementations are located in `./examples/models/ <https://github.com/nginyc/rafiki/tree/master/examples/models/>`_.
+
+In defining the hyperparameters (knobs) of a model, refer to the documentation at :ref:`knob-types` for the full list of knob types.

+After implementing your model, it is highly recommended to use :meth:`rafiki.model.test_model_class`
+to test your model. This method simulates a full train-inference flow on your model, ensuring that
+it is likely to work on Rafiki.
+
+Logging in Models
+--------------------------------------------------------------------
+
+By importing the global ``logger`` instance in the ``rafiki.model`` module,
+you can log messages and metrics while your model is being trained, and you can
+define plots to visualize your model's training on Rafiki's Admin Web interface.
+
+Refer to :class:`rafiki.model.ModelLogger` for full usage instructions.
+
+.. seealso:: :ref:`using-admin-web`
+
+Dataset Loading in Models
+--------------------------------------------------------------------
+
+The global ``dataset_utils`` instance in the ``rafiki.model`` module provides
+a set of built-in dataset loading methods for common dataset types on Rafiki.
+
+Refer to :class:`rafiki.model.ModelDatasetUtils` for full usage instructions.

 Model Environment
 --------------------------------------------------------------------
@@ -25,21 +57,20 @@ prior to model training and inference. This is configurable with the ``dependenc
 during model creation.

 Alternatively, you can build a custom Docker image that extends ``rafikiai/rafiki_worker``,
-installing the required dependencies for your model. This is configurable with ``docker_image``) option
+installing the required dependencies for your model. This is configurable with ``docker_image`` option
 during model creation.

 Models should run at least run on CPU-only machines and optionally leverage on a shared GPU, if it is available.

 Refer to the parameters of :meth:`rafiki.client.Client.create_model` for configuring how your model runs on Rafiki.

-Testing Models
+Sample Models
 --------------------------------------------------------------------

 To illustrate how to write models on Rafiki, we have written the following:

 - Sample pre-processing logic to convert common dataset formats to Rafiki's own dataset formats in `./examples/datasets/ <https://github.com/nginyc/rafiki/tree/master/examples/datasets/>`_
 - Sample models in `./examples/models/ <https://github.com/nginyc/rafiki/tree/master/examples/models/>`_
-- A method :meth:`rafiki.model.test_model_class` that simulates a full train-inference flow on any Rafiki model

 To start testing your model, first install the Python dependencies at ``rafiki/model/requirements.txt``:

@@ -93,13 +124,3 @@ Example: Testing Models for ``POS_TAGGING``

     python examples/models/pos_tagging/BigramHmm.py
     python examples/models/pos_tagging/PyBiLstm.py
-
-
-Model Logging & Dataset Loading
---------------------------------------------------------------------
-
-:class:`rafiki.model.BaseModel` has a property ``utils`` that subclasses the model utility classes
-:class:`rafiki.model.log.ModelLogUtils` and :class:`rafiki.model.dataset.ModelDatasetUtils`. They
-help with model logging & dataset loading respectively.
-
-Refer to the sample usage in the implementation of `./examples/models/image_classification/TfSingleHiddenLayer.py <https://github.com/nginyc/rafiki/tree/master/examples/models/image_classification/TfSingleHiddenLayer.py>`_.
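
Note on the logging change in this file: the docs now describe a global ``logger`` instance imported from ``rafiki.model``, replacing the per-model ``utils`` property. The sketch below only illustrates that module-level pattern. ``ModelLogger`` here is a hypothetical stand-in with a single ``log`` method (the one call the example models in this commit use), and ``SketchModel`` and the dataset URI are invented for illustration; none of this is Rafiki's implementation.

```python
class ModelLogger:
    """Hypothetical stand-in; Rafiki's real ModelLogger has its own API."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        # Keep the message so the sketch can be inspected, and echo it.
        self.messages.append(message)
        print(message)


# One shared instance at module level, mirroring
# `from rafiki.model import logger` in the example models.
logger = ModelLogger()


class SketchModel:
    def train(self, dataset_uri):
        accuracy = 0.5  # placeholder value, not a real training result
        # Models call the shared logger directly instead of `self.utils.log`.
        logger.log('Train accuracy: {}'.format(accuracy))


SketchModel().train('unused://dataset')
```

Because the logger is shared at module level, model code no longer needs a base-class property to reach it, which may be why the ``utils`` section was removed from the docs.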

docs/src/user/quickstart.rst

Lines changed: 8 additions & 12 deletions
@@ -7,21 +7,17 @@ Quick Start

 .. note::

-    If you're a *Model Developer* just looking to contribute models to a running instance of Rafiki, refer to :ref:`quickstart-model-developers`.
+    - If you're a *Model Developer* just looking to contribute models to a running instance of Rafiki, refer to :ref:`quickstart-model-developers`.
+    - If you're an *Application Developer* just looking to train and deploy models on a running instance of Rafiki, refer to :ref:`quickstart-app-developers`.
+    - If you're an *Application User* just looking to make predictions to deployed models on a running instance of Rafiki, refer to :ref:`quickstart-app-users`.

-.. note::
-
-    If you're an *Application Developer* just looking to train and deploy models on a running instance of Rafiki, refer to :ref:`quickstart-app-developers`.
-
-.. note::
-
-    If you're an *Application User* just looking to make predictions to deployed models on a running instance of Rafiki, refer to :ref:`quickstart-app-users`.

+This guide assumes you have deployed your an empty instance of Rafiki and you want to try a *full* train-inference flow,
+including adding of models, submitting a train job and submitting a inference job to Rafiki.

-This guide assumes you have deployed your an empty instance of Rafiki and you want to do a *full* train-inference flow,
-including preparation of dataset and adding of models to Rafiki. Below, the sequence of examples submit the
-`Fashion MNIST dataset <https://github.com/zalandoresearch/fashion-mnist>`_ for training and inference.
-Alternatively, after installing Rafiki Client's dependencies, you can run `./examples/scripts/client_quickstart.py <https://github.com/nginyc/rafiki/blob/master/examples/scripts/client_quickstart.py>`_.
+The sequence of examples below submits the `Fashion MNIST dataset <https://github.com/zalandoresearch/fashion-mnist>`_ for training and inference.
+Alternatively, after installing the Rafiki Client's dependencies, you can refer and run the scripted version of this quickstart
+`./examples/scripts/client_quickstart.py <https://github.com/nginyc/rafiki/blob/master/examples/scripts/client_quickstart.py>`_.

 .. note::

examples/models/image_classification/SkDt.py

Lines changed: 15 additions & 24 deletions
@@ -5,38 +5,29 @@
 import base64
 import numpy as np

-from rafiki.model import BaseModel, InvalidModelParamsException, test_model_class
+from rafiki.config import APP_MODE
+from rafiki.model import BaseModel, InvalidModelParamsException, test_model_class, \
+                        IntegerKnob, CategoricalKnob, dataset_utils, logger
 from rafiki.constants import TaskType, ModelDependency

 class SkDt(BaseModel):
     '''
     Implements a decision tree classifier on Scikit-Learn for simple image classification
     '''
-
-    def get_knob_config(self):
+    @staticmethod
+    def get_knob_config():
         return {
-            'knobs': {
-                'max_depth': {
-                    'type': 'int',
-                    'range': [2, 8]
-                },
-                'criterion': {
-                    'type': 'string',
-                    'values': ['gini', 'entropy']
-                },
-            }
+            'max_depth': IntegerKnob(2, 16 if APP_MODE != 'DEV' else 8),
+            'criterion': CategoricalKnob(['gini', 'entropy'])
         }

-    def init(self, knobs):
-        self._max_depth = knobs.get('max_depth')
-        self._criterion = knobs.get('criterion')
-        self._clf = self._build_classifier(
-            self._max_depth,
-            self._criterion
-        )
-
+    def __init__(self, **knobs):
+        super().__init__(**knobs)
+        self.__dict__.update(knobs)
+        self._clf = self._build_classifier(self.max_depth, self.criterion)
+
     def train(self, dataset_uri):
-        dataset = self.utils.load_dataset_of_image_files(dataset_uri)
+        dataset = dataset_utils.load_dataset_of_image_files(dataset_uri)
         (images, classes) = zip(*[(image, image_class) for (image, image_class) in dataset])
         X = self._prepare_X(images)
         y = classes
@@ -45,10 +36,10 @@ def train(self, dataset_uri):
         # Compute train accuracy
         preds = self._clf.predict(X)
         accuracy = sum(y == preds) / len(y)
-        self.utils.log('Train accuracy: {}'.format(accuracy))
+        logger.log('Train accuracy: {}'.format(accuracy))

     def evaluate(self, dataset_uri):
-        dataset = self.utils.load_dataset_of_image_files(dataset_uri)
+        dataset = dataset_utils.load_dataset_of_image_files(dataset_uri)
         (images, classes) = zip(*[(image, image_class) for (image, image_class) in dataset])
         X = self._prepare_X(images)
         y = classes
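
Note on the knob API change in this file: the nested ``'knobs'`` dict becomes a flat dict of knob objects returned by a static method, and ``init(knobs)`` becomes ``__init__(**knobs)``. The following is a minimal runnable sketch of that shape only. The ``IntegerKnob``, ``CategoricalKnob`` and ``BaseModel`` classes below are hypothetical stand-ins (their attribute names are assumptions), ``APP_MODE`` is hard-coded rather than read from ``rafiki.config``, and ``SketchModel`` is invented for illustration.

```python
# Hypothetical stand-ins: NOT Rafiki's classes, just enough structure to run.
APP_MODE = 'DEV'  # assumed value; the diff imports this from rafiki.config


class IntegerKnob:
    def __init__(self, value_min, value_max):
        self.value_min = value_min
        self.value_max = value_max


class CategoricalKnob:
    def __init__(self, values):
        self.values = values


class BaseModel:
    def __init__(self, **knobs):
        pass


class SketchModel(BaseModel):
    @staticmethod
    def get_knob_config():
        # Static, so the search space can be read without building a model.
        return {
            # Narrower range when APP_MODE is 'DEV', as in the diff above.
            'max_depth': IntegerKnob(2, 16 if APP_MODE != 'DEV' else 8),
            'criterion': CategoricalKnob(['gini', 'entropy']),
        }

    def __init__(self, **knobs):
        super().__init__(**knobs)
        # Expose each chosen knob value as an attribute (self.max_depth, ...).
        self.__dict__.update(knobs)


config = SketchModel.get_knob_config()
model = SketchModel(max_depth=4, criterion='gini')
print(config['max_depth'].value_max, model.max_depth, model.criterion)
```

One consequence of ``self.__dict__.update(knobs)`` is that any keyword passed in becomes an attribute, so a misspelled knob name would surface later as an ``AttributeError`` rather than at construction time.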

examples/models/image_classification/SkSvm.py

Lines changed: 15 additions & 34 deletions
@@ -5,57 +5,38 @@
 import base64
 import numpy as np

-from rafiki.model import BaseModel, InvalidModelParamsException, test_model_class
+from rafiki.config import APP_MODE
+from rafiki.model import BaseModel, InvalidModelParamsException, test_model_class, \
+                        IntegerKnob, CategoricalKnob, FloatKnob, dataset_utils
 from rafiki.constants import TaskType, ModelDependency

 class SkSvm(BaseModel):
     '''
     Implements a SVM on Scikit-Learn for simple image classification
     '''
-
-    def get_knob_config(self):
+    @staticmethod
+    def get_knob_config():
         return {
-            'knobs': {
-                'max_iter': {
-                    'type': 'int',
-                    'range': [10, 10]
-                },
-                'kernel': {
-                    'type': 'string',
-                    'values': ['rbf', 'linear']
-                },
-                'gamma': {
-                    'type': 'string',
-                    'values': ['scale', 'auto']
-                },
-                'C': {
-                    'type': 'float_exp',
-                    'range': [1e-2, 1e2]
-                }
-            }
+            'max_iter': IntegerKnob(10, 40 if APP_MODE != 'DEV' else 10),
+            'kernel': CategoricalKnob(['rbf', 'linear']),
+            'gamma': CategoricalKnob(['scale', 'auto']),
+            'C': FloatKnob(1e-2, 1e2, is_exp=True)
         }

-    def init(self, knobs):
-        self._max_iter = knobs.get('max_iter')
-        self._kernel = knobs.get('kernel')
-        self._gamma = knobs.get('gamma')
-        self._C = knobs.get('C')
-        self._clf = self._build_classifier(
-            self._max_iter,
-            self._kernel,
-            self._gamma,
-            self._C
-        )
+    def __init__(self, **knobs):
+        super().__init__(**knobs)
+        self.__dict__.update(knobs)
+        self._clf = self._build_classifier(self.max_iter, self.kernel, self.gamma, self.C)

     def train(self, dataset_uri):
-        dataset = self.utils.load_dataset_of_image_files(dataset_uri)
+        dataset = dataset_utils.load_dataset_of_image_files(dataset_uri)
         (images, classes) = zip(*[(image, image_class) for (image, image_class) in dataset])
         X = self._prepare_X(images)
         y = classes
         self._clf.fit(X, y)

     def evaluate(self, dataset_uri):
-        dataset = self.utils.load_dataset_of_image_files(dataset_uri)
+        dataset = dataset_utils.load_dataset_of_image_files(dataset_uri)
         (images, classes) = zip(*[(image, image_class) for (image, image_class) in dataset])
         X = self._prepare_X(images)
         y = classes
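
Note on ``FloatKnob(1e-2, 1e2, is_exp=True)``: this replaces the old ``'float_exp'`` type with the same range. The diff does not show how the advisor treats ``is_exp``. One plausible reading, offered here purely as an assumption, is that values are drawn on an exponential (log) scale so that each decade of ``C`` is equally likely. The helper ``sample_float_knob`` below is hypothetical and is not part of Rafiki.

```python
import math
import random


def sample_float_knob(value_min, value_max, is_exp=False, rng=random):
    """Draw one value from [value_min, value_max] (hypothetical helper)."""
    if is_exp:
        # Sample uniformly in log space, then map back with exp().
        log_value = rng.uniform(math.log(value_min), math.log(value_max))
        value = math.exp(log_value)
        # Clamp to guard against floating-point drift at the endpoints.
        return min(max(value, value_min), value_max)
    return rng.uniform(value_min, value_max)


rng = random.Random(0)
samples = [sample_float_knob(1e-2, 1e2, is_exp=True, rng=rng) for _ in range(1000)]
below_one = sum(1 for s in samples if s < 1.0)
print('fraction of samples below 1.0:', below_one / len(samples))
```

Under this reading roughly half of the draws fall below 1.0, whereas a plain uniform draw over the same range would put about 99% of them above 1.0, which is usually undesirable for a regularization strength spanning four orders of magnitude.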
