Merge pull request #10 from apls777/dev
availability zone, retain deletion policy, auto-resize
apls777 authored Sep 18, 2018
2 parents 60f6b3b + 3d724d7 commit 025caf0
Showing 9 changed files with 299 additions and 302 deletions.
233 changes: 48 additions & 185 deletions README.md
@@ -1,208 +1,71 @@
# Spotty

Spotty helps you to train deep learning models on [AWS Spot Instances](https://aws.amazon.com/ec2/spot/).

You don't need to spend time on:
- manually starting Spot Instances
- installing NVIDIA drivers
- managing snapshots and AMIs
- detaching remote processes from your SSH sessions

Just start an instance using the following command:
```bash
$ spotty start
```
It will run a Spot Instance, restore snapshots if any, synchronize the project with the instance
and start a Docker container with the environment.

Then train your model:
```bash
$ spotty run train
```
It runs your custom training command inside the Docker container. The remote connection uses
[tmux](https://github.com/tmux/tmux/wiki), so you can close the connection and come back to the running process any time later.

Connect to the container if necessary:
```bash
$ spotty ssh
```
It uses a [tmux](https://github.com/tmux/tmux/wiki) session, so you can always detach the session using the
`Ctrl`+`b`, then `d` key combination and attach to that session later using the `$ spotty ssh` command again.
Spotty simplifies training of Deep Learning models on AWS:

## Installation
- it makes training on AWS GPU instances as simple as training on your local computer
- it automatically manages all necessary AWS resources including AMIs, volumes and snapshots
- it lets anyone train your model on AWS with just a couple of commands
- it detaches remote processes from SSH sessions
- it saves you up to 70% of the costs by using Spot Instances

## Documentation

To install Spotty, use the [pip](http://www.pip-installer.org/en/latest/) package manager:
- See the [wiki section](https://github.com/apls777/spotty/wiki) for the documentation.
- Read [this](https://medium.com/@apls/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365)
article on Medium for a real-world example.

$ pip install --upgrade spotty
## Installation

Requirements:
* Python 3
* AWS CLI (see [Installing the AWS Command Line Interface](http://docs.aws.amazon.com/cli/latest/userguide/installing.html))

## Configuration

By default, Spotty looks for a `spotty.yaml` file in the root directory of the project.
Here is a basic example of such a file:

```yaml
project:
  name: MyProjectName
  remoteDir: /workspace/project
instance:
  region: us-east-2
  instanceType: p2.xlarge
  volumes:
    - snapshotName: MySnapshotName
      directory: /workspace
      size: 10
  docker:
    image: tensorflow/tensorflow:latest-gpu-py3
```
### Available Parameters
__`project`__ section:
- __`name`__ - the name of your project. It will be used to create an S3 bucket and a CloudFormation stack to run
an instance.
- __`remoteDir`__ - directory where your project will be stored on the instance. It's usually a directory
on the attached volume (see "instance" section).
- __`syncFilters`__ _(optional)_ - filters to skip some directories or files during synchronization. By default, all project files
will be synced with the instance. Example:
```yaml
syncFilters:
  - exclude:
      - .idea/*
      - .git/*
      - data/*
  - include:
      - data/test/*
  - exclude:
      - data/test/config
```

It will skip the ".idea/", ".git/" and "data/" directories except for the "data/test/" directory. All files from the "data/test/"
directory will be synced with the instance except the "data/test/config" file.

You can read more about filters
here: [Use of Exclude and Include Filter](https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters).

__`instance`__ section:
- __`region`__ - region where you are going to run the instance (you can use the `spotty spot-prices` command to find the
cheapest region).
- __`instanceType`__ - type of the instance to run. You can find more information about
types of GPU instances here:
[Recommended GPU Instances](https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html).
- __`amiName`__ _(optional)_ - name of the AMI with NVIDIA Docker (the default value is "SpottyAMI"). Use the
`spotty create-ami` command to create it. This AMI will be used to run your application inside the Docker container.
- __`maxPrice`__ _(optional)_ - the maximum price per hour that you are willing to pay for a Spot Instance. By default, it's
the On-Demand price for the chosen instance type. Read more here:
[Spot Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html).
- __`rootVolumeSize`__ _(optional)_ - size of the root volume in GB. The root volume will be destroyed once
the instance is terminated. Use attached volumes to store the data you need to keep (see "volumes" parameter below).
- __`volumes`__ _(optional)_ - the list of volumes to attach to the instance:
- __`snapshotName`__ _(optional)_ - name of the snapshot to restore. If a snapshot with this name doesn't exist,
it will be created from the volume once the instance is terminated.
- __`directory`__ - directory where the volume will be mounted,
- __`size`__ _(optional)_ - size of the volume in GB. The size of the volume cannot be less than the size of the existing snapshot,
but it can always be increased.
- __`deletionPolicy`__ _(optional)_ - possible values include: "__update_snapshot__" _(the default)_,
"__create_snapshot__" and "__delete__". With "__update_snapshot__", a new snapshot with the
same name will be created and the original snapshot will be deleted. With "__create_snapshot__", a new snapshot
will be created and the original snapshot will be renamed. With "__delete__", the volume will be deleted without
creating a snapshot. A combined example covering these optional parameters is shown after this parameter list.
- __`docker`__ - Docker configuration:
- __`image`__ _(optional)_ - the name of the Docker image that contains the environment for your project. For example,
you could use the [TensorFlow image for GPU](https://hub.docker.com/r/tensorflow/tensorflow/)
(`tensorflow/tensorflow:latest-gpu-py3`). It already contains NumPy, SciPy, scikit-learn, pandas, Jupyter Notebook and
TensorFlow itself. If you need to use your own image, you can specify the path to your Dockerfile in the
__`file`__ parameter (see below), or push your image to [Docker Hub](https://hub.docker.com/) and use its name.
- __`file`__ _(optional)_ - relative path to your custom Dockerfile. For example, you could take the TensorFlow image as a
base and add the [AWS CLI](https://github.com/aws/aws-cli) to it to be able to download your datasets from S3:
```dockerfile
FROM tensorflow/tensorflow:latest-gpu-py3
RUN pip install --upgrade \
    pip \
    awscli
```
- __`workingDir`__ _(optional)_ - working directory for your custom scripts (see "scripts" section below),
- __`dataRoot`__ _(optional)_ - directory where Docker will store all downloaded and built images. You could cache
images on an attached volume to avoid downloading them from the internet or building your custom image from scratch
every time you start an instance (see the combined example after this parameter list).
- __`commands`__ _(optional)_ - commands that should be performed once your container is started. For example, you
could download your datasets from an S3 bucket to the project directory (see the "project" section):
```yaml
commands: |
  aws s3 sync s3://my-bucket/datasets/my-dataset /workspace/project/data
```
- __`ports`__ _(optional)_ - list of ports to open. For example:
```yaml
ports: [6006, 8888]
```
It will open port 6006 for TensorBoard and port 8888 for Jupyter Notebook.
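
Putting several of the optional parameters above together, a fuller `instance` section might look like the sketch below. The concrete values and the `dataRoot` path are purely illustrative assumptions, not defaults:
```yaml
instance:
  region: us-east-2
  instanceType: p2.xlarge
  maxPrice: 0.9
  rootVolumeSize: 20
  volumes:
    - snapshotName: MySnapshotName
      directory: /workspace
      size: 50
      deletionPolicy: create_snapshot
  docker:
    image: tensorflow/tensorflow:latest-gpu-py3
    dataRoot: /workspace/docker
  ports: [6006, 8888]
```
With this configuration, terminating the instance creates a new snapshot while keeping the original one (renamed), and Docker images are cached on the attached volume so they don't need to be downloaded again on the next start.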

__`scripts`__ section _(optional)_:
- This section contains custom scripts that can be run using the `spotty run <SCRIPT_NAME>`
command. The following example defines the scripts `train`, `jupyter` and `tensorboard`:

```yaml
project:
  ...
instance:
  ...
scripts:
  train: |
    PYTHONPATH=/workspace/project
    python /workspace/project/model/train.py --num-layers 3
  jupyter: |
    /run_jupyter.sh --allow-root
  tensorboard: |
    tensorboard --logdir /workspace/outputs
```
Use [pip](http://www.pip-installer.org/en/latest/) to install or upgrade Spotty:

## Available Commands
$ pip install -U spotty

- `$ spotty start`

Runs a Spot Instance, synchronizes the project with that instance and starts a Docker container.
## Get Started

- `$ spotty stop`
1. Prepare a `spotty.yaml` file for your project.

Terminates the running instance and creates snapshots of the attached volumes.
- See the file specification [here](https://github.com/apls777/spotty/wiki/Configuration-File).
- Read [this](https://medium.com/@apls/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365)
article for a real-world example.

- `$ spotty run <SCRIPT_NAME> [--session-name <SESSION_NAME>]`
2. Create an AMI with NVIDIA Docker. Run the following command from the root directory of your project
(where the `spotty.yaml` file is located):

Runs a custom script inside the Docker container (see the "scripts" section in [Available Parameters](#available-parameters)).

Use the `Ctrl`+`b`, then `d` key combination to detach from the SSH session. The script will keep running.
Call `$ spotty run <SCRIPT_NAME>` again to reattach to the running script.
Read more about tmux here: [tmux Wiki](https://github.com/tmux/tmux/wiki).

If you need to run the same script several times in parallel, use the `--session-name` parameter to
specify different names for tmux sessions.
```bash
$ spotty create-ami
```

- `$ spotty ssh [--host-os]`
In several minutes you will have an AMI that can be used for all your projects within the AWS region.

Connects to the running Docker container or to the instance itself. Use the `--host-os` parameter to connect to the
host OS instead of the Docker container.
3. Start an instance:

- `$ spotty sync`
```bash
$ spotty start
```

It will run a Spot Instance, restore snapshots if any, synchronize the project with the running instance
and start the Docker container with the environment.

4. Train a model or run notebooks.

Synchronizes the project with the running instance. The first time, this happens automatically when you start an instance,
but you can always use this command to update the project while an instance is running.
You can run custom scripts inside the Docker container using the `spotty run <SCRIPT_NAME>` command. Read more
about custom scripts in the documentation:
[Configuration File: "scripts" section](https://github.com/apls777/spotty/wiki/Configuration-File#scripts-section-optional).

To connect to the running container via SSH, use the following command:

```bash
$ spotty ssh
```

- `$ spotty create-ami`

Creates an AMI with NVIDIA Docker. You need to call this command only once when you start using Spotty; then you
can reuse the created AMI for all your projects.

- `$ spotty delete-ami`

Deletes an AMI that was created using the command above.

- `$ spotty spot-prices [--instance-type <INSTANCE_TYPE>]`
It runs a [tmux](https://github.com/tmux/tmux/wiki) session, so you can always detach this session using the
__`Ctrl + b`__, then __`d`__ key combination. To attach to that session later, just use the
`spotty ssh` command again.

Returns Spot Instance prices for a particular instance type across all AWS regions. Results are sorted by price.
## License

All the commands have a `--config` parameter that can be used to specify a path to the configuration file. By default,
Spotty looks for a `spotty.yaml` file in the current working directory.
[MIT License](LICENSE)
2 changes: 1 addition & 1 deletion spotty/__init__.py
@@ -1 +1 @@
__version__ = '1.0.8'
__version__ = '1.1.0'
16 changes: 5 additions & 11 deletions spotty/commands/spot_prices.py
@@ -1,9 +1,9 @@
from argparse import ArgumentParser
import boto3
import datetime
from spotty.commands.abstract import AbstractCommand
from spotty.helpers.resources import is_valid_instance_type
from spotty.commands.writers.abstract_output_writrer import AbstractOutputWriter
from spotty.helpers.spot_prices import get_spot_prices


class SpotPricesCommand(AbstractCommand):
@@ -35,21 +35,15 @@ def run(self, output: AbstractOutputWriter):
prices = []
for region in regions:
ec2 = boto3.client('ec2', region_name=region)

tomorrow_date = datetime.datetime.today() + datetime.timedelta(days=1)
res = ec2.describe_spot_price_history(
InstanceTypes=[instance_type],
StartTime=tomorrow_date,
ProductDescriptions=['Linux/UNIX'])

for row in res['SpotPriceHistory']:
prices.append((row['SpotPrice'], row['AvailabilityZone']))
res = get_spot_prices(ec2, instance_type)
prices += [(price, zone) for zone, price in res.items()]

# sort availability zones by price
prices.sort(key=lambda x: x[0])

if prices:
output.write('Price Zone')
for price, zone in prices:
output.write('%s %s' % (price, zone))
output.write('%.04f %s' % (price, zone))
else:
output.write('Spot instances of this type are not available.')
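
The removed inline query is replaced here by a `get_spot_prices` helper from `spotty/helpers/spot_prices.py`, which is not shown in this diff. Judging by the call site, it returns a mapping of availability zones to current prices as floats. A minimal sketch of such a helper, reconstructed from the removed code and offered only as an assumption about the real implementation:

```python
import datetime


def get_spot_prices(ec2, instance_type: str) -> dict:
    """Hypothetical sketch: current Spot prices per availability zone."""
    # A StartTime in the future limits the response to the most recent
    # price for each availability zone (same trick as the removed code).
    tomorrow_date = datetime.datetime.today() + datetime.timedelta(days=1)
    res = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        StartTime=tomorrow_date,
        ProductDescriptions=['Linux/UNIX'])

    # Keep one float price per availability zone (the lowest, in case several
    # rows are returned for the same zone).
    prices = {}
    for row in res['SpotPriceHistory']:
        zone = row['AvailabilityZone']
        price = float(row['SpotPrice'])
        if zone not in prices or price < prices[zone]:
            prices[zone] = price

    return prices
```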
26 changes: 20 additions & 6 deletions spotty/commands/start.py
@@ -2,6 +2,7 @@
from spotty.aws_cli import AwsCli
from spotty.commands.abstract_config import AbstractConfigCommand
from spotty.helpers.resources import wait_stack_status_changed
from spotty.helpers.spot_prices import get_current_spot_price
from spotty.helpers.validation import validate_instance_config
from spotty.project_resources.bucket import BucketResource
from spotty.project_resources.instance_profile import create_or_update_instance_profile
@@ -57,15 +58,25 @@ def run(self, output: AbstractOutputWriter):
# prepare CloudFormation template
output.write('Preparing CloudFormation template...')

# check availability zone
availability_zone = instance_config['availabilityZone']
if availability_zone:
zones = ec2.describe_availability_zones()
zone_names = [zone['ZoneName'] for zone in zones['AvailabilityZones']]
if availability_zone not in zone_names:
raise ValueError('Availability zone "%s" doesn\'t exist in the "%s" region.'
% (availability_zone, region))

instance_type = instance_config['instanceType']
volumes = instance_config['volumes']
ports = instance_config['ports']
max_price = instance_config['maxPrice']
docker_commands = instance_config['docker']['commands']

template = stack.prepare_template(ec2, volumes, ports, max_price, docker_commands, output)
template = stack.prepare_template(ec2, availability_zone, instance_type, volumes, ports, max_price,
docker_commands)

# create stack
instance_type = instance_config['instanceType']
ami_name = instance_config['amiName']
root_volume_size = instance_config['rootVolumeSize']
mount_dirs = [volume['directory'] for volume in volumes]
@@ -89,18 +100,21 @@

if status == 'CREATE_COMPLETE':
ip_address = [row['OutputValue'] for row in info['Outputs'] if row['OutputKey'] == 'InstanceIpAddress'][0]
log_group = [row['OutputValue'] for row in info['Outputs'] if row['OutputKey'] == 'InstanceLogGroup'][0]
availability_zone = [row['OutputValue'] for row in info['Outputs']
if row['OutputKey'] == 'AvailabilityZone'][0]

# get the current spot price
current_price = get_current_spot_price(ec2, instance_type, availability_zone)

output.write('\n'
'--------------------\n'
'Instance is running.\n'
'\n'
'IP address: %s\n'
'CloudWatch Log Group:\n'
' %s\n'
'Current Spot price: $%.04f\n'
'\n'
'Use "spotty ssh" command to connect to the Docker container.\n'
'--------------------' % (ip_address, log_group))
'--------------------' % (ip_address, current_price))
else:
raise ValueError('Stack "%s" was not created.\n'
'Please, see CloudFormation and CloudWatch logs for the details.' % stack.name)
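
`get_current_spot_price` is likewise imported from `spotty.helpers.spot_prices` but not included in this diff. From the call site it takes the instance type and the availability zone reported by the stack outputs and returns a float. A hedged sketch along the same lines as the `get_spot_prices` sketch above:

```python
import datetime


def get_current_spot_price(ec2, instance_type: str, availability_zone: str) -> float:
    """Hypothetical sketch: latest Spot price for one instance type in one zone."""
    # Filtering by availability zone and using a future StartTime normally
    # yields a single row containing the current price.
    res = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        StartTime=datetime.datetime.today() + datetime.timedelta(days=1),
        ProductDescriptions=['Linux/UNIX'],
        AvailabilityZone=availability_zone)

    history = res['SpotPriceHistory']
    return float(history[0]['SpotPrice']) if history else 0.0
```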
3 changes: 3 additions & 0 deletions spotty/data/run_container.yaml
@@ -211,6 +211,7 @@ Resources:
mkdir -p $MOUNT_DIR
mount $DEVICE $MOUNT_DIR
chown -R ubuntu:ubuntu $MOUNT_DIR
resize2fs $DEVICE
done
commands:
mount_volumes:
@@ -551,3 +552,5 @@ Outputs:
Value: !GetAtt SpotInstance.PublicIp
InstanceLogGroup:
Value: !Ref InstanceLogGroup
AvailabilityZone:
Value: !GetAtt SpotInstance.AvailabilityZone
