From 70a3d96e6a483062e5229e4f6773c2266f4aab15 Mon Sep 17 00:00:00 2001
From: Hongkun Yu
Date: Mon, 30 Mar 2020 11:09:27 -0700
Subject: [PATCH] Port multi-host GPU training instructions.

PiperOrigin-RevId: 303779613
---
 official/vision/image_classification/README.md | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/official/vision/image_classification/README.md b/official/vision/image_classification/README.md
index b5958cb0418..da5fcc1d5d3 100644
--- a/official/vision/image_classification/README.md
+++ b/official/vision/image_classification/README.md
@@ -29,11 +29,25 @@ provide a few options.
 Note: These models will **not** work with TPUs on Colab.
 
 You can train image classification models on Cloud TPUs using
-`tf.distribute.TPUStrategy`. If you are not familiar with Cloud TPUs, it is
-strongly recommended that you go through the
+[tf.distribute.experimental.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/TPUStrategy?version=nightly).
+If you are not familiar with Cloud TPUs, it is strongly recommended that you go
+through the
 [quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
 create a TPU and GCE VM.
 
+### Running on multiple GPU hosts
+
+You can also train these models on multiple hosts, each with GPUs, using
+[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).
+
+The easiest way to run multi-host benchmarks is to set the
+[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
+appropriately on each host. For example, to run `MultiWorkerMirroredStrategy` on
+2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
+host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
+"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the
+available GPUs on each host.
+
 ## MNIST
 
 To download the data and run the MNIST sample model locally for the first time,
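
For reference, a minimal sketch of the `TF_CONFIG` setup the patched README describes, assuming two hypothetical hosts (`host0.example.com` and `host1.example.com`) listening on port `2222`; the host names and port are illustrative and not part of the patch:

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker cluster; substitute your own host names and ports.
cluster = {"worker": ["host0.example.com:2222", "host1.example.com:2222"]}

# On host i, set "index" to i (0 on the first host, 1 on the second).
# TF_CONFIG must be set before the strategy is created.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": cluster,
    "task": {"type": "worker", "index": 0},
})

# The strategy reads TF_CONFIG at construction time and automatically uses
# all GPUs visible on this host.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```

The same training script then runs unchanged on every host; only the `"index"` value in `TF_CONFIG` differs from host to host.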