
Commit 85bd835

Enable multinode Ray (#1526)
* enable multinode ray
* update
* pass by ref
* update placement groups
* move to cpu, have only rank0 return and ray.get on all workers
* remove old ParallelPredictUnit
* update
* reduce test size
* update
* remove code cell
1 parent 4bdedc3 commit 85bd835

9 files changed

Lines changed: 230 additions & 446 deletions


docs/core/common_tasks/ase_calculator.md

Lines changed: 7 additions & 7 deletions
@@ -104,16 +104,16 @@ predictor = pretrained_mlip.get_predict_unit(
 
 ## Multi-GPU Inference
 
-UMA supports Graph Parallel inference natively. The graph is chunked across ranks and both the forward and backward communication is handled by the built-in graph parallel algorithm with torch distributed. Because multi-GPU inference requires special setup of communication protocols within a node and across nodes, we use a client-server architecture for maximum flexibility and scaling to large-scale parallelism. We use a lightweight websocket [client](https://github.com/facebookresearch/fairchem/blob/main/src/fairchem/core/units/mlip_unit/inference/client_websocket.py#L33) and a websocket [server](https://github.com/facebookresearch/fairchem/blob/main/src/fairchem/core/units/mlip_unit/inference/inference_server_ray.py) that then uses [ray](https://www.ray.io/) to launch Ray Actors for each GPU rank under the hood. This allows us to seamlessly scale to any infrastructure that can run Ray.
+UMA supports Graph Parallel inference natively. The graph is chunked across ranks and both the forward and backward communication is handled by the built-in graph parallel algorithm with torch distributed. Because multi-GPU inference requires special setup of communication protocols within a node and across nodes, we leverage [ray](https://www.ray.io/) to launch Ray Actors for each GPU rank under the hood. This allows us to seamlessly scale to any infrastructure that can run Ray.
 
-To make things simple for users who want to run multi-GPU inference locally, we provide a drop-in replacement for MLIPPredictUnit, called [ParallelMLIPPredictUnit](https://github.com/facebookresearch/fairchem/blob/cb1b95fffe8a5bc0276203c13ecd222244b8e7b6/src/fairchem/core/units/mlip_unit/predict.py#L311).
+To make things simple for users who want to run multi-GPU inference locally, we provide a drop-in replacement for MLIPPredictUnit, called [ParallelMLIPPredictUnitRay](https://github.com/facebookresearch/fairchem/blob/cb1b95fffe8a5bc0276203c13ecd222244b8e7b6/src/fairchem/core/units/mlip_unit/predict.py).
 
 For example, we can create a predictor with 8 GPU workers in a very similar way to MLIPPredictUnit:
 
-```{code-cell} python3
+```
 from fairchem.core.calculate.pretrained_mlip import pretrained_checkpoint_path_from_name
 from fairchem.core.units.mlip_unit.api.inference import InferenceSettings
-from fairchem.core.units.mlip_unit.predict import ParallelMLIPPredictUnit
+from fairchem.core.units.mlip_unit.predict import ParallelMLIPPredictUnitRay
 
 inference_settings = InferenceSettings(
     tf32=True,
@@ -124,12 +124,12 @@ inference_settings = InferenceSettings(
     external_graph_gen=False,
 )
 
-predictor = ParallelMLIPPredictUnit(
+predictor = ParallelMLIPPredictUnitRay(
     inference_model_path=pretrained_checkpoint_path_from_name("uma-s-1p1"),
     device="cuda",
     inference_settings=inference_settings,
-    server_config={"workers": 8},
+    num_workers=8,
 )
 ```
 
-This will automatically create a Ray server on your local machine and use a local client to connect to it. You can also easily manually create a [server](https://github.com/facebookresearch/fairchem/blob/main/src/fairchem/core/units/mlip_unit/inference/inference_server_ray.py) running elsewhere (for example on a very large GPU cluster) and then use a separate client to connect to it.
+This will automatically create a Ray server on your local machine and use a local client to connect to it. If you have set up a Ray cluster, you can leverage it to run parallel inference on as many nodes as you like. We are actively working on optimizations to scale inference to large systems.
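Editor's note: the sketch below (not part of this commit) shows one way the new API might be used end to end: attaching to an already-running Ray cluster before constructing ParallelMLIPPredictUnitRay, then wrapping the predict unit in FAIRChemCalculator as the rest of ase_calculator.md does. The ray.init(address="auto") call is the standard Ray way to join an existing cluster; whether ParallelMLIPPredictUnitRay reuses that runtime (rather than starting its own local one), and the FAIRChemCalculator/task_name usage, are assumptions based on the surrounding docs, so check the current documentation for exact behavior.

```python
# Hypothetical sketch: multinode inference with ParallelMLIPPredictUnitRay.
# Assumptions (not from this commit): an existing Ray cluster is reused after
# ray.init(address="auto"); FAIRChemCalculator(predict_unit, task_name=...) follows
# the existing ase_calculator.md examples; unspecified InferenceSettings fields
# keep their defaults.
import ray
from ase.build import bulk

from fairchem.core import FAIRChemCalculator
from fairchem.core.calculate.pretrained_mlip import pretrained_checkpoint_path_from_name
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings
from fairchem.core.units.mlip_unit.predict import ParallelMLIPPredictUnitRay

# Attach to a cluster started elsewhere (e.g. `ray start --head` on the head node).
# Omitting this call would fall back to a local Ray instance on this machine.
ray.init(address="auto")

predictor = ParallelMLIPPredictUnitRay(
    inference_model_path=pretrained_checkpoint_path_from_name("uma-s-1p1"),
    device="cuda",
    inference_settings=InferenceSettings(tf32=True, external_graph_gen=False),
    num_workers=8,  # one Ray actor per GPU rank
)

# Use the parallel predict unit through the ASE calculator, as with MLIPPredictUnit.
atoms = bulk("Cu", "fcc", a=3.6) * (4, 4, 4)
atoms.calc = FAIRChemCalculator(predictor, task_name="omat")
print(atoms.get_potential_energy())
```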

src/fairchem/core/units/mlip_unit/inference/client_server_example.py

Lines changed: 0 additions & 56 deletions
This file was deleted.

src/fairchem/core/units/mlip_unit/inference/client_websocket.py

Lines changed: 0 additions & 60 deletions
This file was deleted.

src/fairchem/core/units/mlip_unit/inference/inference_server_ray.py

Lines changed: 0 additions & 176 deletions
This file was deleted.

src/fairchem/core/units/mlip_unit/inference/server_config.yaml

Lines changed: 0 additions & 18 deletions
This file was deleted.

0 commit comments

Comments
 (0)