* enable multinode ray
* update
* pass by ref
* update placement groups
* move to cpu, have only rank0 return and ray.get on all workers
* remove old ParallelPredictUnit
* update
* reduce test size
* update
* remove code cell
UMA supports Graph Parallel inference natively. The graph is chunked across ranks, and both the forward and backward communication is handled by the built-in graph parallel algorithm with torch distributed. Because multi-GPU inference requires special setup of communication protocols within a node and across nodes, we leverage [Ray](https://www.ray.io/) to launch Ray Actors for each GPU rank under the hood. This allows us to seamlessly scale to any infrastructure that can run Ray.

To make things simple for users who want to run multi-GPU inference locally, we provide a drop-in replacement for MLIPPredictUnit called [ParallelMLIPPredictUnitRay](https://github.com/facebookresearch/fairchem/blob/cb1b95fffe8a5bc0276203c13ecd222244b8e7b6/src/fairchem/core/units/mlip_unit/predict.py).

For example, we can create a predictor with 8 GPU workers in a very similar way to MLIPPredictUnit:

```python
from fairchem.core.calculate.pretrained_mlip import pretrained_checkpoint_path_from_name
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings
from fairchem.core.units.mlip_unit.predict import ParallelMLIPPredictUnitRay
```
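
Only the imports are shown above. As a rough sketch of the construction itself (the keyword arguments `inference_model_path`, `inference_settings`, and `num_workers`, and the checkpoint name `uma-s-1`, are assumptions for illustration rather than confirmed API), the 8-worker predictor might be built like this:

```python
# Hedged sketch: the keyword names and the checkpoint name below are
# assumptions, not confirmed against the fairchem source.
checkpoint = pretrained_checkpoint_path_from_name("uma-s-1")  # example model name

predict_unit = ParallelMLIPPredictUnitRay(
    inference_model_path=checkpoint,
    device="cuda",
    inference_settings=InferenceSettings(tf32=True),  # illustrative settings only
    num_workers=8,  # one Ray Actor per GPU rank
)
```
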
This will automatically create a Ray server on your local machine and use a local client to connect to it. If you have set up a Ray cluster, you can leverage it to run parallel inference on as many nodes as you like. We are actively working on optimizations to scale inference to large systems.
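
For the multinode case, a minimal sketch would be to attach to the cluster before constructing the predict unit (assuming `ParallelMLIPPredictUnitRay` reuses an already-initialized Ray session rather than starting a local one; that behavior is an assumption, not confirmed here):

```python
import ray

# Attach to an existing Ray cluster. "auto" connects when this machine is
# already part of the cluster (e.g. started with `ray start` or launched via
# `ray job submit`); a ray://<head-node>:10001 address would use Ray Client
# to connect from outside the cluster instead.
ray.init(address="auto")

# Then construct ParallelMLIPPredictUnitRay exactly as in the sketch above;
# the Ray Actors for each GPU rank are scheduled across the cluster's nodes.
```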