[BOUNTY - $300] Support Multi-GPU #223

Open
AlexCheema opened this issue Sep 19, 2024 · 7 comments

@AlexCheema
Contributor

Currently you can only run one exo instance on each device.

There are some design decisions here:

  • Should we support running multiple exo instances on the same device, with one per GPU?
  • Or should we support running one exo instance that uses multiple GPUs?
@AlexCheema changed the title from "Support Multi-GPU" to "[BOUNTY - $300] Support Multi-GPU" on Sep 19, 2024
@Sean-fn
Contributor

Sean-fn commented Oct 15, 2024

I agree with supporting one exo instance that uses multiple GPUs.
This approach would allow finer-grained sharding when only one model is running inference.
What do you think?
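
A rough sketch of what that in-process layer sharding could look like (a hedged PyTorch example; the helper name and partitioning scheme are illustrative assumptions, not exo's actual API):

```python
# Hypothetical sketch: spread a model's layers across all locally visible
# GPUs from a single process. Assumes `layers` is a list of torch.nn.Modules.
import torch

def shard_layers_across_gpus(layers):
    """Assign contiguous runs of layers to each visible GPU."""
    n_gpus = torch.cuda.device_count()
    assert n_gpus > 0, "no CUDA devices visible"
    per_gpu = (len(layers) + n_gpus - 1) // n_gpus  # ceiling division
    placement = {}
    for i, layer in enumerate(layers):
        device = f"cuda:{min(i // per_gpu, n_gpus - 1)}"
        layer.to(device)
        placement[i] = device
    return placement
```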

@cmcmaster1

Chiming in as a new user with a multi-GPU setup. One instance is easiest. Users can simply control GPU selection with the CUDA_VISIBLE_DEVICES environment variable.
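
To illustrate, a minimal sketch of that selection mechanism (assuming the `exo` command is on PATH; the GPU indices are just an example):

```python
# Sketch: CUDA_VISIBLE_DEVICES controls which physical GPUs a process sees,
# so a single exo instance can be restricted to any subset of devices.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1")  # expose GPUs 0 and 1 only
subprocess.run(["exo"], env=env)  # one instance, two visible GPUs
```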

@jorge123255

I was able to do this: I forked the repo on GitHub and added configurations to integration.py and integration_engine.py.

@jorge123255

The only remaining issue is getting the exo console page to show two GPUs instead of one; still testing.

@benjamin-asdf

benjamin-asdf commented Nov 25, 2024

Multiple instances per device, with a GPU assigned to each (approach 1):

pros:

  • leverage all the existing orchestration and model-splitting functionality; ideally you figure out how to parallelize layers (and only once)
  • aesthetics: uses the primitive exo functionality at a different scale
  • this approach seems to scale to different kinds of topologies (even ones not yet known)

cons:

  • node communication overhead?

Single instance per device, with multiple GPUs assigned (approach 2):

aspects:

  • nodes have to broadcast the summed RAM of their multi-GPU setup, and handle multi-GPU internally (see the sketch after this comment)

pros:

  • don't have to deal with overlapping system resources (ports, file locks, etc.)
  • inference engines already support multi-GPU? (but exo does, too - across devices)

cons:

  • two ways of doing multi-GPU
  • composability?

Case 1:
Multi-GPU (approach 2) is very easy to do.
In that case, one might go with approach 2 for now and keep approach 1 in mind for later.

Case 2:
Multi-GPU is not easy to do, and approaches 1 and 2 take roughly the same effort.
In that case, I would go with approach 1.
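
To make the "broadcast summed RAM" aspect of approach 2 concrete, here is a hedged sketch using pynvml (the function name and broadcast payload shape are assumptions for illustration, not exo's actual interface):

```python
# Sketch: a node sums total memory across its local GPUs so it can
# advertise the aggregate to peers. Requires `pip install nvidia-ml-py`.
import pynvml

def aggregate_gpu_memory_bytes() -> int:
    pynvml.nvmlInit()
    try:
        return sum(
            pynvml.nvmlDeviceGetMemoryInfo(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            ).total
            for i in range(pynvml.nvmlDeviceGetCount())
        )
    finally:
        pynvml.nvmlShutdown()

# A node's capability broadcast might then look like (shape is illustrative):
# {"memory_bytes": aggregate_gpu_memory_bytes(), "gpu_count": ...}
```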

@freerainboxbox

I implemented a temporary workaround using approach 2 in #656.

@AlexCheema
Contributor Author

> I implemented a temporary workaround using approach 2 in #656.

I suppose this isn't a full solution for multi-GPU; it's just a wrapper around VISIBLE_DEVICES.
This will be fully supported in the rearchitecture I'm working on.
