
Conversation

@thakurvivek

Is your pull request related to a problem? Please describe.
No ability to choose GPU while loading embedding models

Why should this feature be added?
In multi-GPU environments, TabbyAPI currently lacks the ability to specify which GPU device should be used for embedding models. This creates significant resource management challenges:

  • Current limitation: Embedding models automatically load on the default GPU (typically GPU 0)
  • Resource conflict: Embedding workloads cannot be distributed across the available GPUs
  • Operational inefficiency: Users are forced to manage GPU allocation manually via environment variables

Examples
In a multi-GPU cluster setup:

  • RTX 3090: Running large language models (high memory usage)
  • RTX 5090: Dedicated to embedding models (currently underutilized)
  • Current TabbyAPI instance: Cannot target the RTX 5090 for embeddings without restarting

Additional context
Added an embeddings_device_id configuration parameter to enable selection of a specific GPU for embedding models.
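A minimal sketch of how the proposed parameter might be set in TabbyAPI's config.yml. Note that the embeddings_device_id key (and the exact surrounding keys shown) reflect this PR's proposal and my assumptions, not documented existing options:

```yaml
# Hypothetical config.yml excerpt.
# embeddings_device_id is the parameter proposed in this PR; the other
# keys are illustrative placeholders for an embeddings section.
embeddings:
  embedding_model_dir: models
  embedding_model_name: my-embedding-model
  # Index of the CUDA device to load embedding models on,
  # e.g. 1 targets the second GPU instead of the default GPU 0.
  embeddings_device_id: 1
```

With a setup like this, the LLM could stay pinned to one card while embeddings load on another, without restarting the server or juggling CUDA_VISIBLE_DEVICES.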
