
Conversation

@thakurvivek

Is your pull request related to a problem? Please describe.
No ability to choose GPU while loading embedding models

Why should this feature be added?
In multi-GPU environments, TabbyAPI currently lacks the ability to specify which GPU device should be used for embedding models. This creates significant resource management challenges:

  • Current limitation: Embedding models automatically load on the default GPU (typically GPU 0)
  • Resource conflict: Embedding workloads cannot be distributed across the available GPUs
  • Operational inefficiency: Users are forced to manage GPU allocation manually via environment variables

Examples
In a multi-GPU cluster setup:

  • RTX 3090: Running large language models (high memory usage)
  • RTX 5090: Dedicated to embedding models (currently underutilized)
  • Current TabbyAPI instance: Cannot target the RTX 5090 for embeddings without restarting

Additional context
Added an embeddings_device_id configuration parameter to enable selection of a specific GPU for embedding models.
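A minimal sketch of how the proposed parameter might be set in TabbyAPI's config.yml. Note that the embeddings_device_id key (and the exact surrounding keys shown) reflect this PR's proposal and my assumptions, not documented existing options:

```yaml
# Hypothetical config.yml excerpt.
# embeddings_device_id is the parameter proposed in this PR; the other
# keys are illustrative placeholders for an embeddings section.
embeddings:
  embedding_model_dir: models
  embedding_model_name: my-embedding-model
  # Index of the CUDA device to load embedding models on,
  # e.g. 1 targets the second GPU instead of the default GPU 0.
  embeddings_device_id: 1
```

With a setup like this, the LLM could stay pinned to one card while embeddings load on another, without restarting the server or juggling CUDA_VISIBLE_DEVICES.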
