Conversation


@jiayin-nvidia jiayin-nvidia commented Oct 17, 2025

What does this PR do?

  • Extend the model type to include rerank models.
  • Implement rerank() method in inference router.
  • Add construct_model_from_identifier to OpenAIMixin to allow customization of model typing/metadata.
  • Update documentation.
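The model-type extension from the first bullet can be sketched as a plain enum addition. The names below are illustrative stand-ins — the real ModelType lives in apis/model/model.py and may differ in spelling and members:

```python
from enum import Enum


class ModelType(str, Enum):
    """Hypothetical sketch of a model-type enum extended with rerank."""
    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"  # new: rerank models reorder documents by relevance


# Iteration follows definition order, so the new member appears last.
print([m.value for m in ModelType])  # ['llm', 'embedding', 'rerank']
```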

Test Plan

pytest tests/unit/providers/utils/inference/test_openai_mixin.py

- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models (Experimental): these models reorder the documents based on

Contributor

do we need this?

Suggested change:
- Rerank models (Experimental): these models reorder the documents based on
+ Rerank models: these models reorder the documents based on

Contributor Author

@jiayin-nvidia jiayin-nvidia Oct 17, 2025


I initially added it since I noticed the rerank API doc is under https://github.com/llamastack/llama-stack/blob/main/docs/static/experimental-llama-stack-spec.html#L1417, but I can definitely remove it if it's not needed.

Collaborator

@mattf mattf left a comment


the changes to apis/inference/inference.py and apis/model/model.py are great

the addition of rerank to the inference router is consistent and useful

the changes to openai_mixin are awkward because the OpenAI API does not provide /rerank and nvidia is our only rerank implementation. i understand the motivation to use as much standard code as possible in the nvidia adapter.

instead of introducing the concept of rerank into the OpenAI mixin, what about giving the sub-class an opportunity to construct the model? lift the model construction code out of list_models so sub-classes can override it, e.g.

def construct_model_from_identifier(self, identifier: str) -> Model:
   if metadata := self.embedding_model_metadata...
   else:
      model = Model(..., model_type=ModelType.llm)

then the nvidia provider can implement construct_model_from_identifier with something like -

def construct_model_from_identifier(self, identifier: str) -> Model:
   if identifier in self.rerank_model_list:
      return Model(..., model_type=ModelType.rerank)
   return super().construct_model_from_identifier(identifier)
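The override pattern sketched above can be made concrete with a self-contained example. All class and model names here are illustrative stand-ins, not the real llama-stack classes:

```python
from dataclasses import dataclass, field
from enum import Enum


class ModelType(str, Enum):
    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"


@dataclass
class Model:
    identifier: str
    model_type: ModelType
    metadata: dict = field(default_factory=dict)


class OpenAIMixin:
    """Base mixin: knows only about LLM and embedding models."""

    embedding_model_metadata: dict = {}

    def construct_model_from_identifier(self, identifier: str) -> Model:
        # Embedding models are recognized by their metadata; everything
        # else defaults to an LLM.
        if metadata := self.embedding_model_metadata.get(identifier):
            return Model(identifier, ModelType.embedding, metadata)
        return Model(identifier, ModelType.llm)


class NvidiaAdapter(OpenAIMixin):
    """Subclass adds rerank awareness without touching the base class."""

    rerank_model_list = ["nv-rerank-qa"]  # hypothetical model id

    def construct_model_from_identifier(self, identifier: str) -> Model:
        if identifier in self.rerank_model_list:
            return Model(identifier, ModelType.rerank)
        return super().construct_model_from_identifier(identifier)


adapter = NvidiaAdapter()
print(adapter.construct_model_from_identifier("nv-rerank-qa").model_type.value)  # rerank
print(adapter.construct_model_from_identifier("some-llm").model_type.value)      # llm
```

The base class never learns about rerank; the subclass intercepts only the identifiers it recognizes and delegates the rest upward via `super()`.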

@jiayin-nvidia
Contributor Author


Hi @mattf, thank you for your suggestion — this sounds great. I will also update the NVIDIA implementation accordingly in #3329.

@ashwinb ashwinb merged commit bb1ebb3 into llamastack:main Oct 22, 2025
24 checks passed
