Conversation


@jiayin-nvidia jiayin-nvidia commented Oct 17, 2025

What does this PR do?

  • Extend the model type to include rerank models.
  • Implement rerank() method in inference router.
  • Add construct_model_from_identifier to OpenAIMixin to allow customization of model typing/metadata.
  • Update documentation.
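The model-type extension from the first bullet can be sketched as a plain enum addition. The names below are illustrative stand-ins — the real ModelType lives in apis/model/model.py and may differ in spelling and members:

```python
from enum import Enum


class ModelType(str, Enum):
    """Hypothetical sketch of a model-type enum extended with rerank."""
    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"  # new: rerank models reorder documents by relevance


# Iteration follows definition order, so the new member appears last.
print([m.value for m in ModelType])  # ['llm', 'embedding', 'rerank']
```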

Test Plan

pytest tests/unit/providers/utils/inference/test_openai_mixin.py

- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models (Experimental): these models reorder the documents based on

Contributor

do we need this?

Suggested change:
- Rerank models (Experimental): these models reorder the documents based on
+ Rerank models: these models reorder the documents based on

Contributor Author

@jiayin-nvidia jiayin-nvidia Oct 17, 2025


I initially added it since I noticed the rerank API doc is under https://github.com/llamastack/llama-stack/blob/main/docs/static/experimental-llama-stack-spec.html#L1417, but I can definitely remove it if it's not needed.

Collaborator

@mattf mattf left a comment


the changes to apis/inference/inference.py and apis/model/model.py are great

the addition of rerank to the inference router is consistent and useful

the changes to openai_mixin are awkward because the OpenAI API does not provide /rerank and nvidia is our only rerank implementation. i understand the motivation to use as much standard code as possible in the nvidia adapter.

instead of introducing the concept of rerank into the OpenAI mixin, what about giving the sub-class an opportunity to construct the model? lift the model construction code out of list_models so sub-classes can override it, e.g.

def construct_model_from_identifier(self, identifier: str) -> Model:
   if metadata := self.embedding_model_metadata...
   else:
      model = Model(..., model_type=ModelType.llm)

then the nvidia provider can implement construct_model_from_identifier with something like -

def construct_model_from_identifier(self, identifier: str) -> Model:
   if identifier in self.rerank_model_list:
      return Model(..., model_type=ModelType.rerank)
   return super().construct_model_from_identifier(identifier)
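The override pattern sketched above can be made concrete with a self-contained example. All class and model names here are illustrative stand-ins, not the real llama-stack classes:

```python
from dataclasses import dataclass, field
from enum import Enum


class ModelType(str, Enum):
    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"


@dataclass
class Model:
    identifier: str
    model_type: ModelType
    metadata: dict = field(default_factory=dict)


class OpenAIMixin:
    """Base mixin: knows only about LLM and embedding models."""

    embedding_model_metadata: dict = {}

    def construct_model_from_identifier(self, identifier: str) -> Model:
        # Embedding models are recognized by their metadata; everything
        # else defaults to an LLM.
        if metadata := self.embedding_model_metadata.get(identifier):
            return Model(identifier, ModelType.embedding, metadata)
        return Model(identifier, ModelType.llm)


class NvidiaAdapter(OpenAIMixin):
    """Subclass adds rerank awareness without touching the base class."""

    rerank_model_list = ["nv-rerank-qa"]  # hypothetical model id

    def construct_model_from_identifier(self, identifier: str) -> Model:
        if identifier in self.rerank_model_list:
            return Model(identifier, ModelType.rerank)
        return super().construct_model_from_identifier(identifier)


adapter = NvidiaAdapter()
print(adapter.construct_model_from_identifier("nv-rerank-qa").model_type.value)  # rerank
print(adapter.construct_model_from_identifier("some-llm").model_type.value)      # llm
```

The base class never learns about rerank; the subclass intercepts only the identifiers it recognizes and delegates the rest upward via `super()`.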

@jiayin-nvidia
Contributor Author


Hi @mattf, thank you for your suggestion — this sounds great. I will also update the NVIDIA implementation accordingly in #3329.

@ashwinb ashwinb merged commit bb1ebb3 into llamastack:main Oct 22, 2025
24 checks passed
