feat: Add rerank API for NVIDIA Inference Provider #3329

jiayin-nvidia · 2025-09-04T05:10:26Z

What does this PR do?

Add rerank API for NVIDIA Inference Provider.

Test Plan

Unit test:

pytest tests/unit/providers/nvidia/test_rerank_inference.py

Integration test:

pytest -s -v tests/integration/inference/test_rerank.py \
  --stack-config="inference=nvidia" \
  --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 \
  --env NVIDIA_API_KEY="" \
  --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"

jiayin-nvidia · 2025-09-09T20:43:50Z

Hi @ashwinb @franciscojavierarceo @mattf @ehhuang, the rerank integration tests for this PR need the client-side rerank support (llamastack/llama-stack-client-python#271) to run successfully. Could you please also review the client-side changes? Alternatively, I can delete the rerank integration tests for now and add them in a separate PR after the client-side PR is merged and released. Please let me know which approach you prefer. Thank you!

docs/source/providers/agents/index.md

docs/source/providers/batches/index.md

docs/source/providers/inference/index.md

llama_stack/apis/inference/inference.py

llama_stack/providers/remote/inference/nvidia/nvidia.py

tests/integration/fixtures/common.py

tests/unit/providers/nvidia/test_rerank_inference.py

tests/integration/inference/test_rerank.py

jiayin-nvidia · 2025-09-17T18:50:10Z

Hi @franciscojavierarceo @ehhuang, could you please also help review llamastack/llama-stack-client-python#271? The rerank integration tests for this PR need the client-side rerank support to run successfully. Thank you!

franciscojavierarceo · 2025-09-17T18:52:35Z

@jiayin-nvidia pre-commit failed, can you fix and update?

jiayin-nvidia · 2025-10-01T17:45:47Z

Hi @franciscojavierarceo @ehhuang, checking if this PR is ready to merge. Please feel free to let me know if anything else is needed. Thanks!

franciscojavierarceo · 2025-10-03T15:00:28Z

@jiayin-nvidia can you resolve the conflicts please?

jiayin-nvidia · 2025-10-03T18:10:26Z

> @jiayin-nvidia can you resolve the conflicts please?

@franciscojavierarceo All conflicts are resolved and CIs are passing. Please let me know if anything else is needed or if it's ready to be merged. Thank you!

jiayin-nvidia · 2025-10-13T02:04:29Z

Hi @franciscojavierarceo @ehhuang, I updated the tests and documentation based on recent changes in the codebase. I also marked the rerank models as experimental. Please review the changes and let me know if anything else is needed. Thanks!

ehhuang

Added some comments.

Could you also please split out the API change to another PR?

ehhuang · 2025-10-14T18:38:21Z

llama_stack/providers/remote/inference/nvidia/nvidia.py

+        "nvidia/nv-rerankqa-mistral-4b-v3",
+        "nvidia/llama-3.2-nv-rerankqa-1b-v2",


why was nvidia/ prefix added for the latter two but not the first?

The model name does not have the nvidia/ prefix: https://build.nvidia.com/nvidia/rerank-qa-mistral-4b?snippet_tab=Python

ehhuang · 2025-10-14T18:40:13Z

llama_stack/providers/remote/inference/nvidia/nvidia.py

+    rerank_model_list: list[str] = [
+        "nv-rerank-qa-mistral-4b:1",
+        "nvidia/nv-rerankqa-mistral-4b-v3",
+        "nvidia/llama-3.2-nv-rerankqa-1b-v2",
+    ]
+
+    _rerank_model_endpoints = {
+        "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
+        "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
+        "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
+    }
+


All of these should probably go into NVIDIAConfig, with these values as defaults, so that new models can be supported without a code change and a new release. We should then only need a rerank_model_to_url field, from which we can derive the list of models.

Good point! I've added rerank_model_to_url to NVIDIAConfig. I kept rerank_model_list in OpenAIMixin so that any provider using the OpenAIMixin can specify which models should be registered as rerank models. Please feel free to let me know if you have additional suggestions.
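For readers following the thread, the reviewer's suggestion can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the class name `RerankConfigSketch` and the helper `endpoint_for` are hypothetical, and a plain dataclass stands in for the real NVIDIAConfig. Only the field name `rerank_model_to_url` and the default model/URL pairs come from the diff above.

```python
from dataclasses import dataclass, field

# Default model -> endpoint pairs copied from the diff under review.
DEFAULT_RERANK_MODEL_TO_URL = {
    "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
    "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
}


@dataclass
class RerankConfigSketch:
    """Hypothetical stand-in for the rerank part of NVIDIAConfig."""

    # Users can override this mapping to add models without a new release.
    rerank_model_to_url: dict[str, str] = field(
        default_factory=lambda: dict(DEFAULT_RERANK_MODEL_TO_URL)
    )

    @property
    def rerank_model_list(self) -> list[str]:
        # The model list is derived from the mapping, so the two cannot drift apart.
        return list(self.rerank_model_to_url)

    def endpoint_for(self, model_id: str) -> str:
        # Hypothetical helper: look up the endpoint for a configured rerank model.
        try:
            return self.rerank_model_to_url[model_id]
        except KeyError:
            raise ValueError(f"{model_id!r} is not a configured rerank model") from None
```

With this shape, supporting a new model is a configuration change, e.g. `RerankConfigSketch(rerank_model_to_url={**DEFAULT_RERANK_MODEL_TO_URL, "my/model": "https://example.invalid/rerank"})`, where the extra entry is a placeholder.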

ehhuang · 2025-10-14T18:43:03Z

llama_stack/providers/utils/inference/openai_mixin.py

                    metadata=metadata,
                )
+            elif provider_model_id in self.rerank_model_list:
+                # This is a rerank model
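
The branch in this hunk decides how a discovered model is registered. A rough, self-contained sketch of that decision follows; it assumes the preceding branch handles embedding models via a metadata lookup, which the hunk does not show. The names `ModelKind` and `classify_model` are hypothetical and are not part of OpenAIMixin.

```python
from enum import Enum


class ModelKind(Enum):
    """Hypothetical model categories used only for this illustration."""

    LLM = "llm"
    EMBEDDING = "embedding"
    RERANK = "rerank"


def classify_model(
    provider_model_id: str,
    embedding_model_metadata: dict[str, dict],
    rerank_model_list: list[str],
) -> ModelKind:
    # Assumed order: embedding models are checked first, then rerank models.
    if provider_model_id in embedding_model_metadata:
        return ModelKind.EMBEDDING
    elif provider_model_id in rerank_model_list:
        # This is a rerank model
        return ModelKind.RERANK
    # Anything not listed in either collection is treated as a regular LLM.
    return ModelKind.LLM
```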


ehhuang

comments above

jiayin-nvidia · 2025-10-17T00:35:51Z

> comments above

@ehhuang Thank you for your review. I moved the API change to another PR: #3831. Please review it and I will rebase this PR on top of it once it's merged.

jiayin-nvidia · 2025-10-23T03:45:02Z

Hi @ehhuang, #3831 has been merged into main and I rebased this PR on top of it. Please let me know if there is anything else needed. Thank you!

jiayin-nvidia · 2025-10-27T21:53:11Z

Hi @franciscojavierarceo @ehhuang @ashwinb, I would like to follow up on this PR to see if anything else is needed or if it is ready to be merged. Thank you!

jiayin-nvidia requested review from ashwinb, bbrowning, ehhuang, hardikjshah, leseb, mattf, raghotham, reluctantfuturist, slekkala1, terrytangyuan and yanxi0830 as code owners September 4, 2025 05:10

meta-cla bot added the CLA Signed label Sep 4, 2025

jiayin-nvidia marked this pull request as draft September 4, 2025 05:10

jiayin-nvidia force-pushed the support_rerank branch from 02c50a0 to da64ebc September 5, 2025 01:18

jiayin-nvidia marked this pull request as ready for review September 5, 2025 23:54