Conversation

@jiayin-nvidia
Contributor

@jiayin-nvidia jiayin-nvidia commented Sep 4, 2025

What does this PR do?

Add rerank API for NVIDIA Inference Provider.

Closes #3278
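
For context, a rerank call scores a list of passages against a query and returns them ordered by relevance. The sketch below is illustrative only and is not the code in this PR: the helper names are hypothetical, and the request and response shapes (query.text, passages[].text, rankings[].index, rankings[].logit) are assumptions based on NVIDIA's hosted reranking examples.

```python
from typing import Any, Optional


def build_rerank_payload(model: str, query: str, passages: list[str]) -> dict[str, Any]:
    """Build a request body in the (assumed) NVIDIA reranking format."""
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": passage} for passage in passages],
    }


def order_passages(
    passages: list[str], response: dict[str, Any], top_n: Optional[int] = None
) -> list[tuple[str, float]]:
    """Return (passage, score) pairs, most relevant first."""
    # Assumed response shape: {"rankings": [{"index": int, "logit": float}, ...]}
    rankings = sorted(response["rankings"], key=lambda r: r["logit"], reverse=True)
    if top_n is not None:
        rankings = rankings[:top_n]
    return [(passages[r["index"]], r["logit"]) for r in rankings]


passages = ["GPUs accelerate training.", "Bananas are yellow.", "CUDA runs on GPUs."]
payload = build_rerank_payload(
    "nvidia/nv-rerankqa-mistral-4b-v3", "What runs on a GPU?", passages
)

# Hand-written stand-in for a service response; no network call is made here.
fake_response = {
    "rankings": [
        {"index": 1, "logit": -8.0},
        {"index": 2, "logit": 4.5},
        {"index": 0, "logit": 1.25},
    ]
}
top_two = order_passages(passages, fake_response, top_n=2)
```

Sorting on the client side keeps the sketch independent of whether the service already returns rankings in descending order.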

Test Plan

Unit test:

pytest tests/unit/providers/nvidia/test_rerank_inference.py

Integration test:

pytest -s -v tests/integration/inference/test_rerank.py \
  --stack-config="inference=nvidia" \
  --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 \
  --env NVIDIA_API_KEY="" \
  --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"

@meta-cla meta-cla bot added the CLA Signed label Sep 4, 2025
@jiayin-nvidia jiayin-nvidia marked this pull request as draft September 4, 2025 05:10
@jiayin-nvidia jiayin-nvidia marked this pull request as ready for review September 5, 2025 23:54
@jiayin-nvidia
Contributor Author

Hi @ashwinb @franciscojavierarceo @mattf @ehhuang, the rerank integration tests for this PR need the client-side rerank support (llamastack/llama-stack-client-python#271) to run successfully. Could you please also review the client-side changes? Alternatively, I can delete the rerank integration tests for now and add them in a separate PR after the client-side PR is merged and released. Please let me know which approach you prefer. Thank you!

@jiayin-nvidia
Contributor Author

Hi @franciscojavierarceo @ehhuang, could you please also help review llamastack/llama-stack-client-python#271? The rerank integration tests for this PR need the client-side rerank support to run successfully. Thank you!

@franciscojavierarceo
Collaborator

@jiayin-nvidia pre-commit failed, can you fix and update?

@jiayin-nvidia
Contributor Author

Hi @franciscojavierarceo @ehhuang, checking if this PR is ready to merge. Please feel free to let me know if anything else is needed. Thanks!

@jiayin-nvidia jiayin-nvidia force-pushed the support_rerank branch 2 times, most recently from 3ce9ec9 to ac4ed26 Compare October 2, 2025 20:22
@franciscojavierarceo
Collaborator

@jiayin-nvidia can you resolve the conflicts please?

@jiayin-nvidia jiayin-nvidia force-pushed the support_rerank branch 3 times, most recently from a818012 to 6b49408 Compare October 3, 2025 17:42
@jiayin-nvidia
Contributor Author

> @jiayin-nvidia can you resolve the conflicts please?

@franciscojavierarceo All conflicts are resolved and CIs are passing. Please let me know if anything else is needed or if it's ready to be merged. Thank you!

@jiayin-nvidia jiayin-nvidia force-pushed the support_rerank branch 6 times, most recently from b34f870 to cdd8576 Compare October 13, 2025 01:59
@jiayin-nvidia
Contributor Author

jiayin-nvidia commented Oct 13, 2025

Hi @franciscojavierarceo @ehhuang, I updated the tests and documentation based on recent changes in the codebase. I also marked the rerank models as experimental. Please review the changes and let me know if anything else is needed. Thanks!

Contributor

@ehhuang ehhuang left a comment

Added some comments.

Could you also please split out the API change to another PR?

Comment on lines 58 to 55
"nvidia/nv-rerankqa-mistral-4b-v3",
"nvidia/llama-3.2-nv-rerankqa-1b-v2",
Contributor

why was nvidia/ prefix added for the latter two but not the first?

Contributor Author

The first model's name (nv-rerank-qa-mistral-4b:1) does not have the nvidia/ prefix: https://build.nvidia.com/nvidia/rerank-qa-mistral-4b?snippet_tab=Python

Comment on lines 56 to 63
rerank_model_list: list[str] = [
    "nv-rerank-qa-mistral-4b:1",
    "nvidia/nv-rerankqa-mistral-4b-v3",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2",
]

_rerank_model_endpoints = {
    "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
    "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
}

Contributor

All of these should probably go into NVIDIAConfig, with these values as defaults, so that new models can be supported without requiring a code change and a new release. We should then need only a rerank_model_to_url field, from which we can derive the list of models.

Contributor Author

Good point! I've added rerank_model_to_url to NVIDIAConfig. I kept rerank_model_list in OpenAIMixin so that any provider using the OpenAIMixin can specify which models should be registered as rerank models. Please feel free to let me know if you have additional suggestions.
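
For readers following the thread, the suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: RerankConfigSketch and rerank_url_for are hypothetical names, a plain dataclass stands in for the real NVIDIAConfig, and only the rerank_model_to_url field name and the default URLs come from this discussion.

```python
from dataclasses import dataclass, field

# Defaults copied from the mapping quoted earlier in this review thread.
DEFAULT_RERANK_MODEL_TO_URL: dict[str, str] = {
    "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
    "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
}


@dataclass
class RerankConfigSketch:
    """Hypothetical stand-in for the rerank portion of NVIDIAConfig."""

    # Copy the defaults so that instances never share one mutable dict.
    rerank_model_to_url: dict[str, str] = field(
        default_factory=lambda: dict(DEFAULT_RERANK_MODEL_TO_URL)
    )

    @property
    def rerank_model_list(self) -> list[str]:
        """The model list is derived from the mapping, not stored separately."""
        return list(self.rerank_model_to_url)

    def rerank_url_for(self, model_id: str) -> str:
        """Look up the endpoint for a model, failing clearly if it is unknown."""
        try:
            return self.rerank_model_to_url[model_id]
        except KeyError:
            raise ValueError(f"{model_id!r} is not a configured rerank model") from None


default_config = RerankConfigSketch()

# A deployment could add a model through configuration alone (placeholder URL).
custom_config = RerankConfigSketch(
    rerank_model_to_url={
        **DEFAULT_RERANK_MODEL_TO_URL,
        "my-org/my-reranker": "https://example.invalid/v1/reranking",
    }
)
```

Deriving the list from the mapping means the two cannot drift apart, which is the point of keeping a single configurable field.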

metadata=metadata,
)
elif provider_model_id in self.rerank_model_list:
# This is a rerank model
Contributor

remove

Contributor

@ehhuang ehhuang left a comment

comments above

@jiayin-nvidia
Contributor Author

> comments above

@ehhuang Thank you for your review. I moved the API change to another PR: #3831. Please review it and I will rebase this PR on top of it once it's merged.

@jiayin-nvidia jiayin-nvidia force-pushed the support_rerank branch 2 times, most recently from b66298a to af100db Compare October 17, 2025 00:42
@jiayin-nvidia jiayin-nvidia requested a review from ehhuang October 17, 2025 20:25
@jiayin-nvidia jiayin-nvidia force-pushed the support_rerank branch 2 times, most recently from d04c13d to b5d50ee Compare October 23, 2025 03:43
@jiayin-nvidia
Contributor Author

Hi @ehhuang, #3831 has been merged into main and I rebased this PR on top of it. Please let me know if there is anything else needed. Thank you!

@jiayin-nvidia
Contributor Author

jiayin-nvidia commented Oct 27, 2025

Hi @franciscojavierarceo @ehhuang @ashwinb, I would like to follow up on this PR to see whether anything else is needed or whether it is ready to be merged. Thank you!

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Development

Successfully merging this pull request may close these issues.

Add support for /rerank API for NVIDIA Inference Provider

3 participants