feat: Add rerank API for NVIDIA Inference Provider #3329
Conversation
Hi @ashwinb @franciscojavierarceo @mattf @ehhuang, the rerank integration tests for this PR need the client-side rerank support (llamastack/llama-stack-client-python#271) to run successfully. Could you please also review the client-side changes? Alternatively, I can delete the rerank integration tests for now and add them in a separate PR after the client-side PR is merged and released. Please let me know which approach you prefer. Thank you!
Hi @franciscojavierarceo @ehhuang, could you please also review llamastack/llama-stack-client-python#271? The rerank integration tests for this PR need the client-side rerank support to run successfully. Thank you!
@jiayin-nvidia pre-commit failed, can you fix and update?
Hi @franciscojavierarceo @ehhuang, checking if this PR is ready to merge. Please feel free to let me know if anything else is needed. Thanks!
@jiayin-nvidia can you resolve the conflicts please?
@franciscojavierarceo All conflicts are resolved and the CI checks are passing. Please let me know if anything else is needed or if it's ready to be merged. Thank you!
Hi @franciscojavierarceo @ehhuang, I updated the tests and documentation based on recent changes in the codebase. I also marked the rerank models as experimental. Please review the changes and let me know if anything else is needed. Thanks!
Added some comments.
Could you also please split out the API change to another PR?
    "nvidia/nv-rerankqa-mistral-4b-v3",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2",
why was nvidia/ prefix added for the latter two but not the first?
The model name does not have the nvidia/ prefix: https://build.nvidia.com/nvidia/rerank-qa-mistral-4b?snippet_tab=Python
    rerank_model_list: list[str] = [
        "nv-rerank-qa-mistral-4b:1",
        "nvidia/nv-rerankqa-mistral-4b-v3",
        "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    ]

    _rerank_model_endpoints = {
        "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
        "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
        "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
    }
All of these should probably go into NVIDIAConfig with these values as defaults, so that new models can be supported without requiring a code change and a new release. Then we should just need a rerank_model_to_url field from which we can derive the list of models.
Good point! I've added rerank_model_to_url to NVIDIAConfig. I kept rerank_model_list in OpenAIMixin so that any provider using the OpenAIMixin can specify which models should be registered as rerank models. Please feel free to let me know if you have additional suggestions.
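The config-driven approach discussed above could look roughly like the following. This is a hypothetical sketch, not the PR's actual code: the class name `NVIDIAConfigSketch` and the derived `rerank_model_list` property are illustrative assumptions; only the `rerank_model_to_url` field name and the default endpoint URLs come from the discussion above.

```python
# Hypothetical sketch of a config-driven rerank model mapping.
# Field and class names are assumptions for illustration only.
from dataclasses import dataclass, field

DEFAULT_RERANK_MODEL_TO_URL = {
    "nv-rerank-qa-mistral-4b:1": "https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking",
    "nvidia/nv-rerankqa-mistral-4b-v3": "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2": "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
}

@dataclass
class NVIDIAConfigSketch:
    # New rerank models can be added via config, with no code change.
    rerank_model_to_url: dict[str, str] = field(
        default_factory=lambda: dict(DEFAULT_RERANK_MODEL_TO_URL)
    )

    @property
    def rerank_model_list(self) -> list[str]:
        # The list of rerank models is derived from the mapping's keys.
        return list(self.rerank_model_to_url)
```

With this shape, overriding the mapping in a run config (e.g. `NVIDIAConfigSketch(rerank_model_to_url={"my-model": "https://example.invalid/rerank"})`) automatically updates the model list as well.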
        metadata=metadata,
    )
elif provider_model_id in self.rerank_model_list:
    # This is a rerank model
remove
comments above
Hi @franciscojavierarceo @ehhuang @ashwinb, I would like to follow up on this PR to see if anything else is needed or if it is ready to be merged. Thank you!
What does this PR do?
Add rerank API for NVIDIA Inference Provider.
Closes #3278
Test Plan
Unit test:
Integration test:
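For reference, NVIDIA's hosted reranking endpoints accept a query and a list of passages and return relevance-ranked results. The sketch below builds such a request body; the payload shape follows NVIDIA's public rerank examples on build.nvidia.com and the helper name `build_rerank_payload` is an assumption for illustration, not code from this PR.

```python
# Hedged sketch: construct a request body for an NVIDIA reranking endpoint.
# Payload field names follow NVIDIA's public rerank examples; treat them
# as assumptions rather than this PR's implementation.
def build_rerank_payload(model: str, query: str, passages: list[str]) -> dict:
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    }

payload = build_rerank_payload(
    "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    "Which provider adds the rerank API?",
    ["The NVIDIA inference provider adds rerank support.", "Unrelated text."],
)
```

The resulting dict would be POSTed as JSON to the model's endpoint (for example, the URL mapped from the model ID in the provider's rerank endpoint mapping), with the API key supplied in the request headers.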