feat: Add rerank models and rerank API change #3831
Conversation
Review comment on docs/static/llama-stack-spec.yaml (outdated):

> - Embedding models: these models generate embeddings to be used for semantic
>   search.
> - Rerank models (Experimental): these models reorder the documents based on
do we need this?
Suggested change:

```diff
- - Rerank models (Experimental): these models reorder the documents based on
+ - Rerank models: these models reorder the documents based on
```
I initially added it since I noticed the rerank API doc is under https://github.com/llamastack/llama-stack/blob/main/docs/static/experimental-llama-stack-spec.html#L1417, but I can definitely remove it if it's not needed.
The changes to apis/inference/inference.py and apis/model/model.py are great, and the addition of rerank to the inference router is consistent and useful.

The changes to openai_mixin are awkward, though, because OpenAI does not provide /rerank and NVIDIA is our only rerank implementation. I understand the motivation to use as much standard code as possible in the NVIDIA adapter.

Instead of introducing the concept of rerank into the OpenAI mixin, what about giving the sub-class an opportunity to construct the model? Lift the model construction code out of list_models so sub-classes can override it, e.g.:
```python
def construct_model_from_identifier(self, identifier: str) -> Model:
    # embedding models are recognized by their extra metadata (dimensions, etc.)
    if metadata := self.embedding_model_metadata.get(identifier):
        return Model(..., model_type=ModelType.embedding, metadata=metadata)
    # everything else defaults to an LLM
    return Model(..., model_type=ModelType.llm)
```
Then the NVIDIA provider can implement construct_model_from_identifier with something like:
```python
def construct_model_from_identifier(self, identifier: str) -> Model:
    # identifiers on the provider's rerank list become rerank models
    if identifier in self.rerank_model_list:
        return Model(..., model_type=ModelType.rerank)
    # fall back to the base-class embedding/LLM logic
    return super().construct_model_from_identifier(identifier)
```
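For context, a minimal sketch of how the mixin's list_models could delegate to this hook. The loop structure and the client attribute are assumed from the discussion, not taken from the PR:

```python
async def list_models(self) -> list[Model]:
    models = []
    # classify each provider-reported model via the overridable hook above
    async for m in self.client.models.list():
        models.append(self.construct_model_from_identifier(m.id))
    return models
```

This is the template-method pattern: the base class owns the listing loop, sub-classes override only the classification step, and the rerank concept stays out of the OpenAI-flavored mixin.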
Hi @mattf, thank you for your suggestion; this sounds great. I will also update the NVIDIA implementation accordingly in #3329.
What does this PR do?
- Adds a `rerank()` method to the inference router.
- Adds `construct_model_from_identifier` to `OpenAIMixin` to allow customization of model typing/metadata.

Test Plan
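For illustration (not the PR's recorded test plan), a hypothetical client-side call against the rerank API this PR adds. The parameter names, the example model id, and the response shape are assumptions; check docs/static/experimental-llama-stack-spec.html for the authoritative shape:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# hypothetical call shape -- verify against the experimental spec
response = client.inference.rerank(
    model="nvidia/llama-3.2-nv-rerankqa-1b-v2",  # example NVIDIA rerank model id
    query="What is the capital of France?",
    items=[
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
    max_num_results=2,
)

# each result is assumed to carry the original item index and a relevance score
for result in response.data:
    print(result.index, result.relevance_score)
```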