
feat: add support for vLLM response format in reranking logic and up… #5954


Merged · 4 commits · Jun 16, 2025
14 changes: 11 additions & 3 deletions core/llm/index.ts
@@ -1068,9 +1068,17 @@ export abstract class BaseLLM implements ILLM {
documents: chunks.map((chunk) => chunk.content),
});

// Put them in the order they were given
const sortedResults = results.data.sort((a, b) => a.index - b.index);
return sortedResults.map((result) => result.relevance_score);
// Standard OpenAI format
if (results.data && Array.isArray(results.data)) {
return results.data
.sort((a, b) => a.index - b.index)
.map((result) => result.relevance_score);
}

throw new Error(
`Unexpected rerank response format from ${this.providerName}. ` +
`Expected 'data' array but got: ${JSON.stringify(Object.keys(results))}`,
);
}

throw new Error(
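For reference, the `data` branch added above expects the standard OpenAI-compatible rerank payload. A minimal sketch of that shape and of the reordering step, with illustrative values (only `data`, `index`, and `relevance_score` are actually read by the code):

// Illustrative OpenAI-style rerank response; the values are made up.
const response = {
  data: [
    { index: 1, relevance_score: 0.12 },
    { index: 0, relevance_score: 0.87 },
  ],
};
// Sorting by index restores the original chunk order before the
// scores are returned:
const scores = response.data
  .sort((a, b) => a.index - b.index)
  .map((r) => r.relevance_score);
// scores === [0.87, 0.12]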
42 changes: 41 additions & 1 deletion core/llm/llms/Vllm.ts
@@ -1,7 +1,25 @@
import { LLMOptions } from "../../index.js";
import { Chunk, LLMOptions } from "../../index.js";

import OpenAI from "./OpenAI.js";

// vLLM-specific rerank response types
interface VllmRerankItem {
index: number;
document: {
text: string;
};
relevance_score: number;
}

interface VllmRerankResponse {
id: string;
model: string;
usage: {
total_tokens: number;
};
results: VllmRerankItem[];
}

class Vllm extends OpenAI {
static providerName = "vllm";
constructor(options: LLMOptions) {
@@ -16,6 +34,28 @@ class Vllm extends OpenAI {
return false;
}

async rerank(query: string, chunks: Chunk[]): Promise<number[]> {
if (this.useOpenAIAdapterFor.includes("rerank") && this.openaiAdapter) {
const results = (await this.openaiAdapter.rerank({
model: this.model,
query,
documents: chunks.map((chunk) => chunk.content),
})) as unknown as VllmRerankResponse;

// vLLM uses 'results' array instead of 'data'
if (results.results && Array.isArray(results.results)) {
const sortedResults = results.results.sort((a, b) => a.index - b.index);
return sortedResults.map((result) => result.relevance_score);
}

throw new Error(
`vLLM rerank response missing 'results' array. Got: ${JSON.stringify(Object.keys(results))}`,
);
}

throw new Error("vLLM rerank requires OpenAI adapter");
}

private _setupCompletionOptions() {
this.fetch(this._getEndpoint("models"), {
method: "GET",
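A hedged usage sketch of the new method: the model name, `apiBase`, and the stand-in chunks below are placeholder assumptions for illustration, not values from this PR.

// Hypothetical caller; option names follow the LLMOptions usage above.
const reranker = new Vllm({
  model: "BAAI/bge-reranker-base", // placeholder reranker model
  apiBase: "http://localhost:8000/v1", // placeholder vLLM server URL
});
// Minimal stand-ins for real Chunk objects from Continue's retrieval:
const chunks = [
  { content: "vLLM serves LLMs" },
  { content: "Continue is an IDE extension" },
] as unknown as Chunk[];
const scores = await reranker.rerank("how does vllm rerank?", chunks);
// scores[i] is the relevance_score for chunks[i], in the order given.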
4 changes: 2 additions & 2 deletions docs/docs/customize/model-providers/more/vllm.mdx
@@ -1,5 +1,3 @@
import TabItem from "@theme/TabItem";
import Tabs from "@theme/Tabs";

# vLLM

@@ -100,6 +98,8 @@ We recommend configuring **Nomic Embed Text** as your embeddings model.

## Reranking model

Continue automatically handles vLLM's response format (which uses `results` instead of `data`).

[Click here](../../model-roles/reranking.mdx) to see a list of reranking model providers.

The Continue implementation uses [OpenAI](../top-level/openai.mdx) under the hood. [View the source](https://github.com/continuedev/continue/blob/main/core/llm/llms/Vllm.ts).
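To make the format difference concrete, a minimal sketch (illustrative values only) of the two payload shapes this doc sentence contrasts; both normalize to the same score array in input order:

// OpenAI-style shape, handled in core/llm/index.ts:
const openaiStyle = { data: [{ index: 0, relevance_score: 0.9 }] };
// vLLM shape, handled in core/llm/llms/Vllm.ts:
const vllmStyle = {
  results: [{ index: 0, document: { text: "chunk" }, relevance_score: 0.9 }],
};
const fromOpenAI = openaiStyle.data.map((r) => r.relevance_score); // [0.9]
const fromVllm = vllmStyle.results.map((r) => r.relevance_score); // [0.9]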