Description
System Info
N/A (I suspect the issue occurs regardless of the environment)
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Taking pgvector as an example vector_io provider, the values in results.scores appear to be calculated incorrectly by the following code (https://github.com/llamastack/llama-stack/blob/main/llama_stack/providers/remote/vector_io/pgvector/pgvector.py#L136) when dist is a cosine distance rather than a Euclidean distance:
score = 1.0 / float(dist) if dist != 0 else float("inf")
As a result, the score is greater than 1 whenever dist is less than 1, so score thresholds may not behave as intended.
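To make the effect concrete, here is a minimal standalone sketch of the reciprocal formula quoted above. The function name `reciprocal_score` and the sample distances are made up for illustration and are not taken from the provider code; only the formula itself comes from the line above.

```python
import math


def reciprocal_score(dist: float) -> float:
    """Same formula as the quoted line: 1 / dist, or infinity when dist is 0."""
    return 1.0 / float(dist) if dist != 0 else float("inf")


# Cosine distance lies in [0, 2]; these sample values are illustrative only.
for dist in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"dist={dist:<5} score={reciprocal_score(dist)}")

# Any distance below 1 yields a score above 1, and an exact match yields inf.
assert reciprocal_score(0.5) > 1.0
assert math.isinf(reciprocal_score(0.0))
```

With this formula the closest matches get the largest and least bounded scores, so a threshold chosen on a 0 to 1 scale would not filter them as expected.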
For example, here’s a reproducer:
results = client.vector_io.query(
    vector_db_id=vector_db_id,
    query=query_text,
    params=query_params,
)
# ... snipped
for i, chunk in enumerate(results.chunks):
    score = results.scores[i]
    logging.info(f"Score: {score}")
This produces the following log:
INFO:root:Score: 1.3162429427090052
This issue may also affect other vector_io providers that use cosine distance.
Error logs
INFO:root:Score: 1.3162429427090052
Expected behavior
When dist is a cosine distance, the score should instead be calculated as 1 - cosine_distance and, if necessary, normalized so that it falls between 0 and 1.
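A rough sketch of what that could look like, assuming dist is a cosine distance in the range 0 to 2. The function name `cosine_distance_to_score`, the `normalize` flag, and the rescaling `(similarity + 1) / 2` are my own assumptions for illustration, not existing llama-stack code, and other normalizations are possible.

```python
def cosine_distance_to_score(dist: float, normalize: bool = False) -> float:
    """Convert a cosine distance into a similarity-style score.

    Hypothetical helper: assumes dist is a cosine distance in [0, 2].
    """
    # Clamp to guard against small floating-point overshoot outside [0, 2].
    dist = max(0.0, min(2.0, float(dist)))
    similarity = 1.0 - dist  # cosine similarity, in [-1, 1]
    if normalize:
        # Linearly rescale [-1, 1] to [0, 1].
        return (similarity + 1.0) / 2.0
    return similarity


for dist in (0.0, 0.5, 1.0, 2.0):
    print(dist, cosine_distance_to_score(dist), cosine_distance_to_score(dist, normalize=True))
```

Without normalization the score is the cosine similarity itself, which can be negative for vectors pointing in opposite directions; the optional rescaling is one way to keep it within 0 to 1.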