vector_io providers do not calculate scores correctly when cosine distance is being used #3213

@jeremychoi

System Info

N/A (I suspect the issue occurs regardless of the environment)

Information

  • The official example scripts
  • My own modified scripts

๐Ÿ› Describe the bug

Taking pgvector as an example vector_io provider, the values in results.scores appear to be calculated incorrectly by the following code (https://github.com/llamastack/llama-stack/blob/main/llama_stack/providers/remote/vector_io/pgvector/pgvector.py#L136) when dist is a cosine distance rather than a Euclidean distance:

score = 1.0 / float(dist) if dist != 0 else float("inf")

As a result, the score can be greater than 1, which may cause thresholds to behave inaccurately.
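
To make the problem concrete, here is a minimal sketch that applies the same inverse-distance formula to a few distance values. It assumes dist is a cosine distance in the range 0 to 2, as returned by pgvector's cosine distance operator; the function name inverse_distance_score is hypothetical and only wraps the one-line expression quoted above.

```python
import math


def inverse_distance_score(dist: float) -> float:
    """Hypothetical wrapper around the expression quoted from pgvector.py."""
    return 1.0 / float(dist) if dist != 0 else float("inf")


# Assumed cosine distances: 0 means identical direction, 2 means opposite.
for dist in (0.0, 0.25, 0.5, 1.0, 2.0):
    score = inverse_distance_score(dist)
    # Any distance below 1 produces a score above 1, and an exact
    # match produces infinity, so the score is not bounded by 1.
    print(f"dist={dist} -> score={score}")
```

Under this formula every cosine distance below 1 maps to a score above 1, so a threshold chosen on a 0 to 1 scale would not filter results as intended.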

For example, here's a reproducer:

results = client.vector_io.query(
    vector_db_id=vector_db_id,
    query=query_text,
    params=query_params,
)
# ... snipped ...
for i, chunk in enumerate(results.chunks):
    score = results.scores[i]
    logging.info(f"Score: {score}")

This produces the following log:

INFO:root:Score: 1.3162429427090052

This issue may also affect other vector_io providers that use cosine distance.

Error logs

INFO:root:Score: 1.3162429427090052

Expected behavior

When the distance is a cosine distance, the score should instead be calculated as 1 - cosine_distance and, if necessary, normalized so that it falls between 0 and 1.
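
One possible shape for this conversion is sketched below. It is not the project's implementation: the function name cosine_distance_to_score and the normalize flag are hypothetical, and the sketch assumes the cosine distance lies in the range 0 to 2, so that 1 - cosine_distance lies between -1 and 1 and needs rescaling to reach the 0 to 1 range.

```python
def cosine_distance_to_score(dist: float, normalize: bool = False) -> float:
    """Hypothetical helper: turn a cosine distance into a similarity score.

    Assumes 0 <= dist <= 2. Without normalization the result is the
    cosine similarity in [-1, 1]; with normalization it is rescaled
    linearly to [0, 1].
    """
    similarity = 1.0 - float(dist)
    if normalize:
        # Map [-1, 1] onto [0, 1], then clamp against floating-point drift.
        return min(1.0, max(0.0, (similarity + 1.0) / 2.0))
    return similarity


# Assumption: the logged score came from the 1 / dist formula, so the
# underlying distance can be recovered by inverting it.
logged_score = 1.3162429427090052
recovered_dist = 1.0 / logged_score
print(cosine_distance_to_score(recovered_dist))
print(cosine_distance_to_score(recovered_dist, normalize=True))
```

If the logged score of 1.3162429427090052 did come from the inverse-distance formula, the underlying distance is roughly 0.76 and the similarity-based score would be roughly 0.24, which stays within the expected range.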

Metadata

Labels

bug: Something isn't working
