[Store] Add PostgresHybridStore with RRF following Supabase approach #783
Combines pgvector semantic search with PostgreSQL Full-Text Search using Reciprocal Rank Fusion (RRF), following Supabase approach.
Features:
- Configurable semantic/keyword ratio (0.0 to 1.0)
- RRF fusion with customizable k parameter
- Multilingual FTS support (default: 'simple')
- Optional relevance filtering with defaultMaxScore
- All pgvector distance metrics supported
1284fcf to 6c7c7e3
3807878 to 8d4ccfe
Pull Request Overview
This PR introduces PostgresHybridStore, a new vector store implementation that combines semantic vector search (pgvector) with PostgreSQL Full-Text Search (FTS) using Reciprocal Rank Fusion (RRF), following Supabase's hybrid search approach.
Key changes:
- Implements configurable hybrid search with adjustable semantic ratio (0.0 for pure FTS, 1.0 for pure vector, 0.5 for balanced)
- Uses RRF algorithm with k=60 default to merge vector similarity and ts_rank_cd rankings
- Supports multilingual content through configurable PostgreSQL text search configurations
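The fusion step described in the list above can be sketched in plain PHP. This is a minimal illustration, not the code from the PR: the function name `fuseWithRrf()`, its signature, and the way the semantic ratio weights each list are assumptions. Only the `k = 60` default and the 0.0 to 1.0 ratio range come from the overview.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: merges two best-first ID lists with weighted RRF.
 *
 * @param list<string> $vectorIds IDs ordered best-first by vector distance
 * @param list<string> $ftsIds    IDs ordered best-first by ts_rank_cd
 *
 * @return array<string, float> fused scores keyed by ID, highest first
 */
function fuseWithRrf(array $vectorIds, array $ftsIds, float $semanticRatio = 0.5, int $k = 60): array
{
    if ($semanticRatio < 0.0 || $semanticRatio > 1.0) {
        throw new \InvalidArgumentException('The semantic ratio must be between 0.0 and 1.0.');
    }

    $scores = [];

    // Ranks are 1-based, so the first entry contributes weight / (k + 1).
    foreach ($vectorIds as $index => $id) {
        $scores[$id] = ($scores[$id] ?? 0.0) + $semanticRatio / ($k + $index + 1);
    }

    // Assumption: the keyword side receives the remaining weight (1 - ratio).
    foreach ($ftsIds as $index => $id) {
        $scores[$id] = ($scores[$id] ?? 0.0) + (1.0 - $semanticRatio) / ($k + $index + 1);
    }

    // Sort by fused score, highest first, keeping the ID keys.
    arsort($scores);

    return $scores;
}
```

Calling `fuseWithRrf(['a', 'b', 'c'], ['b', 'd'])` ranks `b` first because it is the only ID present in both lists. The sketch uses non-numeric IDs because PHP casts numeric string keys to integers.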
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/store/src/Bridge/Postgres/PostgresHybridStore.php | Core implementation of hybrid store with vector/FTS query building, RRF fusion logic, and table setup with tsvector generation |
| src/store/tests/Bridge/Postgres/PostgresHybridStoreTest.php | Comprehensive test coverage for constructor validation, setup, pure vector/FTS queries, hybrid RRF queries, and various configuration options |
    // Combine existing WHERE clause with FTS filter
    $whereClause = str_replace('WHERE ', "WHERE $ftsFilter AND ", $whereClause);
Copilot AI, Oct 23, 2025
Using str_replace to inject the FTS filter is fragile and could fail if the WHERE clause contains the substring 'WHERE ' in other contexts (e.g., in a string literal or subquery). Consider using a more robust approach such as parsing the clause structure or prepending the filter condition with proper AND logic.
    - // Combine existing WHERE clause with FTS filter
    - $whereClause = str_replace('WHERE ', "WHERE $ftsFilter AND ", $whereClause);
    + // Combine existing WHERE clause with FTS filter using AND logic
    + // Assumes $whereClause starts with 'WHERE '
    + $whereClause = rtrim($whereClause);
    + if (str_starts_with($whereClause, 'WHERE ')) {
    +     $whereClause .= " AND $ftsFilter";
    + } else {
    +     // Unexpected format, prepend WHERE
    +     $whereClause = "WHERE $ftsFilter AND " . ltrim($whereClause);
    + }
This caught my eye as well, and I was wondering whether the WHERE building can be brought into one single place instead of being spread over three methods. What do you think?
Good catch! I've centralized the WHERE clause building logic into a dedicated addFilterToWhereClause() helper method. This removes the duplication across buildFtsOnlyQuery() and buildHybridQuery(), and also makes the str_replace approach more robust by using str_starts_with() instead.
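A helper of the kind described in this reply might look like the sketch below. The method name comes from the thread, but the signature, the standalone-function form, and the parenthesising of both sides are assumptions made for illustration. The merged code may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: appends a filter to an optional WHERE clause with AND.
 * Requires PHP 8.0+ for str_starts_with().
 */
function addFilterToWhereClause(string $whereClause, string $filter): string
{
    $whereClause = trim($whereClause);

    // No existing clause: the filter becomes the whole WHERE clause.
    if ('' === $whereClause) {
        return 'WHERE '.$filter;
    }

    // Strip only a leading 'WHERE ' so the keyword inside literals is untouched.
    if (str_starts_with($whereClause, 'WHERE ')) {
        $whereClause = substr($whereClause, 6);
    }

    // Parentheses keep an existing OR from changing the meaning of the AND.
    return 'WHERE ('.$whereClause.') AND ('.$filter.')';
}
```

Because only the prefix is inspected, a condition such as `note = 'WHERE to go'` passes through unchanged, which is the case the original `str_replace()` call could corrupt.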
    $ftsFilter = \sprintf("content_tsv @@ websearch_to_tsquery('%s', :query)", $this->language);

    if ('' !== $whereClause) {
        $ftsWhereClause = str_replace('WHERE ', "WHERE $ftsFilter AND ", $whereClause);
Copilot AI, Oct 23, 2025
Same issue as in buildFtsOnlyQuery: using str_replace to inject the FTS filter is fragile and could produce incorrect SQL if 'WHERE ' appears in unexpected contexts. Consider a more robust approach to combining WHERE conditions.
In general this is a super cool feature. Some of the Copilot findings seem valid to me, so please check them.
On top of that, I was unsure whether every sprintf needs to be a sprintf, or whether some values can or should be prepared parameters. It would be great to double check that as well, please.
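One way to act on the sprintf question is sketched below, with the caveat that none of it is taken from the PR. Values such as the text search configuration can be bound as parameters (here through a `CAST(... AS regconfig)`), while identifiers such as table names cannot be bound and need validation instead. Both function names are hypothetical, and the column name `content_tsv` is copied from the snippet under review.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: builds the FTS filter with bound values only.
 *
 * @return array{0: string, 1: array<string, string>} SQL fragment and its parameters
 */
function buildFtsFilter(string $language, string $query): array
{
    // The configuration name is sent as a parameter instead of being interpolated.
    $sql = 'content_tsv @@ websearch_to_tsquery(CAST(:language AS regconfig), :query)';

    return [$sql, ['language' => $language, 'query' => $query]];
}

/**
 * Hypothetical sketch: identifiers cannot be bound, so reject anything
 * that is not a plain unquoted SQL identifier before using it in sprintf().
 */
function assertSafeIdentifier(string $name): string
{
    if (1 !== preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
        throw new \InvalidArgumentException(\sprintf('Unsafe SQL identifier "%s".', $name));
    }

    return $name;
}
```

The returned parameter array would be passed to the statement's execute call alongside any other bound values, so user-controlled strings never end up inside the SQL text.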
     *
     * @author Ahmed EBEN HASSINE <ahmedbhs123@æmail.com>
     */
    final readonly class PostgresHybridStore implements ManagedStoreInterface, StoreInterface
Let's just call it HybridStore instead.
    - final readonly class PostgresHybridStore implements ManagedStoreInterface, StoreInterface
    + final readonly class HybridStore implements ManagedStoreInterface, StoreInterface
@chr-hertel Thanks for reviewing. I've just renamed it.
- Extract WHERE clause logic into addFilterToWhereClause() helper method
- Fix embedding param logic: ensure it's set before maxScore uses it
- Replace fragile str_replace() with robust str_starts_with() approach
- Remove code duplication between buildFtsOnlyQuery and buildHybridQuery

This addresses review feedback about fragile WHERE clause manipulation and centralizes the logic in a single, reusable method.
- Rename class from PostgresHybridStore to HybridStore
- The namespace already indicates it's Postgres-specific
- Add postgres-hybrid.php RAG example demonstrating:
  * Different semantic ratios (0.0, 0.5, 1.0)
  * RRF (Reciprocal Rank Fusion) hybrid search
  * Full-text search with 'q' parameter
  * Per-query semanticRatio override
Side-by-side comparison of FTS, Hybrid (RRF), and Semantic search.
Uses Supabase (pgvector + PostgreSQL FTS).
30 sample articles with interactive Live Component.
Related: symfony/ai#783
Author: Ahmed EBEN HASSINE
@ahmed-bhs could you please have a look at the pipeline failures? I think there are still some minor parts open.
Problem
Choosing between vector search and full-text search forces a trade-off:
Users often need both in the same query: conceptual understanding + lexical precision.
Solution
Hybrid search combining both using Reciprocal Rank Fusion (RRF), following Supabase's approach.
RRF merges rankings from vector similarity and PostgreSQL Full-Text Search (ts_rank_cd).
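As a worked illustration, a weighted form of RRF can be written as follows. Here $r$ is the semantic ratio, $k$ is the RRF constant, and the ranks are 1-based positions in each result list. How the ratio enters the score is an assumption. The PR text only states that the ratio is configurable and that $k$ defaults to 60.

$$
\text{score}(d) = \frac{r}{k + \text{rank}_{\text{vec}}(d)} + \frac{1 - r}{k + \text{rank}_{\text{fts}}(d)}
$$

With $k = 60$ and $r = 0.5$, a document ranked 1st by vector similarity and 3rd by full-text search would score $0.5/61 + 0.5/63 \approx 0.01613$. A document ranked 1st by vector similarity but absent from the full-text results would score only $0.5/61 \approx 0.00820$, so agreement between the two rankings is rewarded.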
Features
Implementation
Example