This repository was archived by the owner on Oct 10, 2025. It is now read-only.
Feature: native support for synonyms in FTS #6000
Open
Description
FTS is already great, since advanced features such as stopwords and stemming are available for several languages.
What is missing is the ability to handle synonyms natively.
Vector search goes some way in this direction, but it is not the same.
Sometimes, e.g. when pre-filtering the graph, one prefers to rely on exact matches.
Typical use cases:
- abbreviations: CFRP <-> Carbon Fiber Reinforced Plastic
- common foreign-language usage in special domains: Lion <-> Panthera leo
- name variants: Robert <-> Bob
If the design should be prepared for later extensions from the start, there are some things to consider:
- synonyms may consist of several words (see the examples above)
- mappings may be unidirectional or bidirectional (determined by the syntax)
- does one need a transparent/hinted separation between exact matches (original word) and synonym matches?
- the interaction with wildcard pattern matching/advanced_pattern_match needs care (see "Improve fts wild card pattern matching" #5998)
- the interaction with stemming needs care
- there may be more than one synonym for a word/phrase
- each synonym may have a weight (to boost the ones that are closer to the original concept or more important for the current domain)
- Lucene's functionality and syntax may be worth a look, for example:
leopard, big cat|0.8, bagheera|0.9, panthera pardus|0.85
lion => panthera leo|0.9, simba leo|0.8, kimba|0.75
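
To make the two example rules concrete, here is a minimal Python sketch of how such a rule file could be parsed and used for query expansion. It is not an existing API of this project; the names `Synonym`, `parse_rules` and `expand` are hypothetical. It assumes that a comma-separated line is bidirectional, that `=>` maps left to right only, that `|weight` is optional and defaults to 1.0, and that in a bidirectional group each term carries its own weight when it is used as a replacement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synonym:
    term: str            # may contain several words, e.g. "panthera leo"
    weight: float = 1.0  # assumed default when no "|weight" suffix is given


def parse_term(raw):
    """Parse 'big cat|0.8' into Synonym('big cat', 0.8)."""
    text, sep, weight = raw.strip().partition("|")
    return Synonym(text.strip().lower(), float(weight) if sep else 1.0)


def parse_rules(lines):
    """Build a lookup table: term -> list of weighted synonyms."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Unidirectional: only the left-hand terms are expanded.
            lhs, rhs = line.split("=>", 1)
            sources = [parse_term(t).term for t in lhs.split(",")]
            targets = [parse_term(t) for t in rhs.split(",")]
            for source in sources:
                table.setdefault(source, []).extend(targets)
        else:
            # Bidirectional: every term expands to all other terms in the group.
            group = [parse_term(t) for t in line.split(",")]
            for member in group:
                others = [o for o in group if o.term != member.term]
                table.setdefault(member.term, []).extend(others)
    return table


def expand(query, table):
    """Return the exact term (weight 1.0) first, followed by its synonyms."""
    key = query.strip().lower()
    return [Synonym(key, 1.0)] + table.get(key, [])


rules = [
    "leopard, big cat|0.8, bagheera|0.9, panthera pardus|0.85",
    "lion => panthera leo|0.9, simba leo|0.8, kimba|0.75",
]
table = parse_rules(rules)
for candidate in expand("Lion", table):
    print(candidate.term, candidate.weight)
```

Returning the original term as a separate first entry keeps exact matches distinguishable from synonym matches, which is one possible answer to the separation question raised above. Stemming and wildcard handling are deliberately left out of this sketch.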