
Search Module - Future Enhancements

  • BM25 hybrid search with Reciprocal Rank Fusion (RRF) blending lexical and semantic results; configurable k1/b/RRF-k constants
  • HNSW vector indexing supporting Cosine, Dot Product, and L2 distance metrics with 85%+ recall@10
  • LLM-based query rewriting to improve recall for underspecified or noisy queries
  • Faceted search with per-field value-count facets and numeric range buckets with drill-down filtering
  • Learning-to-rank (LTR) re-ranker using a 6-dimensional feature vector with online pairwise gradient-descent training
  • Multi-field boosting (title > body > tags) with configurable per-field weight vectors
  • Neural sparse retrieval (SPLADE) for vocabulary expansion and out-of-vocabulary term handling
  • Cross-lingual semantic search via multilingual embedding models with automatic language detection
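Several of the items above hinge on Reciprocal Rank Fusion. As a point of reference, a minimal RRF sketch follows; the function name is illustrative and rrf_k = 60 is a common literature default, not a value this module pins down:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Reciprocal Rank Fusion: each ranked list contributes 1 / (rrf_k + rank)
// per document (rank is 1-based); higher fused score is better.
std::vector<std::pair<std::string, double>>
rrfFuse(const std::vector<std::vector<std::string>>& ranked_lists,
        double rrf_k = 60.0) {
    std::unordered_map<std::string, double> fused;
    for (const auto& list : ranked_lists) {
        for (size_t rank = 0; rank < list.size(); ++rank) {
            fused[list[rank]] += 1.0 / (rrf_k + static_cast<double>(rank) + 1.0);
        }
    }
    std::vector<std::pair<std::string, double>> out(fused.begin(), fused.end());
    std::sort(out.begin(), out.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    return out;
}
```

A document appearing in both the lexical and the semantic list accumulates two reciprocal-rank terms, which is why RRF favours agreement between retrievers.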

Design Constraints

  • Hybrid search (BM25 + HNSW RRF) must return top-10 results in ≤ 20 ms at p99 for indices up to 10 M documents
  • BM25 default parameters (k1=1.2, b=0.75) must remain unchanged; parameter changes must be explicit
  • HybridSearch API (RRF weights, mode selection) must remain stable from v1.2.0 onwards; no breaking changes
  • LLM query rewriting must degrade gracefully to the original query when the LLM is unavailable or exceeds 200 ms
  • LTR model updates must be applied online (no index rebuild) and must not affect in-flight queries
  • SPLADE sparse vectors must be stored in a compressed sparse row (CSR) format to cap memory at ≤ 4 GB per 10 M docs
  • Cross-lingual search must support at least 50 languages via a single multilingual model with no per-language index
  • Facet counting must complete in ≤ 5 ms for up to 1,000 distinct facet values on a result set of 100,000 documents
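For reference, the per-term BM25 score the constraints freeze (k1 = 1.2, b = 0.75) can be written as a small sketch; the function name is illustrative, and the IDF variant shown (log-smoothed Robertson/Sparck Jones) is an assumption about the reference formula:

```cpp
#include <cmath>

// Per-term BM25 contribution with the pinned defaults k1 = 1.2, b = 0.75:
//   idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
double bm25Term(double tf, double doc_len, double avg_doc_len,
                double n_docs, double doc_freq,
                double k1 = 1.2, double b = 0.75) {
    double idf = std::log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5));
    double length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    return idf * tf * (k1 + 1.0) / (tf + length_norm);
}
```

The b parameter controls document-length normalization; b = 0 disables it entirely, which is why silent parameter changes would shift ranking behaviour and must stay explicit.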

Required Interfaces

  • HybridSearch::search(query, top_k, mode) (consumer: HTTP API, SDK): modes TEXT / VECTOR / HYBRID; returns a SearchResult list
  • LlmQueryRewriter::rewrite(query) (consumer: HybridSearch pre-processor): returns the rewritten query string; 200 ms timeout with original-query fallback
  • FacetedSearch::facet(results, fields) (consumer: API response builder): returns a FacetMap; supports value-count and numeric range buckets
  • LearningToRank::rerank(query, results) (consumer: result post-processor): online pairwise training; model update via train(feedback)
  • MultiFieldBoostedSearch::search(query, weights) (consumer: API handler): per-field boost-weight map; falls back to equal weights if empty
  • NeuralSparseRetriever::retrieve(query, top_k) (consumer: hybrid retrieval pipeline): SPLADE sparse vectors; CSR storage format
  • CrossLingualSearch::search(query, lang_hint) (consumer: multilingual API): auto-detects language; uses the multilingual embedding model
  • SearchAnalytics::record(query, latency_ms) (consumer: request middleware): thread-safe; exports avg/p95/p99 latency and zero-result rate
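The 200 ms timeout-with-fallback contract on LlmQueryRewriter::rewrite can be sketched with std::async; this is a free-standing illustration of the contract, not the module's implementation (note that when the deadline is missed, the future's destructor still blocks until the backend task finishes):

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <string>

// If the backend is absent, throws, exceeds the deadline, or returns an
// empty string, the original query is used unchanged.
std::string rewriteWithFallback(
    const std::string& query,
    std::function<std::string(const std::string&)> backend,
    std::chrono::milliseconds deadline = std::chrono::milliseconds(200)) {
    if (!backend) return query;
    auto fut = std::async(std::launch::async, backend, query);
    if (fut.wait_for(deadline) != std::future_status::ready) return query;
    try {
        std::string rewritten = fut.get();
        return rewritten.empty() ? query : rewritten;
    } catch (...) {
        return query;
    }
}
```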

Delivered in v1.4.0

The following production-readiness improvements were delivered in v1.4.0:

  • Configurable vector metric (COSINE / DOT / L2) — was hardcoded to COSINE
  • Strict config validation with std::invalid_argument on bad parameters
  • Hard resource limits (max_k, max_candidates) to bound memory / latency
  • Score normalization edge-case fixes (range == 0 / single-result)
  • Linear-combination pre-normalization (scores always in [0,1] before weighting)
  • SearchStats struct for graceful degradation and partial-result detection
  • Thread-safety and exception-safety documentation
  • normalizeScores promoted to public static for testability
  • Comprehensive test suite: test_hybrid_search.cpp, test_rrf_fusion.cpp, test_score_normalization.cpp, test_hybrid_search_integration.cpp
  • benchmark_hybrid_search.cpp for algorithmic performance measurement
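The normalization edge cases fixed in v1.4.0 are easy to state in code. This is a sketch of the behaviour, not the shipped HybridSearch::normalizeScores; mapping a zero range (all-equal scores, including the single-result case) to 1.0 is one reasonable convention the module may or may not share:

```cpp
#include <algorithm>
#include <vector>

// Min-max normalization to [0, 1] with explicit edge-case handling:
// empty input is a no-op; range == 0 maps every score to 1.0 instead
// of dividing by zero.
void normalizeScores(std::vector<double>& scores) {
    if (scores.empty()) return;
    auto [mn, mx] = std::minmax_element(scores.begin(), scores.end());
    double lo = *mn;
    double range = *mx - lo;
    if (range == 0.0) {
        std::fill(scores.begin(), scores.end(), 1.0);
        return;
    }
    for (double& s : scores) s = (s - lo) / range;
}
```

Pre-normalizing both score lists this way is what makes the linear-combination weights meaningful: BM25 and vector-distance scores live on incompatible scales until both are squashed into [0, 1].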

Delivered in v1.5.0

The following high-priority features were delivered in v1.5.0:

QueryExpander (include/search/query_expander.h)

  • Synonym expansion with configurable maximum expansion count
  • Per-token spelling correction using Levenshtein edit distance against a user-supplied vocabulary
  • Alternative query generation by substituting tokens with their synonyms
  • Zero-result fallback via relaxQuery() (drops the last token)
  • Tests: tests/test_query_expander.cpp

FuzzyMatcher (include/search/fuzzy_matcher.h)

  • Levenshtein edit-distance search wrapping SecondaryIndexManager::scanFulltextFuzzy
  • Public static utilities: levenshtein(), soundex(), metaphone(), ngramSimilarity()
  • Unified [0,1] similarity score across all algorithms
  • Bigram/N-gram overlap (Dice coefficient)
  • Tests: tests/test_fuzzy_matcher.cpp
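The primitive behind FuzzyMatcher::levenshtein() is the textbook dynamic program; a self-contained sketch, including the unified [0, 1] similarity the module exposes (the normalization by max length is an assumption about how that unification is done):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Two-row DP Levenshtein distance: O(|a| * |b|) time, O(|b|) space.
size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (size_t j = 1; j <= b.size(); ++j) {
            size_t substitute = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Distance folded into a [0, 1] similarity by the longer string's length.
double levenshteinSimilarity(const std::string& a, const std::string& b) {
    size_t m = std::max(a.size(), b.size());
    return m == 0 ? 1.0 : 1.0 - static_cast<double>(levenshtein(a, b)) / m;
}
```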

FacetedSearch (include/search/faceted_search.h)

  • computeFacet() — value-count facets for a single column
  • computeFacets() — multi-column facets in one call
  • computeRangeFacet() — numeric range buckets
  • applyFacetFilters() — intersect candidate PK sets with active field=value constraints
  • Tests: tests/test_faceted_search.cpp

SearchAnalytics (include/search/search_analytics.h)

  • Thread-safe query event logging with configurable max-events cap (circular eviction)
  • computeMetrics() — average, p95, p99 latency; zero-result rate; top-20 queries
  • getZeroResultQueries() — retrieve most-recent zero-result events for alerting
  • getRecentEvents() — retrieve recent event log
  • Tests: tests/test_search_analytics.cpp
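The p95/p99 figures from computeMetrics() are read off sorted latency samples. A minimal sketch using nearest-rank selection; the shipped code may interpolate differently, so treat the rounding rule as an assumption:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-rank percentile over latency samples (p in [0, 1]);
// an empty sample set yields 0.
double percentile(std::vector<double> samples, double p) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    size_t rank = static_cast<size_t>(p * (samples.size() - 1) + 0.5);
    return samples[rank];
}
```

Because the event log is capped with circular eviction, these percentiles are always computed over a bounded, recent window rather than the full query history.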

AutocompleteEngine (include/search/autocomplete.h)

  • suggestByPrefix() — secondary-index prefix scan returning matching field values
  • suggestPopular() — popular past queries starting with the prefix (backed by SearchAnalytics)
  • suggest() — combined, deduplicated, score-ranked suggestions from both sources
  • Tests: tests/test_autocomplete.cpp

LearningToRank (include/search/learning_to_rank.h)

  • rerank() — dot-product linear re-ranker over 6-dimensional RankingFeatures
  • recordClick() / train() — pairwise gradient-descent training from click-through events
  • registerVariant() / selectVariant() — deterministic A/B traffic splitting
  • rerankWithVariant() — apply a named scorer; falls back to default if variant not found
  • Tests: tests/test_learning_to_rank.cpp
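The pairwise gradient-descent step behind recordClick()/train() can be sketched as a RankNet-style logistic update over the 6-dimensional feature vectors; the loss choice and learning rate here are illustrative assumptions, not the module's exact hyper-parameters:

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Features = std::array<double, 6>;

// One online pairwise update: push the weight vector so the clicked
// document's dot-product score exceeds the skipped document's score.
void pairwiseUpdate(Features& w, const Features& clicked,
                    const Features& skipped, double learning_rate = 0.1) {
    double margin = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        margin += w[i] * (clicked[i] - skipped[i]);
    // Gradient of the logistic pairwise loss w.r.t. the margin.
    double grad = -1.0 / (1.0 + std::exp(margin));
    for (size_t i = 0; i < w.size(); ++i)
        w[i] -= learning_rate * grad * (clicked[i] - skipped[i]);
}
```

Since inference stays a plain dot product, updating weights in place satisfies the online-training constraint: no index rebuild, and in-flight queries see either the old or the new vector.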

MultiModalSearch (include/search/multi_modal_search.h)

  • Accepts ModalQuery components (TEXT, IMAGE, AUDIO, CUSTOM) each with a weight
  • TEXT modality: delegates to SecondaryIndexManager::scanFulltextWithScores
  • Embedding modalities: delegates to VectorIndexManager::searchKnn
  • All result lists fused via weighted RRF; top-k returned
  • searchTextAndImage() convenience method for the most common bi-modal case
  • Tests: tests/test_multi_modal_search.cpp

Delivered in v1.6.0

LlmQueryRewriter (include/search/llm_query_rewriter.h)

  • LLM-based query rewriting for improved recall via injected LlmBackend callback
  • Builds structured prompts instructing the LLM to produce n numbered alternative phrasings
  • Parses numbered lines from the LLM response (supports '.', ')', and ':' as list separators)
  • Deduplication: case-insensitive comparison removes duplicate and original-equivalent rewrites
  • Length guard: rewrites exceeding max_rewrite_length are silently dropped
  • Graceful fallback: returns original query when backend is absent, throws, or produces no output
  • setBackend(): runtime backend replacement (e.g. after model load)
  • Tests: tests/test_llm_query_rewriter.cpp
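The numbered-line parsing can be sketched in a few lines; this is an illustration of the format the bullet describes ("1. foo", "2) bar", "3: baz"), not the module's parser:

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Extract rewrite texts from lines of the form "<digits><sep> <text>",
// where <sep> is '.', ')', or ':'. Non-matching lines are skipped.
std::vector<std::string> parseNumberedLines(const std::string& response) {
    std::vector<std::string> out;
    std::istringstream in(response);
    std::string line;
    while (std::getline(in, line)) {
        size_t i = 0;
        while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) ++i;
        if (i == 0 || i >= line.size()) continue;               // no leading number
        if (line[i] != '.' && line[i] != ')' && line[i] != ':') continue;
        size_t start = line.find_first_not_of(' ', i + 1);
        if (start == std::string::npos) continue;               // number but no text
        out.push_back(line.substr(start));
    }
    return out;
}
```

Deduplication and the max_rewrite_length guard are then applied to this list before anything reaches the retrieval pipeline.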

Delivered in v1.7.0

Ranked Spelling Correction Suggestions (include/search/query_expander.h)

  • SpellingCorrection struct: suggestion, edit_distance, confidence (normalized [0,1])
  • QueryExpander::suggestSpellingCorrections(word, max_suggestions) — returns up to max_suggestions ranked candidates from the registered vocabulary within max_edit_distance, sorted by ascending edit distance (alphabetical tiebreak); confidence decays linearly with distance
  • QueryExpander::suggestQueryCorrections(query, max_suggestions) — tokenizes the query, builds per-token substitution variants and one all-corrected variant, deduplicates, and sorts by total edit distance ascending
  • Tests: tests/test_query_expander.cpp (14 new test cases)

Delivered in v1.8.0

LlmReranker (include/search/llm_reranker.h)

  • Configurable re-ranking with LLM feedback loop, closing Phase 3 of the Search Roadmap
  • Injected LlmBackend callable (same pattern as LlmQueryRewriter) — no hard LLM library dependency
  • LlmRerankCandidate and LlmRerankResult value types; rerank() accepts any result source
  • Batched prompting: splits candidates into configurable batch_size groups for cost control
  • Score parsing: per-document 0–10 integer scores from the LLM response, normalized to [0, 1]
  • Out-of-range score clamping (values outside [0, 10] clamped before normalization)
  • Configurable blending: final = llm_weight * llm_score + (1 - llm_weight) * initial_score
  • min_score_threshold filter: removes low-confidence results from the output
  • Graceful fallback: returns original order when backend is absent, throws, or produces no output
  • setBackend(): runtime backend replacement (e.g. after model load)
  • toClickEvents() static method: converts LLM relevance judgments to ClickEvent objects for direct consumption by LearningToRank::recordClick(), closing the feedback loop
  • Tests: tests/test_llm_reranker.cpp
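The clamping, normalization, and blending steps compose into one small function; a sketch of the documented rule (function name illustrative):

```cpp
#include <algorithm>

// v1.8.0 blending: raw LLM score (0-10 integer) is clamped, scaled to
// [0, 1], then mixed with the initial retrieval score:
//   final = llm_weight * llm_score + (1 - llm_weight) * initial_score
double blendScore(int llm_raw, double initial_score, double llm_weight) {
    int clamped = std::clamp(llm_raw, 0, 10);
    double llm_score = clamped / 10.0;
    return llm_weight * llm_score + (1.0 - llm_weight) * initial_score;
}
```

With llm_weight = 0 the reranker is a no-op on scores, which is consistent with the graceful-fallback behaviour when the backend is absent.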

HybridSearch LlmReranker integration (include/search/hybrid_search.h)

  • HybridSearch::setReranker() attaches an LlmReranker to the search pipeline
  • Applied as a post-fusion step after RRF (or linear combination): top-N results are re-scored by the LLM and returned in LLM-determined order with updated hybrid_score
  • Null backend disables re-ranking; search() returns RRF order unchanged
  • HybridSearch::Result::content field forwarded to LlmRerankCandidate::content
  • Tests: tests/test_hybrid_search.cpp (HybridSearchReranker test group)

Delivered in v1.9.0

MultiFieldBoostedSearch (include/search/multi_field_search.h)

  • Multi-field boosted full-text search with per-field BM25 scoring and configurable boost weights
  • Implements the "title > body > tags" priority: title boost=3.0, body boost=1.0, tags boost=0.5
  • FieldConfig value type: {table, column, boost} — fully configurable, any field combination
  • search(query, fields) — queries each field independently, normalizes BM25 scores to [0, 1] per field, multiplies by the field boost, and accumulates into a combined per-document score
  • Score combination formula: score(doc) = Σ_f( boost_f × normalised_bm25_score_f(doc) )
  • Documents appearing in only a subset of fields are still ranked (missing fields contribute 0)
  • defaultFields(table) — convenience factory: title/body/tags with the canonical boost hierarchy
  • Result struct: document_id, combined score, and per-field field_scores breakdown
  • Graceful handling: negative-boost fields are skipped with a warning; empty query/fields return immediately; null index returns empty without throwing; all exceptions caught inside search()
  • Exception safety: search() never throws; errors logged via THEMIS_WARN/THEMIS_ERROR
  • Tests: tests/test_multi_field_search.cpp

Delivered in v2.0.0

PersonalizedRanker (include/search/personalized_ranker.h)

  • Per-user interaction history tracking for personalized search result re-ranking
  • InteractionType enum: VIEW (0.2), CLICK (0.5), BOOKMARK (1.0), LIKE (1.0), DISLIKE (-0.5)
  • UserInteraction struct: user_id, document_id, type, timestamp
  • Time-decayed scoring: score += type_weight * exp(-decay_rate * age_days) — more recent interactions contribute more
  • Configurable decay rate (default 0.05 ≈ half-weight after ~14 days), boost weight, and per-user buffer size
  • Per-user history bounded by Config::max_interactions_per_user with oldest-first eviction
  • computeScore(user_id, doc_id) — returns personalization score in [-1, 1] for any (user, document) pair
  • applyPersonalization(user_id, candidates) — adjusts final_score of ranked candidates and re-sorts
  • getUserInteractions(user_id) — inspect stored interactions (most-recent first)
  • clearUser(user_id) / clear() — GDPR-compatible user-data removal
  • Thread-safe: all methods protected by a shared std::mutex
  • Tests: tests/test_personalized_ranker.cpp (28 test cases)
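The time-decay formula is worth pinning down numerically: with the default decay_rate = 0.05, the half-life is ln(2) / 0.05 ≈ 13.9 days, which is where the documented "half-weight after ~14 days" comes from. A sketch of the per-interaction term:

```cpp
#include <cmath>

// Contribution of one interaction: type_weight * exp(-decay_rate * age_days).
// Recent interactions dominate; an interaction's weight halves every
// ln(2) / decay_rate days (~13.9 days at the default 0.05).
double decayedScore(double type_weight, double age_days,
                    double decay_rate = 0.05) {
    return type_weight * std::exp(-decay_rate * age_days);
}
```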

CrossLingualSearch (include/search/cross_lingual_search.h)

  • Cross-lingual semantic search via multilingual embeddings (Phase 4 of the Search Roadmap)
  • Model-agnostic design: callers supply pre-computed float vectors (e.g. from paraphrase-multilingual-mpnet-base-v2 or LaBSE); no hard embedding-library dependency
  • search(embedding, hints) — single-embedding kNN query with distance-to-similarity conversion (1 / (1 + distance)), optional per-language boost, score threshold filter, k-cap
  • searchMultiEmbedding(queries, hints) — issues independent kNN queries for each EmbeddingQuery (embedding + weight), fuses ranked lists via weighted Reciprocal Rank Fusion (RRF), then applies language boosts and threshold filtering
  • setLanguageMap(map) — supplies a doc_id → language_code (ISO 639-1) mapping for result annotation (Result::language) and LanguageHint boost application
  • LanguageHint struct: {language_code, boost} — multiplies the final score of results in the specified language; zero or negative boost hints are silently ignored
  • EmbeddingQuery struct: {embedding, weight} — weight drives the RRF contribution term
  • Resource safety: k and candidates are clamped to max_k / max_candidates at construction time; search() and searchMultiEmbedding() never throw
  • Config validation: throws std::invalid_argument on k=0, candidates=0, rrf_k≤0
  • Tests: tests/test_cross_lingual_search.cpp

NeuralSparseRetrieval (include/search/neural_sparse_retrieval.h)

  • SPLADE / BERT-based neural sparse retrieval engine, closing Phase 4 item of the Search Roadmap
  • SparseVector type alias: unordered_map<string, float> — non-zero learned term weights
  • SparseEncoderBackend callable type — same injected-backend pattern as LlmReranker::LlmBackend
  • addDocument(doc_id, sparse_vec) — indexes a pre-computed sparse vector; negative weights clamped
  • addDocumentText(doc_id, text) — encodes text via attached backend then indexes it
  • removeDocument(doc_id) — clean removal from both forward and inverted indexes
  • search(query_vec, k) — inverted-index dot-product accumulation; O(|q| × postings)
  • searchText(query_text, k) — encodes query then calls search(); never throws
  • Scoring: score(q, d) = Σ_t( q[t] * d[t] ) — standard inner product over shared terms
  • Config::max_terms_per_doc — soft cap; top-weighted terms kept, remainder discarded
  • Config::score_threshold — pre-normalization minimum score filter
  • Config::normalize_scores — optional linear rescaling to [0, 1] (same edge-case handling as HybridSearch::normalizeScores)
  • normalizeScores() promoted to public static for direct unit testing
  • Result struct: document_id, score (normalized), raw_score (inner product before normalization)
  • Graceful fallback: no encoder → no-op for text methods; encoder throws → empty result, no propagation
  • Tests: tests/test_neural_sparse_retrieval.cpp
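The core scoring primitive, score(q, d) = Σ_t q[t] · d[t] over the SparseVector type alias, reduces to a map intersection; a sketch (iterating the smaller map is a standard optimization, assumed here rather than documented):

```cpp
#include <string>
#include <unordered_map>

using SparseVector = std::unordered_map<std::string, float>;

// Inner product over shared terms; terms absent from either vector
// contribute zero, so only the intersection is visited.
double sparseDot(const SparseVector& q, const SparseVector& d) {
    const SparseVector& small = q.size() <= d.size() ? q : d;
    const SparseVector& large = q.size() <= d.size() ? d : q;
    double score = 0.0;
    for (const auto& [term, weight] : small) {
        auto it = large.find(term);
        if (it != large.end())
            score += static_cast<double>(weight) * it->second;
    }
    return score;
}
```

In the engine itself this accumulation runs over the inverted index postings rather than pairwise, which is what gives the O(|q| × postings) search cost.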

Query Expansion and Rewriting

Priority: High
Status: ✅ Delivered in v1.5.0 — see include/search/query_expander.h

Automatically expand and rewrite queries for better results.

Features:

  • Synonym expansion using thesaurus
  • Query relaxation for zero-result queries
  • Spelling correction and suggestion
  • Query intent detection
  • Automatic phrase detection

Implementation:

class QueryExpander {
public:
    struct ExpansionConfig {
        bool use_synonyms = true;
        bool correct_spelling = true;
        bool detect_phrases = true;
        double synonym_weight = 0.8;
        size_t max_expansions = 5;
    };
    
    Result<ExpandedQuery> expand(
        const std::string& query,
        const ExpansionConfig& config
    );
    
    Result<std::string> correctSpelling(
        const std::string& query
    );
    
    Result<std::vector<std::string>> suggestAlternatives(
        const std::string& query
    );
};

Advanced Fuzzy Matching

Priority: Medium
Status: ✅ Delivered in v1.5.0 — see include/search/fuzzy_matcher.h

Enhanced fuzzy search with phonetic algorithms.

Features:

  • Levenshtein distance for edit distance
  • Soundex and Metaphone for phonetic matching
  • N-gram matching
  • Configurable fuzziness levels
  • Performance optimization with BK-trees

Expected Performance:

  • Fuzzy search overhead: <50% vs exact search
  • Support 1-2 edit distance efficiently

Multi-Modal Search

Priority: Medium
Status: ✅ Delivered in v1.5.0 — see include/search/multi_modal_search.h

Search across text, images, and other modalities.

Features:

  • Image search using CLIP embeddings
  • Audio search for voice queries
  • Cross-modal retrieval (text→image, image→text)
  • Unified embedding space

Learning to Rank (LTR)

Priority: Medium
Status: ✅ Delivered in v1.5.0 — see include/search/learning_to_rank.h

Machine learning-based result ranking.

Features:

  • Training from click-through data
  • Multiple ranking features
  • Online learning and adaptation
  • A/B testing framework

Search Analytics

Priority: High
Status: ✅ Delivered in v1.5.0 — see include/search/search_analytics.h

Track and analyze search performance.

Features:

  • Query log analysis
  • Click-through rate tracking
  • Zero-result query detection
  • Search performance metrics
  • User behavior analysis

Faceted Search

Priority: High
Status: ✅ Delivered in v1.5.0 — see include/search/faceted_search.h

Multi-dimensional filtering and navigation.

Features:

  • Dynamic facet generation
  • Range facets (price, date)
  • Hierarchical facets (categories)
  • Facet count computation
  • Drill-down navigation

Autocomplete and Suggestions

Priority: Medium
Status: ✅ Delivered in v1.5.0 — see include/search/autocomplete.h

Real-time query suggestions.

Features:

  • Prefix-based suggestions
  • Popular query suggestions
  • Personalized suggestions
  • Context-aware completion
  • Instant search results

Performance Roadmap

v1.5.0 Achieved

  • Query expansion overhead: <20% (in-memory synonym+spelling correction)
  • Fuzzy search: Within 2x of exact search (Levenshtein wrapped over BM25 index)
  • Faceted search: <50ms for 10 facets (secondary index range scans)
  • Autocomplete latency: <10ms (prefix range scan + analytics top-queries)
  • LTR inference: <1ms per query (linear dot product, 6 features)
  • Multi-modal search: sub-millisecond RRF fusion (embeddings pre-computed by caller)

v1.6.0 Targets

  • Personalized autocomplete (per-user click history)
  • Neural LTR (LambdaMART or small MLP) with offline batch training
  • Multi-namespace VectorIndexManager (one instance per modality)
  • Streaming result delivery for large k values

Test Strategy

  • Unit test coverage ≥ 80% for HybridSearch, BM25Ranker, LlmQueryRewriter, FacetedSearch, LearningToRank, NeuralSparseRetriever, and CrossLingualSearch
  • BM25 correctness tests: compute BM25 scores for 10 hand-crafted query-document pairs and assert results match the reference formula within ± 0.001
  • Hybrid recall@10 integration test: evaluate against a 1 M-document corpus; assert recall@10 ≥ 85% in HYBRID mode
  • LLM query rewriter fallback test: simulate LLM timeout (> 200 ms); assert original query is used and no exception is thrown
  • LTR online training test: feed 1,000 pairwise preference events; assert MRR@10 improves by ≥ 5% relative to the untrained model after 500 updates
  • Faceted search tests: assert value-count facets are correct for 1,000 distinct values and numeric range buckets respect configured boundaries
  • SPLADE neural sparse retrieval test: assert retrieved document IDs overlap ≥ 70% with BM25 top-20 on the BEIR TREC-COVID split
  • Cross-lingual search tests: submit queries in 5 languages (EN, DE, FR, ES, ZH); assert top-1 result is in the correct language-agnostic answer set

Performance Targets

  • Hybrid search (BM25 + HNSW RRF) latency: ≤ 20 ms at p99 for top-10 results on an index of up to 10 M documents
  • HNSW index build throughput: ≥ 10,000 vectors/s for 768-dimensional embeddings on a single CPU node
  • BM25 ranking throughput: ≥ 50,000 queries/s for simple term queries against a 1 M-document index on a 16-core host
  • LLM query rewriter overhead: ≤ 200 ms added latency at p99; 0 ms when LLM is unavailable (passthrough)
  • Facet counting latency: ≤ 5 ms for up to 1,000 distinct facet values on a result set of 100,000 documents
  • LTR re-ranking latency: ≤ 2 ms for re-ranking top-100 candidates with the 6-dimensional linear model
  • SPLADE index memory: ≤ 4 GB for a 10 M-document corpus stored in CSR format
  • Autocomplete suggestion latency: ≤ 5 ms at p99 for prefix queries against a 1 M-term dictionary

Security / Reliability

  • Query injection prevention: all user-supplied query strings must be length-limited (≤ 4,096 bytes) and stripped of control characters before being passed to the BM25 tokenizer or LLM rewriter
  • Resource exhaustion: max_k and max_candidates hard limits must be enforced before HNSW traversal begins; requests exceeding these limits return HTTP 400, not a partial result
  • LTR model integrity: the learned weight vector must be persisted with a checksum; a corrupted model file must be detected on load and the system must fall back to uniform weights
  • SPLADE vocabulary: the sparse vocabulary index must be loaded from a read-only memory-mapped file; writes to the vocabulary at runtime are rejected
  • Cross-lingual model: the multilingual embedding model binary must be verified against a SHA-256 hash at startup; a hash mismatch aborts loading and falls back to the BM25-only mode
  • SearchAnalytics query log must not store raw query strings containing PII; queries are hashed (SHA-256) before storage
  • Distributed shard search must authenticate inter-shard requests with mTLS; unauthenticated shard calls are rejected with a 401 error
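The first bullet's sanitization step (length cap plus control-character stripping) can be sketched as a single pass; this illustrates the stated contract and is not the module's sanitizer. Bytes ≥ 0x80 are kept, so multi-byte UTF-8 sequences pass through (though the 4,096-byte cap can truncate mid-sequence, which a production version would need to handle):

```cpp
#include <algorithm>
#include <string>

// Cap the query at max_bytes and drop ASCII control characters
// (0x00-0x1F and DEL) before tokenization or LLM rewriting.
std::string sanitizeQuery(const std::string& raw, size_t max_bytes = 4096) {
    std::string out;
    out.reserve(std::min(raw.size(), max_bytes));
    for (char c : raw) {
        if (out.size() >= max_bytes) break;
        unsigned char uc = static_cast<unsigned char>(c);
        if (uc >= 0x20 && uc != 0x7F) out.push_back(c);
    }
    return out;
}
```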

Scientific References

Neural Sparse Retrieval (SPLADE)

[1] T. Formal, B. Piwowarski, and S. Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking," in Proc. 44th Int. ACM SIGIR Conf. Res. Dev. Inf. Retrieval (SIGIR), pp. 2288–2292, 2021. https://doi.org/10.1145/3404835.3463098

[2] T. Formal, C. Lassance, B. Piwowarski, and S. Clinchant, "From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective," in Proc. 45th Int. ACM SIGIR Conf. Res. Dev. Inf. Retrieval (SIGIR), pp. 2353–2359, 2022. https://doi.org/10.1145/3477495.3531857

Cross-Lingual Semantic Search

[3] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang, "Language-Agnostic BERT Sentence Embedding," in Proc. 60th Annu. Meet. Assoc. Comput. Linguistics (ACL), pp. 878–891, 2022. https://doi.org/10.18653/v1/2022.acl-long.62

[4] N. Reimers and I. Gurevych, "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation," in Proc. 2020 Conf. Empirical Methods Nat. Lang. Process. (EMNLP), pp. 4512–4525, 2020. https://doi.org/10.18653/v1/2020.emnlp-main.365

Learning to Rank

[5] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to Rank using Gradient Descent," in Proc. 22nd Int. Conf. Mach. Learn. (ICML), pp. 89–96, 2005. https://doi.org/10.1145/1102351.1102363

[6] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, "Learning to Rank: From Pairwise Approach to Listwise Approach," in Proc. 24th Int. Conf. Mach. Learn. (ICML), pp. 129–136, 2007. https://doi.org/10.1145/1273496.1273513

Distributed Search

[7] N. Craswell, D. Hawking, and P. Thistlewaite, "Merging Results from Isolated Search Engines," in Proc. 10th Australasian Database Conf. (ADC), pp. 189–200, 1999.

[8] M. Shokouhi and L. Si, "Federated Search," Foundations and Trends in Information Retrieval, vol. 5, no. 1, pp. 1–102, 2011. https://doi.org/10.1561/1500000010

Query Expansion and Rewriting

[9] C. Carpineto and G. Romano, "A Survey of Automatic Query Expansion in Information Retrieval," ACM Comput. Surv., vol. 44, no. 1, pp. 1–50, 2012. https://doi.org/10.1145/2071389.2071390

[10] W. Yu et al., "Generate rather than Retrieve: Large Language Models are Strong Context Generators," in Proc. 11th Int. Conf. Learn. Representations (ICLR), 2023. https://openreview.net/forum?id=ZuEW6ue886

Re-Ranking

[11] R. Nogueira and K. Cho, "Passage Re-ranking with BERT," arXiv preprint arXiv:1901.04085, 2019. https://arxiv.org/abs/1901.04085

[12] C. Lassance, H. Déjean, T. Formal, and S. Clinchant, "SPLADE-v3: New Baselines for SPLADE," arXiv preprint arXiv:2403.06789, 2024. https://arxiv.org/abs/2403.06789


Last Updated: February 2026
Module Version: v2.0.0
Next Review: v2.1.0 Release