Skip to content

Support like#34441

Open
gridnevvvit wants to merge 2 commits intoydb-platform:mainfrom
gridnevvvit:support-like
Open

Support like#34441
gridnevvvit wants to merge 2 commits intoydb-platform:mainfrom
gridnevvvit:support-like

Conversation

@gridnevvvit
Copy link
Member

Changelog entry

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

...

Copilot AI review requested due to automatic review settings February 18, 2026 16:12
@gridnevvvit gridnevvvit requested review from a team as code owners February 18, 2026 16:12
@github-actions
Copy link

🔴 Unable to merge your PR into the base branch. Please rebase or merge it with the base branch.

@ydbot
Copy link
Collaborator

ydbot commented Feb 18, 2026

Run Extra Tests

Run additional tests for this PR. You can customize:

  • Test Size: small, medium, large (default: all)
  • Test Targets: any directory path (default: ydb/)
  • Sanitizers: ASAN, MSAN, TSAN
  • Coredumps: enable for debugging (default: off)
  • Additional args: custom ya make arguments

▶  Run tests

@github-actions
Copy link

🟢 2026-02-18 16:15:22 UTC The validation of the Pull Request description is successful.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends KQP fulltext index access to support LIKE/StartsWith/EndsWith-style predicates (and a “Query” mode for FulltextMatch) by rewriting logical plans to push these predicates through fulltext index reads, and by updating the fulltext query representation to support composing query fragments.

Changes:

  • Change fulltext query representation from a single value to a list of values (proto + expr node), and plumb it through compilation/execution.
  • Update logical fulltext index rewrite to recognize FulltextMatch “Mode=Query” and LIKE/StartsWith/EndsWith predicates, generating appropriate index query terms and (when needed) a post-filter.
  • Update fulltext index unit tests to use the new “Mode=Query” path and add coverage for LIKE/StartsWith/EndsWith cases.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ydb/core/protos/kqp_physical.proto Make QueryValue repeated to carry query fragments.
ydb/core/kqp/expr_nodes/kqp_expr_nodes.json Change FullTextIndex Query node type to TExprList.
ydb/core/kqp/query_compiler/kqp_query_compiler.cpp Emit multiple QueryValue entries from the query list.
ydb/core/kqp/executer_actor/kqp_tasks_graph.cpp Concatenate repeated QueryValue parts into the runtime query string.
ydb/core/kqp/opt/logical/kqp_opt_log_indexes.cpp Rewrite detection/rewriting for fulltext predicates; add “Mode=Query” + LIKE-based postfilter construction; build query fragments.
ydb/core/kqp/opt/logical/kqp_opt_log.cpp Adjust optimizer handler registration order to accommodate the updated rewrite.
ydb/core/kqp/runtime/kqp_full_text_source.cpp Remove runtime Pire-based postfiltering and adjust ngram detection logic.
ydb/core/kqp/common/kqp_yql.h Rename BM25 setting keys to B/K1.
ydb/core/kqp/ut/indexes/fulltext/kqp_indexes_fulltext_ut.cpp Update FulltextMatch tests to use “Mode=Query”; add LIKE/StartsWith/EndsWith assertions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1471 to 1486
auto& query = Queries[0];
if (!query.NamedOptions) return settings;
for(auto& arg : query.NamedOptions->Children()) {
auto nameValueTuple = TExprBase(arg).Cast<TCoNameValueTuple>();
TExprBase value = TExprBase(nameValueTuple.Value().Cast().Ptr());
TString name = nameValueTuple.Name().StringValue();
name.to_lower();
if (name == "mode")
continue;

if (DefaultOperator) {
settings.push_back(Build<TCoNameValueTuple>(ctx, pos)
.Name<TCoAtom>()
.Value(TKqpReadTableFullTextIndexSettings::DefaultOperatorSettingName)
.Value(nameValueTuple.Name().StringValue())
.Build()
.Value(DefaultOperator)
.Value(value)
.Done());
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Settings() currently forwards all named options from the first fulltext predicate into TKqlReadTableFullTextIndex settings (except Mode). But TKqpReadTableFullTextIndexSettings::Parse() hard-fails (YQL_ENSURE(false)) on any unknown setting name, so passing through unrelated/typoed options will crash compilation. Please either (1) filter to the supported setting names only (B/K1/DefaultOperator/MinimumShouldMatch/ItemsLimit/SkipLimit) or (2) validate and reject unknown options here with a clear error, instead of emitting them into the index-read settings list.

Copilot uses AI. Check for mistakes.
Comment on lines 1741 to 1754
@@ -1797,7 +1750,7 @@ class TFullTextMatchSource : public TActorBootstrapped<TFullTextMatchSource>, pu
Words.emplace_back(MakeIntrusive<TWordReadState>(wordIndex++, query, IndexTableReader));
}

GeneratePostfilterMatchers(analyzer.analyzers(), expr);
IsNgram = true;
}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsNgram is set to true for any column analyzer that uses ngram/edge-ngram, and then set to true unconditionally when the analyzer matches the searched column. This makes non-ngram fulltext indexes (or indexes where only other columns use ngrams) run the ngram-specific heuristic in StartWordReads() (dropping “imbalanced” terms), which can change matching semantics and/or performance unexpectedly. Please set IsNgram only based on the analyzers for the matched search column and only when use_filter_ngram/use_filter_edge_ngram is enabled for that column.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +134
static constexpr TStringBuf BFactorSettingName = "B";
static constexpr TStringBuf K1FactorSettingName = "K1";
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming the fulltext BM25 setting keys from BFactor/K1Factor to B/K1 is a breaking change for existing queries that might be passing the old names. If compatibility matters, consider accepting both spellings in TKqpReadTableFullTextIndexSettings::Parse() (treat old names as aliases) while keeping the new canonical names for output.

Copilot uses AI. Check for mistakes.
TExprNode::TPtr NamedOptions;
THashMap<std::string_view, TExprNode::TPtr> Settings;
bool StartsWithAny = false;
bool EndsWithWithAny = false;
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo/inconsistent naming: EndsWithWithAny reads like a duplication of “With”. Consider renaming to EndsWithAny (or HasSuffixWildcard) to make the intent clearer and avoid propagating the typo to future code.

Suggested change
bool EndsWithWithAny = false;
bool EndsWithAny = false;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments