@fang-xing-esql (Member) commented Nov 8, 2025

This is to support #136035.

This PR addresses the following items:

  • Enabled CCS tests with subqueries in the following tests:

    • Added CrossClusterSubqueryIT
    • Added CrossClusterSubqueryUnavailableRemotesIT
    • Updated MultiClusterSpecIT to convert queries with subqueries to use remote index patterns, by recognizing multiple FROM commands in the query
    • Updated subquery.csv-spec to allow more queries to run with MultiClusterSpecIT, removed the fork tags, and removed metadata from some of the queries.
  • If skipUnavailable=true is set on remote clusters, prune subqueries for which no valid index pattern is found, so that the query can continue as long as there is a valid IndexResolution for a subset of the subqueries.

    • Updated EsqlSession to recognize subquery index patterns that do not have a valid IndexResolution found, and mark them as EMPTY_SUBQUERY
    • Added a new rule, PruneEmptyUnionAllBranch, to Analyzer to prune empty subqueries.
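
The pruning behavior described above can be sketched roughly as follows. This is a minimal illustration only, not the PR's implementation: the `Branch` record, its fields, and the `prune` method are hypothetical names invented for this sketch, and the real rule operates on the analyzer's plan nodes rather than on a plain list.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneSketch {
    // Hypothetical stand-in for one UnionAll branch (a subquery or the main index pattern).
    // "resolved" models whether a valid index resolution was found for the branch;
    // "skipUnavailable" models whether every cluster it references has skipUnavailable=true.
    public record Branch(String indexPattern, boolean resolved, boolean skipUnavailable) {}

    // Drops unresolved branches that are allowed to be skipped, but only when at
    // least one branch resolved. If nothing resolved, the list is returned unchanged
    // so that a later verification step can fail the query.
    public static List<Branch> prune(List<Branch> branches) {
        boolean anyResolved = branches.stream().anyMatch(Branch::resolved);
        if (anyResolved == false) {
            return branches;
        }
        List<Branch> kept = new ArrayList<>();
        for (Branch branch : branches) {
            if (branch.resolved() || branch.skipUnavailable() == false) {
                kept.add(branch);
            }
        }
        return kept;
    }
}
```

In this sketch an unresolved branch on a cluster without skipUnavailable=true is deliberately kept, so the failure still surfaces later instead of being silently dropped.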

Items to be addressed in follow-up PRs:

  • Subqueries with remote index patterns still have some restrictions when combined with lookup join. These are marked as TODO in CrossClusterSubqueryIT and will be addressed in a separate PR.

@fang-xing-esql added the test-release label (Trigger CI checks against release build) Nov 8, 2025
@elasticsearchmachine (Collaborator)

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql marked this pull request as ready for review November 12, 2025 00:37
@elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Nov 12, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

@smalyshev (Contributor) left a comment

I reviewed part of it; I will finish up tomorrow.

for (int i = 0; i <= input.length() - delimiterLength; i++) {
char c = input.charAt(i);

if (c == '(') {
Contributor

I hope we don't have '(' characters inside string literals anywhere; otherwise I think this would break.
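
To illustrate the concern, here is a rough sketch of a splitter that tracks parenthesis depth while ignoring parentheses and commas inside double-quoted strings. It is not the code under review: the class and method names are hypothetical, and it assumes only plain double-quoted literals with backslash escapes (triple-quoted strings are not handled).

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplitter {
    // Splits on commas that sit outside parentheses and outside double-quoted strings.
    public static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                current.append(c);
                if (c == '\\' && i + 1 < input.length()) {
                    current.append(input.charAt(++i)); // copy the escaped character verbatim
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
                current.append(c);
            } else if (c == '(') {
                depth++;
                current.append(c);
            } else if (c == ')') {
                depth--;
                current.append(c);
            } else if (c == ',' && depth == 0) {
                parts.add(current.toString().strip());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString().strip());
        }
        return parts;
    }
}
```

With this approach a literal such as "a(b, c" inside a subquery does not change the depth counter or trigger a split.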

/**
* Convert index patterns and subqueries in FROM commands to use remote indices.
*/
private static String convertSubqueryToRemoteIndices(String testQuery) {
Contributor

What about SET commands that can precede FROM? Are those supported?

for (String indexPatternOrSubquery : indexPatternsAndSubqueries) {
// remove the from keyword if it's there
indexPatternOrSubquery = indexPatternOrSubquery.strip();
if (indexPatternOrSubquery.toLowerCase(Locale.ROOT).startsWith("from ")) {
Contributor

This seems to imply that every element of indexPatternsAndSubqueries can start with FROM, but there can be only one FROM, followed by a comma-separated list of expressions. I'm not sure whether this matters for the particular queries in the tests, but it seems incorrect.

// substitute the index patterns or subquery with remote index patterns
if (isSubquery(indexPatternOrSubquery)) {
// it's a subquery, we need to process it recursively
String subquery = indexPatternOrSubquery.substring(1, indexPatternOrSubquery.length() - 1);
Contributor

isSubquery calls strip(), but doesn't this substring assume there is no leading or trailing whitespace in indexPatternOrSubquery?
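
One way to make the unwrapping independent of surrounding whitespace is to strip once and run both the check and the substring on the same stripped value. The sketch below only illustrates that idea; `looksLikeSubquery` and `unwrap` are hypothetical names, not the PR's helpers.

```java
public class SubqueryUnwrapSketch {
    // Hypothetical stand-in for a subquery check: true when the stripped text
    // is wrapped in a pair of parentheses.
    public static boolean looksLikeSubquery(String text) {
        String trimmed = text.strip();
        return trimmed.length() >= 2 && trimmed.startsWith("(") && trimmed.endsWith(")");
    }

    // Strips first, then removes the outer parentheses from the stripped value,
    // so leading or trailing whitespace cannot shift the substring indices.
    public static String unwrap(String text) {
        String trimmed = text.strip();
        if (looksLikeSubquery(trimmed) == false) {
            throw new IllegalArgumentException("not a parenthesized subquery: " + text);
        }
        return trimmed.substring(1, trimmed.length() - 1).strip();
    }
}
```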

transformed.add("(" + transformedSubquery + ")");
} else {
// It's an index pattern, we need to convert it to remote index pattern.
// indexPatternOrSubquery could be a comma separated list of indices, we need to process each index separately
Contributor

Wait, I am not sure I understand this. How could indexPatternOrSubquery be a list of indices? We already split on ',' once, so indexPatternOrSubquery should be either a single index pattern or a subquery, shouldn't it?

}
}

private static class PruneEmptyUnionAllBranch extends ParameterizedAnalyzerRule<UnionAll, AnalyzerContext> {
Contributor

I would add some javadoc explaining what this does.

}
// Check if a subquery need to be pruned. If some but not all the subqueries has invalid index resolution,
// and all the clusters referenced by the subquery that has invalid index resolution have skipUnavailable=true,
// try to prune it by setting IndexResolution to EMPTY_SUBQUERY.,Analyzer.PruneEmptyUnionAllBranch will
Contributor

nit: .,

// take care of removing the subquery during analysis.
// If all subqueries have invalid index resolution, we should fail in Analyzer's verifier.
if (r.indexResolution.isEmpty() == false // it is not a row
&& r.indexResolution.size() > 1 // there is a subquery
Contributor

Can a subquery be the only clause in FROM? I understand it's kinda weird to do it, but is it possible?

// and skipUnavailable is true for all the clusters involved
for (var entry : r.indexResolution.entrySet()) {
IndexPattern indexPattern = entry.getKey();
IndexResolution indexResolution = entry.getValue();
Contributor

Maybe

r.indexResolution.forEach((indexPattern, indexResolution) -> {
...

would be cleaner?
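
For comparison, the two iteration styles can be sketched side by side. This is a generic illustration that uses a plain `Map<String, Boolean>` as a hypothetical stand-in for the resolution map; it is not taken from the PR.

```java
import java.util.Map;

public class MapIterationSketch {
    // Entry-set loop: can use break, continue, and mutable local variables.
    public static int countInvalidWithLoop(Map<String, Boolean> resolutions) {
        int invalid = 0;
        for (var entry : resolutions.entrySet()) {
            if (entry.getValue() == false) {
                invalid++;
            }
        }
        return invalid;
    }

    // Map.forEach: key and value arrive as named lambda parameters, but the lambda
    // can only capture effectively final locals, hence the single-element array.
    public static int countInvalidWithForEach(Map<String, Boolean> resolutions) {
        int[] invalid = { 0 };
        resolutions.forEach((indexPattern, valid) -> {
            if (valid == false) {
                invalid[0]++;
            }
        });
        return invalid[0];
    }
}
```

Whether forEach ends up cleaner depends on the loop body: if it needs to exit early or update local state, the entry-set loop avoids the workaround shown in the second method.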

String clusterAlias = entry.getKey();
String indexExpression = entry.getValue();
EsqlExecutionInfo.Cluster cluster = executionInfo.getCluster(clusterAlias);
if (indexExpression.equals(cluster.getIndexExpression())) {
Contributor

Not sure I understand this check. If it's the entry under this cluster's alias, why do we need to check again?

required_capability: subquery_in_from_command

FROM employees, (FROM sample_data | EVAL x = client_ip::keyword ) metadata _index
FROM employees, (FROM sample_data metadata _index | EVAL x = client_ip::keyword ) metadata _index
Contributor

What about this check:

        assumeFalse("can't test with _index metadata", (remoteMetadata == false) && hasIndexMetadata(testCase.query));

wouldn't it interfere with this test?

Member Author

It does. I'd like to keep some metadata tests for subqueries here, so I removed the metadata option from a few queries in this file in order to cover more subquery tests in MultiClusterSpecIT.

// CSV spec for subqueries
//

subqueryInFrom
Contributor

Question for my education - if we have two sets of metadata fields, like this: FROM index1, (FROM index2 METADATA _index, _id) METADATA _index, _version - what is the resulting field set here? Is it a union, an intersection or something else?

Member Author

It is a UnionAll of the results of all subqueries. If there are duplicate records among the subqueries, the duplicates are not removed.

}

public void testSubquery() throws IOException {
setupClusters(3);
Contributor

I see every test here is doing the same setup. Maybe move it to a @Before method?

assertNull(row.get(constIndex));
String indexName = (String) row.get(indexIndex);
if (i < 10) {
assertEquals("cluster-a:logs-2", indexName);
Contributor

Maybe better to use the constants here? It's not entirely clear where these names come from, as they aren't part of this class. We could also make compound constants if necessary to make it more convenient.

try (EsqlQueryResponse resp = runQuery("""
FROM
logs-*,
(FROM *:logs-* metadata _index | where v < 4 )
Contributor

Should we also test the case where a full-text WHERE is inside the subquery? Or does that not work yet?

Member Author

Full-text functions inside subqueries are supported. CsvTests has a wider variety of tests for supported subqueries, including full-text functions inside and outside of subqueries. I'll add one here as well.

}

// lookup join in main query after subqueries is not supported yet, because there is remote index pattern and limit
// TODO remove the limit added for subqueries
Contributor

This is an interesting corner case. I think it can be resolved with the local limits mechanism we now have. Do we have an issue for this TODO? If not, we should probably add one.

Member Author

The two TODOs in this test suite are tracked here

}
}

protected void setupAlias(String clusterAlias, String indexName, String aliasName) {
Contributor

We already have setupAlias in CrossClusterLookupJoinIT. Maybe move it to AbstractCrossClusterTestCase for reuse?

Labels

:Analytics/ES|QL (AKA ESQL), >enhancement, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), test-release (Trigger CI checks against release build), v9.3.0

3 participants