[ES|QL] Enable CCS tests for subqueries #137776
Hi @fang-xing-esql, I've created a changelog YAML for you.
Pinging @elastic/es-analytical-engine (Team:Analytics)
smalyshev
left a comment
I reviewed part of it, will finish up tomorrow.
| for (int i = 0; i <= input.length() - delimiterLength; i++) { | ||
| char c = input.charAt(i); | ||
| if (c == '(') { |
I hope we don't have (s inside strings anywhere, otherwise this I think would break.
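One way to guard against this is to track string literals while scanning, so that parentheses and commas inside a quoted string are ignored. The following is only a minimal sketch under assumptions: `TopLevelSplitter` and `splitTopLevel` are hypothetical names, not the helper in this PR, and it only handles single double-quoted strings with backslash escapes (triple-quoted strings are not handled).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the code under review.
final class TopLevelSplitter {
    /** Splits on commas that are outside parentheses and outside double-quoted strings. */
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current parenthesis nesting level
        boolean inString = false; // true while inside a "..." literal
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                current.append(c);
                if (c == '\\' && i + 1 < input.length()) {
                    // copy the escaped character verbatim so \" does not end the string
                    current.append(input.charAt(++i));
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // top-level separator: emit the part collected so far
                parts.add(current.toString().strip());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        parts.add(current.toString().strip());
        return parts;
    }
}
```

Each part is stripped before it is returned, so a caller that later removes the enclosing parentheses of a subquery with `substring(1, length - 1)` does not have to deal with surrounding whitespace.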
| /** | ||
| * Convert index patterns and subqueries in FROM commands to use remote indices. | ||
| */ | ||
| private static String convertSubqueryToRemoteIndices(String testQuery) { |
What about SET commands that can precede FROM? Are those supported?
| for (String indexPatternOrSubquery : indexPatternsAndSubqueries) { | ||
| // remove the from keyword if it's there | ||
| indexPatternOrSubquery = indexPatternOrSubquery.strip(); | ||
| if (indexPatternOrSubquery.toLowerCase(Locale.ROOT).startsWith("from ")) { |
This seems to imply every element of indexPatternsAndSubqueries can start with FROM, but there could be only one FROM and then comma-separated list of expressions... Not sure if it's important for particular queries in the tests, but seems incorrect.
| // substitute the index patterns or subquery with remote index patterns | ||
| if (isSubquery(indexPatternOrSubquery)) { | ||
| // it's a subquery, we need to process it recursively | ||
| String subquery = indexPatternOrSubquery.substring(1, indexPatternOrSubquery.length() - 1); |
isSubquery does strip() but this assumes there's no whitespace before or after in indexPatternOrSubquery?
| transformed.add("(" + transformedSubquery + ")"); | ||
| } else { | ||
| // It's an index pattern, we need to convert it to remote index pattern. | ||
| // indexPatternOrSubquery could be a comma separated list of indices, we need to process each index separately |
Wait, I am not sure I understand this. How indexPatternOrSubquery could be a list of indices? We already split on , once, and indexPatternOrSubquery should be either single index pattern or subquery, not?
| } | ||
| } | ||
| private static class PruneEmptyUnionAllBranch extends ParameterizedAnalyzerRule<UnionAll, AnalyzerContext> { |
I would add some javadoc explaining what this does.
| } | ||
| // Check if a subquery need to be pruned. If some but not all the subqueries has invalid index resolution, | ||
| // and all the clusters referenced by the subquery that has invalid index resolution have skipUnavailable=true, | ||
| // try to prune it by setting IndexResolution to EMPTY_SUBQUERY.,Analyzer.PruneEmptyUnionAllBranch will |
nit: .,
| // take care of removing the subquery during analysis. | ||
| // If all subqueries have invalid index resolution, we should fail in Analyzer's verifier. | ||
| if (r.indexResolution.isEmpty() == false // it is not a row | ||
| && r.indexResolution.size() > 1 // there is a subquery |
Can subquery be the only clause in FROM? I understand it's kinda weird to do it, but is it possible?
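For reference, the pruning condition described in the code comment above (prune only when some but not all resolutions are invalid, and every cluster behind an invalid one has skipUnavailable=true) can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: `SubqueryPruning`, `Resolution`, and `patternsToPrune` are hypothetical names, and index patterns are modeled as plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the decision, not the code in EsqlSession.
final class SubqueryPruning {
    /** Hypothetical stand-in for a per-pattern resolution: whether it is valid and which clusters it references. */
    record Resolution(boolean valid, Set<String> clusters) {}

    /** Returns the patterns that may be pruned; empty if nothing qualifies. */
    static Set<String> patternsToPrune(Map<String, Resolution> resolutions, Map<String, Boolean> skipUnavailable) {
        Set<String> prune = new LinkedHashSet<>();
        if (resolutions.size() <= 1) {
            return prune; // a single pattern means there is no subquery branch to drop
        }
        long invalid = resolutions.values().stream().filter(r -> r.valid() == false).count();
        if (invalid == 0 || invalid == resolutions.size()) {
            return prune; // nothing to prune, or everything is invalid and the verifier should fail
        }
        for (var entry : resolutions.entrySet()) {
            Resolution r = entry.getValue();
            if (r.valid()) {
                continue;
            }
            // assumption of this sketch: an invalid resolution with no known clusters is never pruned
            boolean allSkippable = r.clusters().isEmpty() == false
                && r.clusters().stream().allMatch(c -> skipUnavailable.getOrDefault(c, false));
            if (allSkippable) {
                prune.add(entry.getKey());
            }
        }
        return prune;
    }
}
```

Under this model, a FROM clause that contains only one subquery never reaches the pruning loop, which is the case the question above is about.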
| // and skipUnavailable is true for all the clusters involved | ||
| for (var entry : r.indexResolution.entrySet()) { | ||
| IndexPattern indexPattern = entry.getKey(); | ||
| IndexResolution indexResolution = entry.getValue(); |
Maybe
r.indexResolution.forEach((indexPattern, indexResolution) -> {
...
would be cleaner?
| String clusterAlias = entry.getKey(); | ||
| String indexExpression = entry.getValue(); | ||
| EsqlExecutionInfo.Cluster cluster = executionInfo.getCluster(clusterAlias); | ||
| if (indexExpression.equals(cluster.getIndexExpression())) { |
Not sure I understand this check - if it's the entry under this cluster's alias, why we need to check again?
| required_capability: subquery_in_from_command | ||
| FROM employees, (FROM sample_data | EVAL x = client_ip::keyword ) metadata _index | ||
| FROM employees, (FROM sample_data metadata _index | EVAL x = client_ip::keyword ) metadata _index |
What about this check:
assumeFalse("can't test with _index metadata", (remoteMetadata == false) && hasIndexMetadata(testCase.query));
wouldn't it interfere with this test?
It does. I'd like to keep some metadata tests for subqueries here. So I removed metadata option from a few queries in this file and hope to cover more subquery tests in MultiClusterSpecIT.
| // CSV spec for subqueries | ||
| // | ||
| subqueryInFrom |
Question for my education - if we have two sets of metadata fields, like this: FROM index1, (FROM index2 METADATA _index, _id) METADATA _index, _version - what is the resulting field set here? Is it a union, an intersection or something else?
It is an UnionAll of the results of all subqueries, if there are duplicated records among the subqueries, the duplicates are not removed.
| } | ||
| public void testSubquery() throws IOException { | ||
| setupClusters(3); |
I see every test here is doing the same setup. Maybe move it to a @Before method?
| assertNull(row.get(constIndex)); | ||
| String indexName = (String) row.get(indexIndex); | ||
| if (i < 10) { | ||
| assertEquals("cluster-a:logs-2", indexName); |
Maybe better to use the constants here? It's not entirely clear where these names come from, as they aren't part of this class. We could also make compound constants if necessary to make it more convenient.
| try (EsqlQueryResponse resp = runQuery(""" | ||
| FROM | ||
| logs-*, | ||
| (FROM *:logs-* metadata _index | where v < 4 ) |
Should we also test the case when fulltext WHERE is inside the subquery? Or that doesn't work yet?
Full text function inside subquery is supported, CsvTests have more varieties of tests for supported subqueries, including full text functions inside and outside of subqueries, I'll add one here as well.
| } | ||
| // lookup join in main query after subqueries is not supported yet, because there is remote index pattern and limit | ||
| // TODO remove the limit added for subqueries |
This is an interesting corner case. I think it can be resolved with local limits thing we now have. Do we have an issue for this TODO? If not, we probably should add one.
The two TODOs in this test suite are tracked here
| } | ||
| } | ||
| protected void setupAlias(String clusterAlias, String indexName, String aliasName) { |
We already have setupAlias in CrossClusterLookupJoinIT. Maybe move it to AbstractCrossClusterTestCase for reuse?
This is to support #136035.
This PR addresses the following items:
- Enabled CCS tests with subqueries in the following tests:
  - CrossClusterSubqueryIT
  - CrossClusterSubqueryUnavailableRemotesIT
  - MultiClusterSpecIT: convert queries with subqueries to use remote index patterns by recognizing multiple from commands in the query
  - subquery.csv-spec: to allow more queries to run with MultiClusterSpecIT, removed the fork tags and removed metadata from some of the queries
- If skipUnavailable=true is set on remote clusters, prune subqueries that do not have a valid index pattern found for them, so that the query can continue as long as there is a valid IndexResolution for a subset of the subqueries:
  - EsqlSession: recognize subquery index patterns that do not have a valid IndexResolution found, and mark them as EMPTY_SUBQUERY
  - PruneEmptyUnionAllBranch is added in Analyzer to prune empty subqueries

Item that will be addressed in the next PRs: