### Description
I got a failure from the generative tests that is a bit suspicious.

On the CSV dataset (loaded with `./gradlew :x-pack:plugin:esql:qa:testFixtures:loadCsvSpecData --args="http://elastic-admin:elastic-password@localhost:9200"`), this query:
```
from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| mv_expand name
| mv_expand `location`
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true )
| WHERE _fork == "fork2"
| DROP _fork
| limit 8169
```
"type": "circuit_breaking_exception",
"reason": "[request] Data too large, data for [<reused_arrays>] would be [322134656/307.2mb], which is larger than the limit of [322122547/307.1mb];
What is strange is that the same query, without FORK, returns only 889 records and 5 columns with very small values:
```
from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| mv_expand name
| mv_expand `location`
| STATS count(*)
```

```
count(*)
---------------
889
```
```
from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| mv_expand name
| mv_expand `location`
```

```
     name      |  dJAukFBDtW   |   location    |   EtWxTktW    |    abbrev
---------------+---------------+---------------+---------------+---------------
1              |8              |8              |8              |null
1              |9              |9              |9              |AWZ
1              |9              |9              |9              |GWL
1              |9              |9              |9              |HOD
1              |9              |9              |9              |IXR
1              |9              |9              |9              |LUH
...
```
The failure is not completely deterministic, but it is very frequent, and it also happens with a smaller query that has only three FORK branches:
```
from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true )
```
300MB seems like far too much for such a small query: the source indices contain only 5189 records and 11 columns, which is on the order of half a megabyte even at 8 bytes per value.
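One way to check whether the ~300MB allocation is real (FORK genuinely over-allocating) rather than the breaker misfiring would be to raise the request breaker temporarily and see if the query then succeeds (a sketch using the standard `indices.breaker.request.limit` dynamic cluster setting; the 80% value is an arbitrary example, not a recommendation):

```sh
# Temporarily raise the [request] circuit breaker limit (arbitrary example
# value) on the local test cluster, then re-run the failing query.
curl -u elastic-admin:elastic-password \
  -H 'Content-Type: application/json' \
  -X PUT 'http://localhost:9200/_cluster/settings' \
  -d '{"persistent": {"indices.breaker.request.limit": "80%"}}'
```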
I'm labeling it as a bug for now.