ES|QL: FORK memory management #130072

Open
@luigidellaquila

Description

I got a failure from the generative tests that looks a bit suspicious:

On the CSV dataset (loaded with ./gradlew :x-pack:plugin:esql:qa:testFixtures:loadCsvSpecData --args="http://elastic-admin:elastic-password@localhost:9200"), this query:

from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true ) ( WHERE true ) 
| WHERE _fork == "fork2" 
| DROP _fork 
| limit 8169

fails with:

"type": "circuit_breaking_exception",
"reason": "[request] Data too large, data for [<reused_arrays>] would be [322134656/307.2mb], which is larger than the limit of [322122547/307.1mb];
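
To put the two byte counts in the error side by side, here is a minimal Python sketch that pulls them out of the reason string and computes how far over the limit the request went. The function name `parse_breaker_reason` and the regular expression are illustrative assumptions based only on the message quoted above, not on any Elasticsearch client API.

```python
import re

# Reason string copied from the error above (its closing quote is missing in the report).
REASON = (
    "[request] Data too large, data for [<reused_arrays>] would be "
    "[322134656/307.2mb], which is larger than the limit of [322122547/307.1mb]"
)


def parse_breaker_reason(reason):
    """Extract the requested and limit byte counts from a circuit breaker message.

    Hypothetical helper: the pattern only targets the message format shown above.
    """
    match = re.search(
        r"would be \[(\d+)/[^\]]*\].*limit of \[(\d+)/[^\]]*\]", reason
    )
    if match is None:
        raise ValueError("unrecognized circuit breaker message")
    would_be = int(match.group(1))
    limit = int(match.group(2))
    return {"would_be": would_be, "limit": limit, "over_by": would_be - limit}


info = parse_breaker_reason(REASON)
print(info)
```

The request breaker tripped by a small margin, which suggests memory was climbing steadily toward the limit rather than being requested in one very large allocation. That reading is an inference from the numbers, not something the error itself states.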

What is strange is that the same query without FORK returns only 889 records and 5 columns, all with very small values:

from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
| STATS count(*)
   count(*)    
---------------
889   
from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| mv_expand name 
| mv_expand `location` 
     name      |  dJAukFBDtW   |   location    |   EtWxTktW    |    abbrev     
---------------+---------------+---------------+---------------+---------------
1              |8              |8              |8              |null           
1              |9              |9              |9              |AWZ            
1              |9              |9              |9              |GWL            
1              |9              |9              |9              |HOD            
1              |9              |9              |9              |IXR            
1              |9              |9              |9              |LUH           
...

The failure is not fully deterministic, but it is very frequent, and it also happens with a smaller query that has only three FORK branches:

from airp*
| rename scalerank as language_code 
| lookup join languages_lookup on language_code 
| stats  `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev 
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true )
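
Since the failure is intermittent, a small loop that runs the query several times and counts circuit breaker responses may help quantify how frequent it is. The sketch below is one way to do that in Python using only the standard library. It assumes the ES|QL `_query` REST endpoint and the host and credentials from the load command above; `run_esql` and `count_breaker_failures` are hypothetical helper names, and the default of 20 attempts is arbitrary.

```python
import base64
import json
import urllib.error
import urllib.request

# The three-branch query from above.
QUERY = """from airp*
| rename scalerank as language_code
| lookup join languages_lookup on language_code
| stats `name` = count_distinct(language_code), dJAukFBDtW = max(language_code), `location` = min(language_code), EtWxTktW = min(language_code) by abbrev
| FORK ( WHERE true ) ( WHERE true ) ( WHERE true )"""


def run_esql(query, base_url="http://localhost:9200",
             user="elastic-admin", password="elastic-password"):
    """POST the query to the ES|QL endpoint and return the raw response body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/_query",
        data=json.dumps({"query": query}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        # Error responses carry the exception JSON in the body.
        return error.read().decode()


def count_breaker_failures(run, query, attempts=20):
    """Run the query repeatedly and count responses that mention the breaker."""
    return sum(
        "circuit_breaking_exception" in run(query) for _ in range(attempts)
    )


# Example usage against a local cluster (not executed here):
#   failures = count_breaker_failures(run_esql, QUERY)
#   print(f"{failures}/20 attempts tripped the circuit breaker")
```

Passing the runner as an argument keeps the counting logic independent of the HTTP call, so it can be exercised without a cluster.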

300MB seems far too much for such a small query: the source indices contain only 5189 records and 11 columns.
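
As a rough back-of-envelope check of that claim, the Python sketch below divides the byte count from the error by the number of values involved. It assumes every `WHERE true` branch passes all 889 rows through and that the memory can be attributed evenly per value, which is a simplification and not how the circuit breaker actually accounts for memory.

```python
# Figures taken from the report above.
SOURCE_ROWS, SOURCE_COLUMNS = 5189, 11
RESULT_ROWS, RESULT_COLUMNS = 889, 5
FORK_BRANCHES = 5
REQUESTED_BYTES = 322134656  # "would be" value from the circuit breaker error

# Total number of individual values at each stage.
source_values = SOURCE_ROWS * SOURCE_COLUMNS
result_values = RESULT_ROWS * RESULT_COLUMNS
# Assumption: each FORK branch emits a full copy of the pre-FORK result.
forked_values = result_values * FORK_BRANCHES

bytes_per_source_value = REQUESTED_BYTES / source_values
bytes_per_forked_value = REQUESTED_BYTES / forked_values

print(f"source values: {source_values}")
print(f"values after FORK (assumed): {forked_values}")
print(f"bytes per source value: {bytes_per_source_value:.0f}")
print(f"bytes per forked value: {bytes_per_forked_value:.0f}")
```

Even measured against the full source data, this works out to several kilobytes per value, which is hard to reconcile with columns holding single-digit numbers and three-letter abbreviations.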

I'm labeling it as a bug for now.
