`iox_query` uses a cached parquet reader by default #25721

hiltontj · 2024-12-28T00:45:21Z

Problem statement

Note: It is not yet clear whether this is actually a problem, but it seemed worth looking into after I noticed it while working on #25714.

Due to this configuration, iox_query uses a cached parquet reader by default. That configuration is used here, leading to the creation of a CachedParquetFileReaderFactory here.

The concern is that this could add memory overhead on top of the parquet cache implemented for the monolith (which is defined here). The CachedParquetFileReaderFactory holds a copy of the parquet bytes here, which, for data coming from the actual object store, would be a good thing for two reasons (see here):

to avoid excessive read requests for parquet files
to decouple the tokio IO/main runtime from the CPU-bound DataFusion runtime

but for data that is already in the monolith's parquet cache, that copy would be redundant.
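To make the suspected overhead concrete, here is a minimal std-only sketch. It assumes the reader keeps its own allocation of the file bytes rather than sharing the cache's buffer, which still needs to be confirmed. `CacheEntry`, `CopyingReader`, and `SharingReader` are hypothetical stand-ins for illustration, not types from iox_query or the monolith.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one file held in the monolith's parquet cache.
struct CacheEntry {
    bytes: Arc<[u8]>,
}

/// Hypothetical reader that keeps a private copy of the file bytes
/// (the behaviour this issue suspects, not a confirmed one).
struct CopyingReader {
    bytes: Vec<u8>,
}

/// Hypothetical reader that shares the cache entry's allocation.
struct SharingReader {
    bytes: Arc<[u8]>,
}

impl CopyingReader {
    fn open(entry: &CacheEntry) -> Self {
        // A fresh allocation: the file now lives in memory twice.
        Self { bytes: entry.bytes.to_vec() }
    }
}

impl SharingReader {
    fn open(entry: &CacheEntry) -> Self {
        // Only the reference count is bumped; no bytes are copied.
        Self { bytes: Arc::clone(&entry.bytes) }
    }
}

/// Bytes held for the file by the cache entry plus a copying reader.
fn resident_with_copy(entry: &CacheEntry, reader: &CopyingReader) -> usize {
    entry.bytes.len() + reader.bytes.len()
}

/// Bytes held for the file by the cache entry plus a sharing reader.
fn resident_with_share(entry: &CacheEntry, reader: &SharingReader) -> usize {
    if Arc::ptr_eq(&entry.bytes, &reader.bytes) {
        entry.bytes.len()
    } else {
        entry.bytes.len() + reader.bytes.len()
    }
}
```

Under that assumption, each file that is both in the parquet cache and open in a copying reader costs twice its size while the reader is alive.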

Proposed solution

This needs to be investigated, but if it is a problem, we may need to:

disable the use of the CachedParquetFileReaderFactory by default
implement a ParquetFileReaderFactory that does not have this redundancy with the monolith's parquet cache (if possible within the bounds of using iox_query).
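A rough sketch of the second option, using only std types: a factory that checks the monolith's parquet cache first and only falls back to a self-buffering reader on a miss. All names here (`ParquetCache`, `ReaderSource`, `CacheAwareReaderFactory`) are hypothetical, and the real DataFusion `ParquetFileReaderFactory` trait has a different signature, so this only illustrates the decision logic, not a drop-in implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the monolith's parquet cache (path -> shared bytes).
#[derive(Default)]
struct ParquetCache {
    files: HashMap<String, Arc<[u8]>>,
}

/// Where a reader gets its bytes from. Hypothetical; not an iox_query type.
enum ReaderSource {
    /// The file is already in the parquet cache: share that allocation.
    Shared(Arc<[u8]>),
    /// Cache miss: the reader would fetch from the object store and
    /// buffer the bytes itself, as the cached reader does today.
    ObjectStore { path: String },
}

/// Hypothetical factory that avoids a second copy for cached files.
struct CacheAwareReaderFactory {
    cache: Arc<ParquetCache>,
}

impl CacheAwareReaderFactory {
    fn create_reader(&self, path: &str) -> ReaderSource {
        match self.cache.files.get(path) {
            // Hit: bump the reference count instead of copying the bytes.
            Some(bytes) => ReaderSource::Shared(Arc::clone(bytes)),
            // Miss: keep the existing buffering behaviour.
            None => ReaderSource::ObjectStore { path: path.to_string() },
        }
    }
}
```

Whether something like this fits within the bounds of iox_query depends on how the reader factory is wired in, which is part of what needs investigating.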

Alternatives considered

N/A

Additional context

N/A

hiltontj added the v3 label Dec 28, 2024
