Problem statement
Note: It is not yet clear whether this is actually a problem, but I felt it was worth looking into after noticing it while working on #25714.
Due to this configuration, iox_query uses a cached parquet reader by default. That configuration is used here, leading to the creation of a CachedParquetFileReaderFactory here.
The concern is that this could be causing memory overhead alongside the parquet cache implemented for the monolith (which is defined here). The CachedParquetFileReaderFactory holds a copy of the parquet bytes here. For data coming from the actual object store, that would be a good thing, for two reasons (see here):
to avoid excessive read requests for parquet files
to decouple the tokio IO/main runtime from the CPU-bound DataFusion runtime
For data that is already in the monolith's parquet cache, however, that copy would be redundant.
Proposed solution
This needs to be investigated, but if it is a problem, we may need to:
disable the use of the CachedParquetFileReaderFactory by default
implement a ParquetFileReaderFactory that does not have this redundancy with the monolith's parquet cache (if possible within the bounds of using iox_query).
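To make the second option concrete, here is a minimal sketch of the difference between a reader that takes its own copy of the bytes and one that shares the buffer already held by a cache. Everything in it is hypothetical and for illustration only: `ParquetCache`, `CopyingReader`, `SharingReader`, and the two constructor functions are toy stand-ins, not the actual iox_query or monolith types, and whether the real CachedParquetFileReaderFactory duplicates memory in this way is exactly what still needs to be investigated.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the monolith's parquet cache:
/// maps an object path to the file's bytes behind a shared pointer.
struct ParquetCache {
    entries: HashMap<String, Arc<[u8]>>,
}

impl ParquetCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn insert(&mut self, path: &str, bytes: Vec<u8>) {
        self.entries.insert(path.to_string(), Arc::from(bytes));
    }

    /// Returns another handle to the cached buffer (no byte copy).
    fn get(&self, path: &str) -> Option<Arc<[u8]>> {
        self.entries.get(path).cloned()
    }
}

/// Models the suspected redundancy: the reader owns a second copy of the bytes.
struct CopyingReader {
    data: Vec<u8>,
}

/// Models the proposed alternative: the reader shares the cache's buffer.
struct SharingReader {
    data: Arc<[u8]>,
}

fn copying_reader(cache: &ParquetCache, path: &str) -> Option<CopyingReader> {
    // `to_vec` allocates a new buffer, so the file is held in memory twice.
    cache.get(path).map(|bytes| CopyingReader { data: bytes.to_vec() })
}

fn sharing_reader(cache: &ParquetCache, path: &str) -> Option<SharingReader> {
    // Only the reference count is bumped; the bytes stay in one allocation.
    cache.get(path).map(|bytes| SharingReader { data: bytes })
}
```

In this toy model, `Arc::ptr_eq` on a `SharingReader`'s `data` and the cache entry returns true, while a `CopyingReader`'s `data` lives in a separate allocation. A similar pointer or allocation check against the real reader factory could be one way to confirm whether the redundancy exists.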
Alternatives considered
N/A
Additional context
N/A