iox_query uses a cached parquet reader by default #25721

Open
hiltontj opened this issue Dec 28, 2024 · 0 comments
Problem statement

Note: It is not yet clear whether this is actually a problem, but I felt it was worth looking into after noticing it while working on #25714.

Due to this configuration, iox_query uses a cached parquet reader by default. That configuration is used here, leading to the creation of a CachedParquetFileReaderFactory here.

The concern is that this could add memory overhead on top of the parquet cache implemented for the monolith (which is defined here). The CachedParquetFileReaderFactory holds a copy of the parquet bytes here. For data coming from the actual object store, that is a good thing, for two reasons (see here):

  • to avoid excessive read requests for parquet files
  • to decouple tokio IO/main-runtime from CPU-bound DataFusion runtime

For data that is already in the monolith's parquet cache, however, that copy would be redundant.

Proposed solution

This needs to be investigated, but if it is a problem, we may need to:

  • disable the use of the CachedParquetFileReaderFactory by default
  • implement a ParquetFileReaderFactory that does not have this redundancy with the monolith's parquet cache (if possible within the bounds of using iox_query).
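
To make the second option concrete, here is a minimal sketch of the difference between a reader factory that keeps its own copy of the bytes and one that hands back the allocation already held by the cache. Everything in it is hypothetical and uses only the standard library: `ParquetCache`, `ReaderFactory`, `CopyingFactory`, and `PassThroughFactory` are illustrative names, not the actual iox_query or DataFusion types. Whether the real CachedParquetFileReaderFactory performs a deep copy or only clones a reference-counted handle is part of what needs to be investigated.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the monolith's in-memory parquet cache.
struct ParquetCache {
    files: HashMap<String, Arc<[u8]>>,
}

impl ParquetCache {
    fn new() -> Self {
        Self { files: HashMap::new() }
    }

    /// Store the bytes of a parquet file under its path.
    fn insert(&mut self, path: &str, bytes: Vec<u8>) {
        self.files.insert(path.to_string(), Arc::from(bytes));
    }

    /// Return a shared handle to the cached bytes (no byte copy).
    fn get(&self, path: &str) -> Option<Arc<[u8]>> {
        self.files.get(path).cloned()
    }
}

/// Hypothetical reader-factory interface (NOT the DataFusion trait).
trait ReaderFactory {
    fn open(&self, cache: &ParquetCache, path: &str) -> Option<Arc<[u8]>>;
}

/// Models the concern: the reader keeps a private copy of the bytes,
/// so a cached file is resident in memory twice.
struct CopyingFactory;

impl ReaderFactory for CopyingFactory {
    fn open(&self, cache: &ParquetCache, path: &str) -> Option<Arc<[u8]>> {
        let cached = cache.get(path)?;
        // Deep copy into a new allocation.
        Some(Arc::from(cached.to_vec()))
    }
}

/// Models the proposed alternative: reuse the cache's allocation.
struct PassThroughFactory;

impl ReaderFactory for PassThroughFactory {
    fn open(&self, cache: &ParquetCache, path: &str) -> Option<Arc<[u8]>> {
        cache.get(path)
    }
}

/// True if both handles point at the same underlying allocation.
fn shares_allocation(a: &Arc<[u8]>, b: &Arc<[u8]>) -> bool {
    Arc::ptr_eq(a, b)
}
```

In this sketch, a handle obtained from `PassThroughFactory` shares its allocation with the cache entry, while one from `CopyingFactory` does not, even though both expose identical bytes. A pointer-identity check of that kind could be one way to confirm or rule out the redundancy in the real code path.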

Alternatives considered

N/A

Additional context

N/A

@hiltontj hiltontj added the v3 label Dec 28, 2024