optimize parquet footer reader #24007
Conversation
Force-pushed from 301055d to 1152756
@jinyangli34 please take some time to make the description of the PR a bit more detailed. Please also put effort into describing the business case of this PR.
Updated with more details in the context section.
Force-pushed from 32ca37b to 2ca2cf8
@@ -119,144 +90,11 @@ public static ParquetMetadata readFooter(ParquetDataSource dataSource, Optional<
     InputStream metadataStream = buffer.slice(buffer.length() - completeFooterSize, metadataLength).getInput();

     FileMetaData fileMetaData = readFileMetaData(metadataStream);
-    ParquetMetadata parquetMetadata = createParquetMetadata(fileMetaData, dataSource.getId());
+    ParquetMetadata parquetMetadata = ParquetMetadata.createParquetMetadata(fileMetaData, dataSource.getId());
static import
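For illustration, the suggestion would make the call site read as follows (a fragment, not a complete file, based only on the identifiers visible in the diff above):

import static io.trino.parquet.metadata.ParquetMetadata.createParquetMetadata;

// ... the call site then stays unqualified:
ParquetMetadata parquetMetadata = createParquetMetadata(fileMetaData, dataSource.getId());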
 MessageType messageType = readParquetSchema(schema);
 List<BlockMetadata> blocks = new ArrayList<>();
 List<RowGroup> rowGroups = fileMetaData.getRow_groups();
 if (rowGroups != null) {
     for (RowGroup rowGroup : rowGroups) {
         List<ColumnChunk> columns = rowGroup.getColumns();
-        validateParquet(!columns.isEmpty(), dataSourceId, "No columns in row group: %s", rowGroup);
+        checkState(!columns.isEmpty(), "No columns in row group: %s [%s]", rowGroup, dataSourceId);
Why is validateParquet changing to checkState? We should retain the existing behaviour of throwing ParquetCorruptionException.
ping
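For context on this concern, here is a minimal, self-contained sketch of the behavioural difference; the types are simplified stand-ins, not Trino's actual classes:

public class ValidationSketch
{
    // Stand-in for io.trino.parquet.ParquetCorruptionException: a checked,
    // Parquet-specific error that read paths catch and report specially
    static class ParquetCorruptionException
            extends Exception
    {
        ParquetCorruptionException(String message)
        {
            super(message);
        }
    }

    // validateParquet-style check: failure surfaces as a corruption error
    static void validateParquet(boolean condition, String dataSourceId, String message, Object... args)
            throws ParquetCorruptionException
    {
        if (!condition) {
            throw new ParquetCorruptionException(message.formatted(args) + " [" + dataSourceId + "]");
        }
    }

    // checkState-style check (as in Guava's Preconditions.checkState):
    // failure surfaces as an unchecked IllegalStateException instead,
    // which callers expecting ParquetCorruptionException will not handle
    static void checkState(boolean condition, String message, Object... args)
    {
        if (!condition) {
            throw new IllegalStateException(message.formatted(args));
        }
    }
}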
@@ -82,12 +82,12 @@ public FileMetrics getFileMetrics()
 {
     ParquetMetadata parquetMetadata;
     try {
-        parquetMetadata = ParquetMetadata.createParquetMetadata(parquetFileWriter.getFileMetadata(), new ParquetDataSourceId(location.toString()));
+        parquetMetadata = new ParquetMetadata(parquetFileWriter.getFileMetadata(), new ParquetDataSourceId(location.toString()));
         return new FileMetrics(footerMetrics(parquetMetadata, Stream.empty(), metricsConfig), Optional.of(getSplitOffsets(parquetMetadata)));
I think this will parse row groups twice now. It may be better to cache the full list of parsed row groups once it is computed in ParquetMetadata.
ping
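One possible shape of the caching the reviewer suggests, memoizing the parsed blocks with Guava; the types here are simplified stand-ins, not the real ParquetMetadata fields:

import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

import java.util.List;

class FooterParsingSketch
{
    // Stand-in for the parsed per-row-group metadata (BlockMetadata in Trino)
    record Block(long rowCount)
    {}

    // Memoize so repeated getBlocks() calls (e.g. from getFileMetrics and the
    // read path) pay the thrift-to-metadata conversion cost only once
    private final Supplier<List<Block>> blocks = Suppliers.memoize(this::parseAllBlocks);

    private List<Block> parseAllBlocks()
    {
        // the expensive footer parsing would happen here
        return List.of(new Block(100), new Block(200));
    }

    List<Block> getBlocks()
    {
        return blocks.get();
    }
}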
     return buildBlocks(paths);
 }

 public List<BlockMetadata> getBlocks()
Rename to getBlocksWithoutColumnsMetadata
ping
@@ -104,6 +123,9 @@ public List<BlockMetadata> getBlocks()
         .map(value -> value.toLowerCase(Locale.ENGLISH))
         .toArray(String[]::new);
 ColumnPath columnPath = ColumnPath.get(path);
+if (!paths.isEmpty() && !paths.contains(columnPath)) {
This potentially makes io.trino.parquet.metadata.BlockMetadata#getStartingPos incorrect. If we want to do this, we need to record the starting position separately and store that as an explicit field in BlockMetadata.
It seems BlockMetadata#getStartingPos is only used on the write path, without any column filtering, so it should always contain the first column. Maybe it's better to add a fileOffset entry in RowGroupInfo for a more consistent starting position per row group?
Up to this commit, it is used in io.trino.parquet.predicate.PredicateUtils#getFilteredRowGroups as well. Regardless of current usage, we cannot risk exposing an incorrect value here.
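A sketch of the explicit-offset idea from this thread; the record shapes and field names are assumptions for illustration, not Trino's actual types:

import java.util.List;

class StartingPositionSketch
{
    record ColumnChunkInfo(List<String> path, long firstDataPageOffset)
    {}

    // Carrying fileOffset explicitly keeps the starting position correct even
    // when `columns` holds only a pruned subset of the row group's columns
    record RowGroupInfo(long fileOffset, long rowCount, List<ColumnChunkInfo> columns)
    {
        long startingPosition()
        {
            return fileOffset; // no longer derived from columns.get(0)
        }
    }
}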
@@ -253,7 +253,7 @@ public static ReaderPageSource createPageSource(
         start,
         length,
         dataSource,
-        parquetMetadata.getBlocks(),
+        parquetMetadata.getBlocks(descriptorsByPath.values()),
Rather than doing this, I think it would be better to send in blocks without column metadata here and provide a way for BlockMetadata to parse columns lazily, given a list of required paths. We should keep io.trino.parquet.metadata.PrunedBlockMetadata#createPrunedColumnsMetadata as the single place where pruning of column metadata happens, instead of it happening in two different places.
Thanks for the pointer. I added a new commit to consolidate the common logic in PrunedBlockMetadata#createPrunedColumnsMetadata and ParquetMetadata#getBlocks.
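A sketch of what consolidating the pruning into one helper could look like; the types are simplified stand-ins, and an empty path set meaning "keep all columns" mirrors the diff below:

import com.google.common.collect.ImmutableList;

import java.util.List;
import java.util.Set;

class ColumnPruningSketch
{
    record ColumnChunk(List<String> path)
    {}

    // Single place that prunes column chunk metadata to the required paths,
    // usable by both the footer reader and PrunedBlockMetadata
    static List<ColumnChunk> prune(List<ColumnChunk> columns, Set<List<String>> requiredPaths)
    {
        if (requiredPaths.isEmpty()) {
            return columns; // no projection requested: keep everything
        }
        return columns.stream()
                .filter(column -> requiredPaths.contains(column.path()))
                .collect(ImmutableList.toImmutableList());
    }
}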
@@ -104,6 +123,9 @@ public List<BlockMetadata> getBlocks()
         .map(value -> value.toLowerCase(Locale.ENGLISH))
         .toArray(String[]::new);
 ColumnPath columnPath = ColumnPath.get(path);
+if (!paths.isEmpty() && !paths.contains(columnPath)) {
+    continue;
I'm wondering why this commit helps. We already have io.trino.parquet.metadata.PrunedBlockMetadata#createPrunedColumnsMetadata, and this logic only avoids ColumnChunkMetadata.get. Are you sure this is expensive enough to warrant special handling?
ping
-public ParquetMetadata(FileMetaData fileMetaData, ParquetDataSourceId dataSourceId)
+public ParquetMetadata(FileMetaData fileMetaData, ParquetDataSourceId dataSourceId, Optional<Long> offset, Optional<Long> length)
Create a record for offset and length and make the argument Optional<OffsetLength>
👍 Reusing the DiskRange
That change should happen in this commit
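A sketch of the record being suggested; Trino already has a DiskRange class, so this only illustrates the shape being discussed, not its actual definition:

import static com.google.common.base.Preconditions.checkArgument;

class OffsetLengthSketch
{
    // Bundling the two values means callers cannot pass offset without length
    record DiskRange(long offset, long length)
    {
        DiskRange
        {
            checkArgument(offset >= 0, "offset is negative");
            checkArgument(length >= 0, "length is negative");
        }
    }

    // The constructor would then take a single optional argument, e.g.:
    // public ParquetMetadata(FileMetaData fileMetaData, ParquetDataSourceId dataSourceId, Optional<DiskRange> footerRange)
}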
Force-pushed from 2ca2cf8 to 041670a
Force-pushed from b86f019 to 1fb4ab3
Force-pushed from 1fb4ab3 to 933ccdd
Can you drop the 3rd and 5th commits entirely from this PR? It's not clear to me that we are saving much there, and it's complicating the overall changes. I think we could land the remaining, more impactful part more easily by reducing the changes in this PR.
 if (fileMetaData.getKey_value_metadata() == null) {
     return ImmutableMap.of();
 }
 return fileMetaData.getKey_value_metadata().stream().collect(toMap(KeyValue::getKey, KeyValue::getValue));
toMap -> toImmutableMap(KeyValue::getKey, KeyValue::getValue, (first, second) -> second)
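The suggested collector in a self-contained sketch, with a stand-in for the thrift KeyValue type. Note that plain toMap throws on duplicate keys, while the three-argument toImmutableMap keeps the last value:

import com.google.common.collect.ImmutableMap;

import java.util.List;
import java.util.Map;

import static com.google.common.collect.ImmutableMap.toImmutableMap;

class KeyValueMetadataSketch
{
    // Stand-in for the thrift KeyValue type
    record KeyValue(String key, String value)
    {}

    static Map<String, String> keyValueMetadata(List<KeyValue> keyValues)
    {
        if (keyValues == null) {
            return ImmutableMap.of();
        }
        // (first, second) -> second keeps the last value on duplicate keys
        return keyValues.stream()
                .collect(toImmutableMap(KeyValue::key, KeyValue::value, (first, second) -> second));
    }
}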
@jinyangli34 I've extracted some of the commits from here and cleaned them up in #24618
Thank you for taking care of this @raunaqmorarka!
Description
Improve the efficiency of reading Parquet footers by parsing row-group and column metadata only for the row groups and columns that are actually read.
Additional context and related issues
We hit an issue when reading an Iceberg table whose Parquet files have very small row groups. Even though the data size is not large (25GB), the query kept the entire cluster busy for an hour, with most workers occupied processing Parquet footers.
While the root cause is in the Iceberg writer (see the discussion in apache/iceberg#11258), we found room for optimization in the Trino Parquet footer reader.
Each Parquet file is 200MB, with 1300 row groups and 80 columns.
Each time it opens a file, Trino parses the footer with all row groups and all columns, but then reads data from only one row group and one column.
In total that is 1300 reads * 1300 row groups * 80 columns = 135M column-chunk metadata operations.
After this optimization, it drops to 1300 * 1 * 1 = 1300.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: