
Reading empty DataPageV2 fails with snappy: corrupt input (empty) #7388

@EnricoMi

Describe the bug
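Reading a Snappy-compressed Parquet file that contains an empty DataPage v2 (a v2 data page whose compressed values section is zero bytes long) fails with snappy: corrupt input (empty).
Such a page is produced when all values in the page are null: the definition levels record the nulls, and the values section is stored as zero bytes instead of a compressed empty buffer.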

To Reproduce
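Write a Spark dataset in which one column contains only null values, using the v2 Parquet writer (Spark compresses Parquet with Snappy by default). Reading the resulting file with the parquet crate then fails as described above: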

./spark-3.5.5-bin-hadoop3/bin/spark-shell --conf spark.hadoop.parquet.writer.version="v2"
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/
         
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.26)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Seq(Option.empty[Float]).toDS.write.parquet("parquet-v2-example.parquet")

Expected behavior
The Parquet file should be read successfully, with the all-null column returned as nulls.

Additional context
This is the same issue as Apache Arrow issue apache/arrow#22459.
The fix should mirror the Apache Arrow fix in apache/arrow#45252.
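A minimal sketch of the kind of guard involved, assuming the fix amounts to skipping decompression when a DataPage v2 has an empty values section. It is written in Scala only to match the reproduction above and is not the arrow-rs or Arrow C++ code. The header fields it reads (repetition_levels_byte_length, definition_levels_byte_length, is_compressed) come from DataPageHeaderV2 in the Parquet format; the object DataPageV2Values, the method valuesSection, and the decompress parameter are hypothetical.

```scala
// Hypothetical sketch, not the arrow-rs or Arrow C++ implementation.
// In a DataPage v2, repetition and definition levels are stored uncompressed
// at the start of the page body; only the values section after them may be
// compressed (header field is_compressed).
object DataPageV2Values {

  /** Returns the (decompressed) values section of a DataPage v2 body.
    *
    * @param pageBody         raw page bytes following the page header
    * @param repLevelsByteLen repetition_levels_byte_length from the header
    * @param defLevelsByteLen definition_levels_byte_length from the header
    * @param isCompressed     is_compressed from the header
    * @param decompress       hypothetical codec hook, e.g. a Snappy decoder
    */
  def valuesSection(
      pageBody: Array[Byte],
      repLevelsByteLen: Int,
      defLevelsByteLen: Int,
      isCompressed: Boolean,
      decompress: Array[Byte] => Array[Byte]
  ): Array[Byte] = {
    val levelsLen = repLevelsByteLen + defLevelsByteLen
    require(levelsLen <= pageBody.length, "level bytes exceed page size")
    val valueBytes = pageBody.drop(levelsLen)
    if (!isCompressed || valueBytes.isEmpty) {
      // An all-null page may have a zero-byte values section. A Snappy
      // decoder rejects empty input, so do not call the codec at all.
      valueBytes
    } else {
      decompress(valueBytes)
    }
  }
}
```

For example, a page body consisting of two definition-level bytes, with repLevelsByteLen = 0 and defLevelsByteLen = 2, yields an empty values section without the codec ever being invoked, so a decoder that rejects empty input is never reached.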

Labels: bug, parquet (Changes to the parquet crate)
