Closed
Describe the bug
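Reading a Parquet file that contains an empty DataPage v2 fails with: snappy: corrupt input (empty).
Such a page occurs when all values are null. The page then carries only definition levels, its values section is zero bytes long, and the Snappy decoder rejects that empty input.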
To Reproduce
Write a Spark dataset in which one column contains only null values, using the v2 Parquet writer (parquet.writer.version=v2):
```
./spark-3.5.5-bin-hadoop3/bin/spark-shell --conf spark.hadoop.parquet.writer.version="v2"
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.26)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Seq(Option.empty[Float]).toDS.write.parquet("parquet-v2-example.parquet")
```
Spark writes parquet-v2-example.parquet as a directory of part-*.parquet files. Reading the resulting Parquet file fails with the error above.
Expected behavior
The Parquet file should be read successfully, with the all-null column returned as null values.
Additional context
The issue is identical to Apache Arrow issue apache/arrow#22459.
The fix can be the same as the Apache Arrow fix: apache/arrow#45252.
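For illustration, here is a minimal Scala sketch of the kind of guard such a fix needs. It assumes that the failure comes from handing the zero-length values section of a DataPage v2 to the Snappy decoder. In a v2 page, the repetition and definition levels are stored uncompressed ahead of the values, so a reader can skip past them and call the decompressor only when the remaining values section is non-empty. The names DataPageV2Sketch, DataPageV2, valuesSection and strictDecompress are hypothetical. They are not the API of Arrow or of any other Parquet library, and strictDecompress is a stand-in that only imitates the empty-input rejection, not real Snappy decoding.

```scala
// Illustrative sketch only: every name here is hypothetical and does not
// correspond to the API of any particular Parquet library.
object DataPageV2Sketch {

  /** Minimal model of a DataPage v2 body: repetition levels, then definition
    * levels (both stored uncompressed in v2), then the values section, which
    * is compressed only when isCompressed is true. */
  final case class DataPageV2(
      body: Array[Byte],
      repetitionLevelsByteLength: Int,
      definitionLevelsByteLength: Int,
      isCompressed: Boolean
  )

  /** Returns the uncompressed values section of a v2 page. An empty values
    * section (e.g. every value in the page is null) is returned as-is,
    * without calling the decompressor. */
  def valuesSection(page: DataPageV2, decompress: Array[Byte] => Array[Byte]): Array[Byte] = {
    val levelsLength = page.repetitionLevelsByteLength + page.definitionLevelsByteLength
    require(
      page.repetitionLevelsByteLength >= 0 && page.definitionLevelsByteLength >= 0 &&
        levelsLength <= page.body.length,
      "level lengths exceed the page body"
    )
    val values = page.body.drop(levelsLength) // skip the uncompressed levels
    if (!page.isCompressed || values.isEmpty) values // nothing to decompress
    else decompress(values)
  }

  /** Hypothetical stand-in decoder that, like the failing reader, rejects
    * empty input. A real reader would call a Snappy implementation here. */
  val strictDecompress: Array[Byte] => Array[Byte] = bytes =>
    if (bytes.isEmpty) throw new IllegalArgumentException("snappy: corrupt input (empty)")
    else bytes.clone() // placeholder: returns the input unchanged

  def main(args: Array[String]): Unit = {
    // One null value: no repetition levels, two bytes of RLE-encoded
    // definition levels (a run of length 1 with level 0), and no value bytes.
    val allNullPage = DataPageV2(Array[Byte](2, 0), 0, 2, isCompressed = true)
    val values = valuesSection(allNullPage, strictDecompress)
    println(s"values section length: ${values.length}")
  }
}
```

Running main prints "values section length: 0" instead of raising the "snappy: corrupt input (empty)" error. The check is placed on the values section rather than on the whole page because an all-null page still contains definition-level bytes, so the page body itself is not empty.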