Merged
55 changes: 10 additions & 45 deletions parquet/src/bin/parquet-rewrite.rs
@@ -47,48 +47,6 @@ use parquet::{
     },
 };

-#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum, Debug)]
Contributor

One purpose of this enum is to provide documentation. With your change, the help output now looks like:

% ./target/debug/parquet-rewrite --help
Read and write parquet file with potentially different settings

Usage: parquet-rewrite [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
  -i, --input <INPUT>
          Path to input parquet file

  -o, --output <OUTPUT>
          Path to output parquet file

      --compression <COMPRESSION>
          Compression used for all columns

whereas before the change, the compression help was:

      --compression <COMPRESSION>
          Compression used

          Possible values:
          - none:    No compression
          - snappy:  Snappy
          - gzip:    GZip
          - lzo:     LZO
          - brotli:  Brotli
          - lz4:     LZ4
          - zstd:    Zstd
          - lz4-raw: LZ4 Raw

Member Author

I don't know, clap might already handle this well?

-enum CompressionArgs {
-    /// No compression.
-    None,
-
-    /// Snappy
-    Snappy,
-
-    /// GZip
-    Gzip,
-
-    /// LZO
-    Lzo,
-
-    /// Brotli
-    Brotli,
-
-    /// LZ4
-    Lz4,
-
-    /// Zstd
-    Zstd,
-
-    /// LZ4 Raw
-    Lz4Raw,
-}
-
-impl From<CompressionArgs> for Compression {
-    fn from(value: CompressionArgs) -> Self {
-        match value {
-            CompressionArgs::None => Self::UNCOMPRESSED,
-            CompressionArgs::Snappy => Self::SNAPPY,
-            CompressionArgs::Gzip => Self::GZIP(Default::default()),
-            CompressionArgs::Lzo => Self::LZO,
-            CompressionArgs::Brotli => Self::BROTLI(Default::default()),
-            CompressionArgs::Lz4 => Self::LZ4,
-            CompressionArgs::Zstd => Self::ZSTD(Default::default()),
-            CompressionArgs::Lz4Raw => Self::LZ4_RAW,
-        }
-    }
-}
-
 #[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum, Debug)]
 enum EncodingArgs {
     /// Default byte encoding.
@@ -216,8 +174,8 @@ struct Args {
     output: String,

     /// Compression used for all columns.
-    #[clap(long, value_enum)]
-    compression: Option<CompressionArgs>,
+    #[clap(long)]
+    compression: Option<Compression>,
Member Author

@mapleFU Oct 17, 2025

Previously, the default level would be 1, but now zstd would default to 3. I could also revert this and add a "compression-level" integer option. Either is OK with me.

Contributor

I'm not sure I follow, won't the default level still be 1? Or is it that now one must provide the level on the command line as --compression zstd(3)?
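
To make the question concrete, here is a minimal std-only sketch of how a `name(level)` string such as `zstd(3)` could be parsed, with a fallback level when none is given. `CodecSpec`, `DEFAULT_GZIP_LEVEL`, and `DEFAULT_ZSTD_LEVEL` are hypothetical names invented for this illustration; this is not the parquet crate's `Compression` parser, and the default values are assumptions, so the crate's actual behavior should be checked before relying on either reading.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a codec setting; NOT the parquet crate's `Compression`.
#[derive(Debug, PartialEq, Eq)]
enum CodecSpec {
    Uncompressed,
    Snappy,
    Gzip(u32),
    Zstd(i32),
}

// Default levels assumed for this sketch only.
const DEFAULT_GZIP_LEVEL: u32 = 6;
const DEFAULT_ZSTD_LEVEL: i32 = 1;

impl FromStr for CodecSpec {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        // Split "name(level)" into the codec name and the optional level text.
        let (name, level) = match s.split_once('(') {
            Some((name, rest)) => {
                let level = rest
                    .strip_suffix(')')
                    .ok_or_else(|| format!("missing ')' in '{s}'"))?;
                (name, Some(level))
            }
            None => (s, None),
        };
        match (name.to_ascii_lowercase().as_str(), level) {
            ("uncompressed" | "none", None) => Ok(Self::Uncompressed),
            ("snappy", None) => Ok(Self::Snappy),
            // No level given: fall back to the assumed default.
            ("gzip", None) => Ok(Self::Gzip(DEFAULT_GZIP_LEVEL)),
            ("zstd", None) => Ok(Self::Zstd(DEFAULT_ZSTD_LEVEL)),
            // Explicit level: parse it as an integer.
            ("gzip", Some(l)) => l
                .parse::<u32>()
                .map(Self::Gzip)
                .map_err(|e| format!("bad gzip level '{l}': {e}")),
            ("zstd", Some(l)) => l
                .parse::<i32>()
                .map(Self::Zstd)
                .map_err(|e| format!("bad zstd level '{l}': {e}")),
            // Codecs without a level reject one if it is supplied.
            (other, Some(_)) if matches!(other, "uncompressed" | "none" | "snappy") => {
                Err(format!("codec '{other}' does not take a level"))
            }
            (other, _) => Err(format!("unknown codec '{other}'")),
        }
    }
}
```

Under these assumptions, `"zstd".parse::<CodecSpec>()` yields the default level and `"zstd(3)".parse::<CodecSpec>()` yields level 3, so a parser of this shape would not force the level onto the command line.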


     /// Encoding used for all columns, if dictionary is not enabled.
     #[clap(long, value_enum)]
@@ -286,6 +244,10 @@ struct Args {
     #[clap(long)]
     writer_version: Option<WriterVersionArgs>,

+    /// Sets write batch size.
+    #[clap(long)]
+    write_batch_size: Option<usize>,
Member Author

Without setting write_batch_size, a small data_page_row_count_limit might not work.
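
A rough way to see the interaction is the simplified model below. It assumes the writer only checks the row-count limit after each write batch has been buffered, which is a modeling assumption for illustration and not the parquet writer's actual code. `simulate_page_row_counts` is a hypothetical helper.

```rust
/// Simplified model (an assumption, not the parquet writer itself): rows are
/// buffered in batches of `write_batch_size`, and the page row-count limit is
/// only checked after each batch has been added to the current page.
fn simulate_page_row_counts(
    total_rows: usize,
    write_batch_size: usize,
    row_limit: usize,
) -> Vec<usize> {
    assert!(write_batch_size > 0, "write_batch_size must be positive");
    let mut pages = Vec::new();
    let mut rows_in_page = 0;
    let mut remaining = total_rows;
    while remaining > 0 {
        // Take the next batch, or whatever is left if it is smaller.
        let batch = remaining.min(write_batch_size);
        rows_in_page += batch;
        remaining -= batch;
        // The limit is only consulted here, at a batch boundary.
        if rows_in_page >= row_limit {
            pages.push(rows_in_page);
            rows_in_page = 0;
        }
    }
    // Flush any rows left over in a final partial page.
    if rows_in_page > 0 {
        pages.push(rows_in_page);
    }
    pages
}
```

In this model, `simulate_page_row_counts(3000, 1024, 100)` gives pages of 1024, 1024, and 952 rows, so a limit of 100 is never honored, while `simulate_page_row_counts(300, 100, 100)` gives three pages of 100 rows each. That is the motivation for exposing the batch size alongside the row-count limit.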


     /// Sets whether to coerce Arrow types to match Parquet specification
     #[clap(long)]
     coerce_types: Option<bool>,
@@ -314,7 +276,7 @@ fn main() {

     let mut writer_properties_builder = WriterProperties::builder().set_key_value_metadata(kv_md);
     if let Some(value) = args.compression {
-        writer_properties_builder = writer_properties_builder.set_compression(value.into());
+        writer_properties_builder = writer_properties_builder.set_compression(value);
     }

     // setup encoding
@@ -382,6 +344,9 @@ fn main() {
     if let Some(value) = args.coerce_types {
         writer_properties_builder = writer_properties_builder.set_coerce_types(value);
     }
+    if let Some(value) = args.write_batch_size {
+        writer_properties_builder = writer_properties_builder.set_write_batch_size(value);
+    }
     let writer_properties = writer_properties_builder.build();
     let mut parquet_writer = ArrowWriter::try_new(
         File::create(&args.output).expect("Unable to open output file"),