Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 4 additions & 10 deletions parquet/src/arrow/arrow_reader/read_plan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -110,19 +110,13 @@ impl ReadPlanBuilder {
None => return RowSelectionStrategy::Selectors,
};

let trimmed = selection.clone().trim();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this clone shows up in the traces for some of the clickbench queries. I think this is because there are 9000 selectors for these queries

let selectors: Vec<RowSelector> = trimmed.into();
if selectors.is_empty() {
return RowSelectionStrategy::Mask;
}

let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
let selector_count = selectors.len();
if selector_count == 0 {
let non_empty_selector_count = selection.iter().filter(|s| s.row_count > 0).count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement! I didn’t notice this in my benchmarks, likely because the dataset was too small.
Small suggestion:

  • non_empty_selector_count is too long, effective_count could sound better.
  • Using fold to replace 2 seperate iteration might be better.
let (total_rows, effective_count) = selection.iter()
    .fold((0, 0), |(rows, count), s| {
        if s.row_count > 0 {
            (rows + s.row_count, count + 1)
        } else {
            (rows, count)
        }
    });

if effective_count == 0 {
    return RowSelectionStrategy::Mask;
}

if total_rows < effective_count.saturating_mul(threshold) {
    RowSelectionStrategy::Mask
} else {
    RowSelectionStrategy::Selectors
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea -- done in fe10c50

let total_rows: usize = selection.iter().map(|s| s.row_count).sum();
if non_empty_selector_count == 0 {
return RowSelectionStrategy::Mask;
}

if total_rows < selector_count.saturating_mul(threshold) {
if total_rows < non_empty_selector_count.saturating_mul(threshold) {
RowSelectionStrategy::Mask
} else {
RowSelectionStrategy::Selectors
Expand Down
Loading