Skip to content

chore: fix clippy::large_enum_variant for DataFusionError #15861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions datafusion/common/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ pub enum DataFusionError {
ParquetError(ParquetError),
/// Error when reading Avro data.
#[cfg(feature = "avro")]
AvroError(AvroError),
AvroError(Box<AvroError>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering why now the error needs to be boxed? 🤔

Copy link
Contributor Author

@rroelke rroelke Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the specific numbers but here's a verbose explanation of what the lint is trying to tell us about:

enum MyErrorType {
    Something,
    SomethingElse([u8; 4096])
}

We can see here that the size of MyErrorType is 4096 (ish)

fn try_thing() -> Result<usize, MyErrorType> {
    ...
}

Consequently the size of Result<T, MyErrorType> is always at least 4096. Result<usize, MyErrorType> has size 4096. To call try_thing we have to allocate memory to hold that size of result. We don't return the usize in a register. And most of the time we expect to see Ok rather than Err so this makes the common case a lot worse.

Boxing the large variants fixes the issue since it reduces the size of the variant to a pointer, which fits in a register.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we box other error type too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd favor that. There might be another error variant which is less than the clippy threshold size but is still large enough to prevent certain optimizations. If all the error variants were boxed then we could be confident that DataFusionError could be copied around in registers and then perhaps Result<T, DataFusionError> also could be for small T.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally suggest avoiding additional API churn (aka boxing all varaints) unless there is some particular problem we are trying to solve or improve performance in some way we can measure

/// Error when reading / writing to / from an object_store (e.g. S3 or LocalFile)
#[cfg(feature = "object_store")]
ObjectStore(object_store::Error),
Expand Down Expand Up @@ -311,7 +311,7 @@ impl From<ParquetError> for DataFusionError {
#[cfg(feature = "avro")]
impl From<AvroError> for DataFusionError {
fn from(e: AvroError) -> Self {
DataFusionError::AvroError(e)
DataFusionError::AvroError(Box::new(e))
}
}

Expand Down
6 changes: 2 additions & 4 deletions datafusion/datasource-avro/src/avro_to_arrow/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ use apache_avro::types::Value;
use apache_avro::Schema as AvroSchema;
use arrow::datatypes::{DataType, IntervalUnit, Schema, TimeUnit, UnionMode};
use arrow::datatypes::{Field, UnionFields};
use datafusion_common::error::{DataFusionError, Result};
use datafusion_common::error::Result;
use std::collections::HashMap;
use std::sync::Arc;

Expand Down Expand Up @@ -107,9 +107,7 @@ fn schema_to_field_with_props(
.data_type()
.clone()
} else {
return Err(DataFusionError::AvroError(
apache_avro::Error::GetUnionDuplicate,
));
return Err(apache_avro::Error::GetUnionDuplicate.into());
}
} else {
let fields = sub_schemas
Expand Down