feat: upgrade to DataFusion 47.0.0 #3378

Merged
merged 4 commits into from
May 1, 2025

Conversation

Contributor

@alamb alamb commented Apr 11, 2025

Description

upgrade to DataFusion 47.0.0

Related Issue(s)

As in the past, I am making a PR to upgrade to DataFusion 47 to both:

  1. help delta.rs
  2. test DataFusion 47 before we release it

Documentation

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
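
As a rough illustration of what the title check looks for, the sketch below accepts titles of the form `type(optional-scope)!: description`. The helper name `is_conventional_title` and the list of accepted types are assumptions made for this example (the specification defines `feat` and `fix` and allows others); this is not the check that the delta-rs automation actually runs.

```rust
/// Commit types accepted by this sketch. Only `feat` and `fix` are defined by the
/// Conventional Commits specification; the rest of the list is an assumption.
const KNOWN_TYPES: [&str; 8] = [
    "feat", "fix", "chore", "docs", "refactor", "test", "ci", "perf",
];

/// Returns true if `title` looks like `type(optional-scope)!: description`.
/// Hypothetical helper for illustration only.
fn is_conventional_title(title: &str) -> bool {
    // The header must contain ": " separating the prefix from the description.
    let Some((prefix, description)) = title.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }
    // A trailing `!` marks a breaking change and is optional.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    // An optional scope follows the type in parentheses, e.g. `fix(python)`.
    let commit_type = match prefix.split_once('(') {
        Some((ty, rest)) => match rest.strip_suffix(')') {
            Some(scope) if !scope.is_empty() => ty,
            _ => return false,
        },
        None => prefix,
    };
    KNOWN_TYPES.contains(&commit_type)
}
```

Under these assumptions, `feat: upgrade to DataFusion 47.0.0` passes, while a title starting with `WIP:` does not, because `WIP` is not a recognized type.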

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Apr 11, 2025
@alamb alamb force-pushed the alamb/update_datafusion_47 branch from b5e7d70 to ef86d80 Compare April 11, 2025 19:52
@alamb alamb changed the title from "WIP: Upgrade to DataFusion 47.0.0" to "feat: Upgrade to DataFusion 47.0.0" Apr 11, 2025
@alamb alamb force-pushed the alamb/update_datafusion_47 branch from ef86d80 to ff11e08 Compare April 11, 2025 19:54
@alamb alamb changed the title from "feat: Upgrade to DataFusion 47.0.0" to "feat: upgrade to DataFusion 47.0.0" Apr 11, 2025
@ion-elgreco
Collaborator

I believe DF47 will resolve:

And maybe this one: #3339?

@alamb
Contributor Author

alamb commented Apr 12, 2025

I will leave some comments about the changes in this PR about what made them necessary

@alamb alamb force-pushed the alamb/update_datafusion_47 branch 3 times, most recently from 8f5f785 to e430e36 Compare April 17, 2025 00:26
@Nordalf
Contributor

Nordalf commented Apr 30, 2025

@alamb - I can see you released DF47 to Crates - and it has been merged in delta-kernel. Can we expect this one to be merged soon? I would love to have v0.12.0 of object_store to get the 'static lifetime on the list traits 🚀

@ion-elgreco ion-elgreco marked this pull request as ready for review April 30, 2025 12:33
codecov bot commented Apr 30, 2025

Codecov Report

Attention: Patch coverage is 74.10714% with 29 lines in your changes missing coverage. Please review.

Project coverage is 72.01%. Comparing base (1f5d8d8) to head (837ead8).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/delta_datafusion/mod.rs | 54.54% | 5 Missing ⚠️ |
| python/src/filesystem.rs | 0.00% | 4 Missing ⚠️ |
| crates/aws/src/storage.rs | 0.00% | 3 Missing ⚠️ |
| crates/core/src/logstore/storage/runtime.rs | 0.00% | 3 Missing ⚠️ |
| crates/gcp/src/storage.rs | 0.00% | 3 Missing ⚠️ |
| crates/lakefs/src/logstore.rs | 0.00% | 3 Missing ⚠️ |
| crates/core/src/kernel/snapshot/serde.rs | 50.00% | 2 Missing ⚠️ |
| crates/mount/src/file.rs | 0.00% | 2 Missing ⚠️ |
| crates/core/src/delta_datafusion/cdf/scan_utils.rs | 0.00% | 0 Missing and 1 partial ⚠️ |
| crates/core/src/operations/convert_to_delta.rs | 75.00% | 0 Missing and 1 partial ⚠️ |

... and 2 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3378      +/-   ##
==========================================
+ Coverage   71.95%   72.01%   +0.06%     
==========================================
  Files         148      148              
  Lines       46095    46082      -13     
  Branches    46095    46082      -13     
==========================================
+ Hits        33167    33187      +20     
+ Misses      10817    10791      -26     
+ Partials     2111     2104       -7     


@alamb alamb force-pushed the alamb/update_datafusion_47 branch from 83c651f to 6195604 Compare April 30, 2025 22:39
@@ -370,15 +370,15 @@ impl ObjectStore for S3StorageBackend {
 self.inner.delete(location).await
 }

-fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> {
+fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, ObjectStoreResult<ObjectMeta>> {
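
The signature change above moves the returned stream from `'_` to `'static`, which means the result may no longer borrow from `&self`. The sketch below shows the general idea with a plain iterator instead of a `BoxStream`: the wrapper clones a shared handle and moves it into the returned value. `InnerStore` and `Backend` are hypothetical stand-ins invented for this example and do not mirror the object_store API or the delta-rs backends.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a wrapped store; not the object_store API.
struct InnerStore {
    keys: Vec<String>,
}

/// Wrapper that delegates to a shared inner store.
struct Backend {
    inner: Arc<InnerStore>,
}

impl Backend {
    /// The returned iterator is `'static`, so it must not borrow from `&self`.
    /// Cloning the `Arc` and moving it into the closure gives the iterator
    /// shared ownership of the data it reads.
    fn list(&self, prefix: Option<&str>) -> Box<dyn Iterator<Item = String> + Send + 'static> {
        let inner = Arc::clone(&self.inner);
        // Copy the prefix too, since the `&str` is only valid for the caller's borrow.
        let prefix = prefix.map(str::to_owned);
        let mut index = 0;
        Box::new(std::iter::from_fn(move || {
            while index < inner.keys.len() {
                let key = &inner.keys[index];
                index += 1;
                let matches = prefix.as_deref().map_or(true, |p| key.starts_with(p));
                if matches {
                    return Some(key.clone());
                }
            }
            None
        }))
    }
}
```

With this shape, the listing can outlive the `Backend` value it was created from, which is the property a `'static` return type promises to callers.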

@@ -358,7 +358,7 @@ impl ObjectStore for S3StorageBackend {
 self.inner.get_opts(location, options).await
 }

-async fn get_range(&self, location: &Path, range: Range<usize>) -> ObjectStoreResult<Bytes> {
+async fn get_range(&self, location: &Path, range: Range<u64>) -> ObjectStoreResult<Bytes> {
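
With byte ranges expressed as `Range<u64>` rather than `Range<usize>`, code that ultimately slices an in-memory buffer needs an explicit conversion back to `usize`. The sketch below shows one way to do that with checked casts. The free function `get_range` here is a hypothetical in-memory helper written for this example, not the trait method from the diff, and the error type is simplified to `String`.

```rust
use std::ops::Range;

/// Hypothetical in-memory reader used for illustration; not the object_store API.
fn get_range(data: &[u8], range: Range<u64>) -> Result<Vec<u8>, String> {
    // Offsets arrive as u64; slicing needs usize, so convert with checked casts
    // instead of `as`, which would silently truncate on 32-bit targets.
    let start = usize::try_from(range.start).map_err(|e| format!("start out of range: {e}"))?;
    let end = usize::try_from(range.end).map_err(|e| format!("end out of range: {e}"))?;
    // `get` returns None for reversed or out-of-bounds ranges instead of panicking.
    data.get(start..end)
        .map(|bytes| bytes.to_vec())
        .ok_or_else(|| format!("range {start}..{end} is outside 0..{}", data.len()))
}
```

Callers can pass integer literals directly, for example `get_range(b"hello world", 6..11)`, because the literal range is inferred as `Range<u64>`.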

@@ -50,15 +50,6 @@ impl UserDefinedLogicalNodeCore for MetricObserver {
 write!(f, "MetricObserver id={}", self.id)
 }

-fn from_template(

@@ -671,7 +673,7 @@ impl<'a> DeltaScanBuilder<'a> {
 }
 };

-let file_scan_config = FileScanConfig::new(
+let file_scan_config = FileScanConfigBuilder::new(
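
The change above replaces a direct `FileScanConfig::new(` call with `FileScanConfigBuilder::new(`; the remainder of the call is not shown in this excerpt. As a generic sketch of the builder pattern that the new name suggests, the example below uses made-up types, `ScanConfig` and `ScanConfigBuilder`, whose fields and methods are assumptions for illustration and do not reflect DataFusion's actual API.

```rust
/// Hypothetical config type; the fields are invented for this example.
#[derive(Debug, PartialEq)]
struct ScanConfig {
    table_path: String,
    limit: Option<usize>,
    projection: Option<Vec<usize>>,
}

/// Hypothetical builder: required inputs go to `new`, optional ones use chained setters.
struct ScanConfigBuilder {
    table_path: String,
    limit: Option<usize>,
    projection: Option<Vec<usize>>,
}

impl ScanConfigBuilder {
    fn new(table_path: impl Into<String>) -> Self {
        Self {
            table_path: table_path.into(),
            limit: None,
            projection: None,
        }
    }

    /// Each setter takes `self` by value and returns it, so calls can be chained.
    fn with_limit(mut self, limit: Option<usize>) -> Self {
        self.limit = limit;
        self
    }

    fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
        self.projection = projection;
        self
    }

    /// Consumes the builder and produces the finished config.
    fn build(self) -> ScanConfig {
        ScanConfig {
            table_path: self.table_path,
            limit: self.limit,
            projection: self.projection,
        }
    }
}
```

A call site then reads as `ScanConfigBuilder::new("s3://bucket/table").with_limit(Some(10)).build()`, which keeps optional settings out of the constructor signature.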

@@ -300,7 +300,8 @@ impl LogSegment {
 let store = store.clone();
 let read_schema = read_schema.clone();
 async move {
-let mut reader = ParquetObjectReader::new(store, meta);
+let mut reader =

}

impl SchemaMapper for SchemaMapping {
fn map_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch> {
let record_batch = cast_record_batch(&batch, self.projected_schema.clone(), false, true)?;
Ok(record_batch)
}

fn map_partial_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch> {

@@ -1099,8 +1100,11 @@ pub(super) mod zorder {
Ok(DataType::Binary)
}

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {

@alamb
Contributor Author

alamb commented Apr 30, 2025

Sorry I forgot to submit some comments on this PR. I have also merged up and removed the cargo patches now that datafusion 47 is released.

@alamb
Contributor Author

alamb commented Apr 30, 2025

> @alamb - I can see you released DF47 to Crates - and it has been merged in delta-kernel. Can we expect this one to be merged soon? I would love to have v0.12.0 of object_store to get the 'static lifetime on the list traits 🚀

Thanks -- sorry @Nordalf -- I tried to polish this PR up and pushed a new version without any patches

Member

@rtyler rtyler left a comment

todo! still needs resolution in the hdfs crate, will take a look

@Kimahriman
Contributor

> todo! still needs resolution in the hdfs crate, will take a look

Should be a simple version bump, just let me know if there's any issues

@alamb alamb force-pushed the alamb/update_datafusion_47 branch from 6195604 to 5f9ff3d Compare May 1, 2025 00:45
@alamb
Contributor Author

alamb commented May 1, 2025

> > todo! still needs resolution in the hdfs crate, will take a look
>
> Should be a simple version bump, just let me know if there's any issues

I just force pushed a change to fix the hdfs thing

As @Kimahriman said, it just needed a version update. Very easy

@rtyler rtyler force-pushed the alamb/update_datafusion_47 branch from a03c975 to 837ead8 Compare May 1, 2025 01:38
@rtyler rtyler enabled auto-merge May 1, 2025 01:39
Member

@rtyler rtyler left a comment

@alamb thank you so much for taking the time to put this together. Everything looks good to me and I believe CI will now 💚 this change, so let's land it and celebrate

@alamb
Contributor Author

alamb commented May 1, 2025

> @alamb thank you so much for taking the time to put this together. Everything looks good to me and I believe CI will now 💚 this change, so let's land it and celebrate

Thank you guys! I owe you a beer sometime. Maybe if you are around the Data and AI summit

@rtyler rtyler added this pull request to the merge queue May 1, 2025
Merged via the queue into delta-io:main with commit 131a6e2 May 1, 2025
29 checks passed
@Nordalf
Contributor

Nordalf commented May 1, 2025

> > @alamb - I can see you released DF47 to Crates - and it has been merged in delta-kernel. Can we expect this one to be merged soon? I would love to have v0.12.0 of object_store to get the 'static lifetime on the list traits 🚀
>
> Thanks -- sorry @Nordalf -- I tried to polish this PR up and pushed a new version without any patches

@alamb - no need to apologize to me! I am more than happy this got merged so fast 😄. B-e-autiful job!

@alamb alamb deleted the alamb/update_datafusion_47 branch May 5, 2025 19:47
Labels
binding/python: Issues for the Python package
binding/rust: Issues for the Rust crate
5 participants