Commit 66a7423

alamb and comphead authored
Add DataFusion 47.0.0 Upgrade Guide (#15749)
* Add DataFusion 47.0.0 Upgrade Guide
* prettier
* Update docs/source/library-user-guide/upgrading.md
  Co-authored-by: Oleks V <[email protected]>
* Update docs/source/library-user-guide/upgrading.md
  Co-authored-by: Oleks V <[email protected]>
* Fix examples
* Try and fix tests again

---------

Co-authored-by: Oleks V <[email protected]>
1 parent ee5bf0a commit 66a7423

File tree

1 file changed

+118
-2
lines changed

docs/source/library-user-guide/upgrading.md

Lines changed: 118 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -19,6 +19,122 @@
1919

2020
# Upgrade Guides
2121

22+
## DataFusion `47.0.0`
23+
24+
This section calls out some of the major changes in the `47.0.0` release of DataFusion.
25+
26+
Here are some example upgrade PRs that demonstrate changes required when upgrading from DataFusion 46.0.0:
27+
28+
- [delta-rs Upgrade to `47.0.0`](https://github.com/delta-io/delta-rs/pull/3378)
29+
- [DataFusion Comet Upgrade to `47.0.0`](https://github.com/apache/datafusion-comet/pull/1563)
30+
- [Sail Upgrade to `47.0.0`](https://github.com/lakehq/sail/pull/434)
31+
32+
### Upgrades to `arrow-rs` and `arrow-parquet` 55.0.0 and `object_store` 0.12.0
33+
34+
Several APIs in the underlying arrow and parquet libraries now use
35+
`u64` instead of `usize` to better support WASM (see [#7371] and [#6961]).
36+
37+
Additionally, `ObjectStore::list` and `ObjectStore::list_with_offset` now return streams with a `'static` lifetime (see [#6619]).
38+
39+
[#6619]: https://github.com/apache/arrow-rs/pull/6619
40+
[#7371]: https://github.com/apache/arrow-rs/pull/7371
41+
[#6961]: https://github.com/apache/arrow-rs/pull/6961
42+
43+
This occasionally requires converting from `usize` to `u64`, as well as changes to `ObjectStore` implementations such as:
44+
45+
```rust
46+
# /* comment to avoid running
47+
impl ObjectStore {
48+
...
49+
// The range is now a u64 instead of usize
50+
async fn get_range(&self, location: &Path, range: Range<u64>) -> ObjectStoreResult<Bytes> {
51+
self.inner.get_range(location, range).await
52+
}
53+
...
54+
// The lifetime is now 'static instead of '_ (meaning the captured closure can't contain references)
55+
// (this also applies to list_with_offset)
56+
fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, ObjectStoreResult<ObjectMeta>> {
57+
self.inner.list(prefix)
58+
}
59+
}
60+
# */
61+
```
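
As a rough illustration of the two adjustments above, here is a minimal, std-only sketch. It does not use the `object_store` crate: `to_u64_range`, `to_usize_range`, and `InMemoryStore` are hypothetical names, and a boxed `'static` iterator stands in for `BoxStream<'static, _>`. The first two helpers show one way to convert ranges between `usize` and `u64`; the `list` method shows the general pattern of copying borrowed data into owned values so that the returned value does not borrow from `self` or `prefix`.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Widen a `usize` range to a `u64` range (the type `get_range` now takes).
/// `usize` is at most 64 bits on current Rust targets, so the cast is lossless.
fn to_u64_range(range: Range<usize>) -> Range<u64> {
    range.start as u64..range.end as u64
}

/// Narrow a `u64` range back to `usize`, e.g. for slicing an in-memory buffer.
/// Returns `None` if an offset does not fit, which can happen on 32-bit targets.
fn to_usize_range(range: Range<u64>) -> Option<Range<usize>> {
    let start = usize::try_from(range.start).ok()?;
    let end = usize::try_from(range.end).ok()?;
    Some(start..end)
}

/// Hypothetical in-memory store, used only for illustration.
struct InMemoryStore {
    keys: Vec<String>,
}

impl InMemoryStore {
    /// Std-only analogue of `list`: the result is `'static`, so it may not
    /// hold references to `self` or `prefix`. Both are copied into owned
    /// values that are moved into the returned iterator.
    fn list(&self, prefix: Option<&str>) -> Box<dyn Iterator<Item = String> + Send + 'static> {
        let prefix = prefix.map(str::to_owned);
        let keys = self.keys.clone();
        Box::new(keys.into_iter().filter(move |key| match &prefix {
            Some(p) => key.starts_with(p.as_str()),
            None => true,
        }))
    }
}
```

Because the listing owns its data, it can outlive the store it came from. Cloning the whole key list is the simplest way to get there; sharing the data behind an `Arc` is a common alternative when the clone is too expensive.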
62+
63+
The `ParquetObjectReader` has been updated to no longer require the object size
64+
(it can be fetched using a single suffix request). See [#7334] for details.
65+
66+
[#7334]: https://github.com/apache/arrow-rs/pull/7334
67+
68+
Pattern in DataFusion `46.0.0`:
69+
70+
```rust
71+
# /* comment to avoid running
72+
let meta: ObjectMeta = ...;
73+
let reader = ParquetObjectReader::new(store, meta);
74+
# */
75+
```
76+
77+
Pattern in DataFusion `47.0.0`:
78+
79+
```rust
80+
# /* comment to avoid running
81+
let meta: ObjectMeta = ...;
82+
let reader = ParquetObjectReader::new(store, meta.location.clone())
83+
.with_file_size(meta.size);
84+
# */
85+
```
86+
87+
### `DisplayFormatType::TreeRender`
88+
89+
DataFusion now supports [`tree` style explain plans]. Implementations of
90+
`ExecutionPlan` must also provide a description in the
91+
`DisplayFormatType::TreeRender` format. This can be the same as the existing
92+
`DisplayFormatType::Default`.
93+
94+
[`tree` style explain plans]: https://datafusion.apache.org/user-guide/sql/explain.html#tree-format-default
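
A minimal sketch of what handling the new variant can look like. To stay self-contained it defines a local stand-in enum instead of importing DataFusion's `DisplayFormatType`, and `MyExec` and `Render` are hypothetical names; in real code the `match` would live in the operator's `DisplayAs::fmt_as` implementation. The sketch takes the simplest route described above and reuses the `Default` description for `TreeRender`.

```rust
use std::fmt;

/// Local stand-in for DataFusion's `DisplayFormatType`, declared here only so
/// the sketch compiles without the crate.
#[derive(Clone, Copy)]
enum DisplayFormatType {
    Default,
    Verbose,
    TreeRender,
}

/// Hypothetical operator, used only for illustration.
struct MyExec {
    limit: usize,
}

impl MyExec {
    /// Mirrors the shape of `DisplayAs::fmt_as`: one arm per format type.
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => {
                write!(f, "MyExec: limit={}", self.limit)
            }
            // New arm: here it simply reuses the `Default` description.
            DisplayFormatType::TreeRender => {
                write!(f, "MyExec: limit={}", self.limit)
            }
        }
    }
}

/// Small adapter so the sketch can render through `std::fmt::Display`.
struct Render<'a>(&'a MyExec, DisplayFormatType);

impl<'a> fmt::Display for Render<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_as(self.1, f)
    }
}
```

Keeping `TreeRender` as its own arm, rather than folding it into the `Default` arm, leaves room to give it a shorter, tree-specific description later.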
95+
96+
### Removed Deprecated APIs
97+
98+
Several APIs have been removed in this release. These were either deprecated
99+
previously or were hard to use correctly such as the multiple different
100+
`ScalarUDFImpl::invoke*` APIs. See [#15130], [#15123], and [#15027] for more
101+
details.
102+
103+
[#15130]: https://github.com/apache/datafusion/pull/15130
104+
[#15123]: https://github.com/apache/datafusion/pull/15123
105+
[#15027]: https://github.com/apache/datafusion/pull/15027
106+
107+
### `FileScanConfig` --> `FileScanConfigBuilder`
108+
109+
Previously, `FileScanConfig::build()` directly created `ExecutionPlan`s. In
110+
DataFusion 47.0.0 this has been changed to use `FileScanConfigBuilder`. See
111+
[#15352] for details.
112+
113+
[#15352]: https://github.com/apache/datafusion/pull/15352
114+
115+
Pattern in DataFusion `46.0.0`:
116+
117+
```rust
118+
# /* comment to avoid running
119+
let plan = FileScanConfig::new(url, schema, Arc::new(file_source))
120+
.with_statistics(stats)
121+
...
122+
.build();
123+
# */
124+
```
125+
126+
Pattern in DataFusion `47.0.0`:
127+
128+
```rust
129+
# /* comment to avoid running
130+
let config = FileScanConfigBuilder::new(url, schema, Arc::new(file_source))
131+
.with_statistics(stats)
132+
...
133+
.build();
134+
let scan = DataSourceExec::from_data_source(config);
135+
# */
136+
```
137+
22138
## DataFusion `46.0.0`
23139

24140
### Use `invoke_with_args` instead of `invoke()` and `invoke_batch()`
@@ -39,7 +155,7 @@ below. See [PR 14876] for an example.
39155
Given existing code like this:
40156

41157
```rust
42-
# /*
158+
# /* comment to avoid running
43159
impl ScalarUDFImpl for SparkConcat {
44160
...
45161
fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
@@ -59,7 +175,7 @@ impl ScalarUDFImpl for SparkConcat {
59175
To
60176

61177
```rust
62-
# /* comment out so they don't run
178+
# /* comment to avoid running
63179
impl ScalarUDFImpl for SparkConcat {
64180
...
65181
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
