# Upgrade Guides
## DataFusion `47.0.0`

This section calls out some of the major changes in the `47.0.0` release of DataFusion.

Here are some example upgrade PRs that demonstrate changes required when upgrading from DataFusion 46.0.0:

- [delta-rs Upgrade to `47.0.0`](https://github.com/delta-io/delta-rs/pull/3378)
- [DataFusion Comet Upgrade to `47.0.0`](https://github.com/apache/datafusion-comet/pull/1563)
- [Sail Upgrade to `47.0.0`](https://github.com/lakehq/sail/pull/434)

### Upgrades to `arrow-rs` and `arrow-parquet` 55.0.0 and `object_store` 0.12.0

Several APIs in the underlying arrow and parquet libraries have been changed to use
`u64` instead of `usize` to better support WASM (see [#7371] and [#6961]).

Additionally, `ObjectStore::list` and `ObjectStore::list_with_offset` have been changed to return `'static` lifetimes (see [#6619]).

[#6619]: https://github.com/apache/arrow-rs/pull/6619
[#7371]: https://github.com/apache/arrow-rs/pull/7371
[#6961]: https://github.com/apache/arrow-rs/pull/6961

This requires occasionally converting from `usize` to `u64`, as well as changes to `ObjectStore` implementations such as:

```rust
# /* comment to avoid running
impl ObjectStore {
    ...
    // The range is now a u64 instead of usize
    async fn get_range(&self, location: &Path, range: Range<u64>) -> ObjectStoreResult<Bytes> {
        self.inner.get_range(location, range).await
    }
    ...
    // The lifetime is now 'static instead of '_ (meaning the captured closure can't contain references)
    // (this also applies to list_with_offset)
    fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, ObjectStoreResult<ObjectMeta>> {
        self.inner.list(prefix)
    }
}
# */
```

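On the caller side, offsets and lengths that used to be `usize` generally need to be widened to `u64` before being passed as byte ranges. Below is a minimal sketch of such a call site (assuming `store` and `meta` are an `ObjectStore` and its `ObjectMeta`, using the same non-running convention as above):

```rust
# /* comment to avoid running
// ObjectMeta::size and get_range now use u64, so locally computed
// usize lengths must be widened before building the byte range
let footer_len: usize = 8;
let file_size: u64 = meta.size;
let range = file_size.saturating_sub(footer_len as u64)..file_size;
let footer_bytes = store.get_range(&meta.location, range).await?;
# */
```
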
The `ParquetObjectReader` has been updated to no longer require the object size
(it can be fetched using a single suffix request). See [#7334] for details.

[#7334]: https://github.com/apache/arrow-rs/pull/7334

Pattern in DataFusion `46.0.0`:

```rust
# /* comment to avoid running
let meta: ObjectMeta = ...;
let reader = ParquetObjectReader::new(store, meta);
# */
```

Pattern in DataFusion `47.0.0`:

```rust
# /* comment to avoid running
let meta: ObjectMeta = ...;
let reader = ParquetObjectReader::new(store, meta.location)
    .with_file_size(meta.size);
# */
```

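Because the file size is now optional, the reader can also be constructed without it; in that case the reader discovers the size itself via a suffix request (one extra object store request per file):

```rust
# /* comment to avoid running
// No with_file_size: the reader locates the footer with a suffix request
let reader = ParquetObjectReader::new(store, meta.location);
# */
```
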
### `DisplayFormatType::TreeRender`

DataFusion now supports [`tree` style explain plans]. Implementations of
`ExecutionPlan` must also provide a description in the
`DisplayFormatType::TreeRender` format. This can be the same as the existing
`DisplayFormatType::Default`.

[`tree` style explain plans]: https://datafusion.apache.org/user-guide/sql/explain.html#tree-format-default

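As a sketch of what this can look like (using a hypothetical `MyScanExec` operator with a `files` field; the exact text is up to each operator), the operator's `DisplayAs` implementation just needs to handle the new variant:

```rust
# /* comment to avoid running
impl DisplayAs for MyScanExec {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => {
                write!(f, "MyScanExec: files={}", self.files.len())
            }
            // New in 47.0.0: used when rendering `tree` style explain plans.
            // Reusing the Default text is a reasonable starting point.
            DisplayFormatType::TreeRender => {
                write!(f, "MyScanExec: files={}", self.files.len())
            }
        }
    }
}
# */
```
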
### Removed Deprecated APIs

Several APIs have been removed in this release. These were either previously
deprecated or hard to use correctly, such as the multiple different
`ScalarUDFImpl::invoke*` APIs. See [#15130], [#15123], and [#15027] for more
details.

[#15130]: https://github.com/apache/datafusion/pull/15130
[#15123]: https://github.com/apache/datafusion/pull/15123
[#15027]: https://github.com/apache/datafusion/pull/15027

### `FileScanConfig` --> `FileScanConfigBuilder`

Previously, `FileScanConfig::build()` directly created `ExecutionPlan`s. In
DataFusion 47.0.0 this has been changed to use `FileScanConfigBuilder`. See
[#15352] for details.

[#15352]: https://github.com/apache/datafusion/pull/15352

Pattern in DataFusion `46.0.0`:

```rust
# /* comment to avoid running
let plan = FileScanConfig::new(url, schema, Arc::new(file_source))
    .with_statistics(stats)
    ...
    .build()
# */
```

Pattern in DataFusion `47.0.0`:

```rust
# /* comment to avoid running
let config = FileScanConfigBuilder::new(url, schema, Arc::new(file_source))
    .with_statistics(stats)
    ...
    .build();
let scan = DataSourceExec::from_data_source(config);
# */
```

## DataFusion `46.0.0`

### Use `invoke_with_args` instead of `invoke()` and `invoke_batch()`

Update implementations of `ScalarUDFImpl` to use `invoke_with_args` as shown
below. See [PR 14876] for an example.

[PR 14876]: https://github.com/apache/datafusion/pull/14876

Given existing code like this:

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
        ...
    }
}
# */
```

To

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        ...
    }
}
# */
```

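The values that `invoke_batch` received as separate parameters are carried on `ScalarFunctionArgs` instead; a sketch of reading them inside the new method (assuming the `args` and `number_rows` field names from recent DataFusion releases):

```rust
# /* comment to avoid running
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    // args.args replaces the old `args: &[ColumnarValue]` parameter and
    // args.number_rows replaces the old `number_rows: usize` parameter
    let inputs: &[ColumnarValue] = &args.args;
    let number_rows: usize = args.number_rows;
    ...
}
# */
```
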