Commit 9e85280

Update results to datafusion 46
1 parent 05d1bdd commit 9e85280

5 files changed: +104 -104 lines changed

datafusion/README.md

Lines changed: 9 additions & 9 deletions
@@ -2,37 +2,37 @@

 DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format. For more information, please check <https://arrow.apache.org/datafusion/user-guide/introduction.html>

-We use parquet file here and create an external table for it; and then do the queries.
+We use parquet file here and create an external table for it; and then execute the queries.

 ## Generate benchmark results

 The benchmark should be completed in under an hour. On-demand pricing is $0.6 per hour while spot pricing is only $0.2 to $0.3 per hour (us-east-2).

 1. manually start a AWS EC2 instance
 - `c6a.4xlarge`
-- Amazon Linux 2 AMI
+- Ubuntu 22.04 or later
 - Root 500GB gp2 SSD
 - no EBS optimized
 - no instance store
-1. wait for status check passed, then ssh to EC2 `ssh ec2-user@{ip}`
-1. `sudo yum update -y` and `sudo yum install gcc git -y`
+1. wait for status check passed, then ssh to EC2 `ssh ubuntu@{ip}`
 1. `git clone https://github.com/ClickHouse/ClickBench`
 1. `cd ClickBench/datafusion`
 1. `vi benchmark.sh` and modify following line to target Datafusion version
+
+```bash
+git checkout 46.0.0
 ```
-git checkout 45.0.0
-```
+
 1. `bash benchmark.sh`

-### Know Issues:
+### Know Issues

 1. importing parquet by `datafusion-cli` doesn't support schema, need to add some casting in queries.sql (e.g. converting EventTime from Int to Timestamp via `to_timestamp_seconds`)
 2. importing parquet by `datafusion-cli` make column name column name case-sensitive, i change all column name in queries.sql to double quoted literal (e.g. `EventTime` -> `"EventTime"`)
 3. `comparing binary with utf-8` and `group by binary` don't work in mac, if you run these queries in mac, you'll get some errors for queries contain binary format apache/arrow-datafusion#3050

-
 ## Generate full human readable results (for debugging)

 1. install datafusion-cli
 2. download the parquet ```wget --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet```
-3. execute it ```datafusion-cli -f create.sh queries.sh``` or ```bash run2.sh```
+3. execute it ```datafusion-cli -f create_single.sql queries.sql``` or ```bash run2.sh```
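
The README step that edits `benchmark.sh` by hand to change the `git checkout` line could also be scripted. The following is a minimal sketch, not part of this commit: `pin_datafusion_version` is a hypothetical helper, and the sample script text is an abbreviated stand-in for `benchmark.sh`. It assumes the version string contains no `/` characters, since it is interpolated directly into the `sed` expression.

```shell
# Hypothetical helper (not in the repository): rewrite the "git checkout"
# line of a script read from stdin so that it targets the version in $1.
pin_datafusion_version() {
  sed "s/^git checkout .*/git checkout $1/"
}

# Abbreviated stand-in for benchmark.sh, used only for illustration.
sample_script='cd arrow-datafusion/
git checkout 45.0.0
cargo build --release'

# Print the script with the checkout line pinned to 46.0.0.
pinned=$(printf '%s\n' "$sample_script" | pin_datafusion_version 46.0.0)
printf '%s\n' "$pinned"
```

To apply this to the real file, one option is to write the output to a temporary file and move it over `benchmark.sh` after checking the result.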

datafusion/benchmark.sh

Lines changed: 3 additions & 3 deletions
@@ -11,9 +11,9 @@ sudo apt-get install --yes gcc

 echo "Install DataFusion main branch"
 git clone https://github.com/apache/arrow-datafusion.git
-cd arrow-datafusion/datafusion-cli
-git checkout 45.0.0
-CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release
+cd arrow-datafusion/
+git checkout 46.0.0
+CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release --package datafusion-cli --bin datafusion-cli
 export PATH="`pwd`/target/release:$PATH"
 cd ../..

datafusion/results/partitioned.json

Lines changed: 46 additions & 46 deletions
@@ -1,55 +1,55 @@
 {
 "system": "DataFusion (Parquet, partitioned)",
-"date": "2024-03-29",
+"date": "2024-04-18",
 "machine": "c6a.4xlarge, 500gb gp2",
 "cluster_size": 1,
-"comment": "v45.0.0 (26058ac)",
+"comment": "v46.0.0 (d5ca830)",
 "tags": ["Rust", "column-oriented", "embedded", "stateless"],
 "load_time": 0,
-"data_size": 14779976446,
+"data_size": 14737666736,
 "result": [
-[0.060, 0.022, 0.021],
-[0.109, 0.034, 0.035],
-[0.198, 0.085, 0.083],
-[0.391, 0.088, 0.084],
-[1.143, 0.846, 0.872],
-[1.020, 0.856, 0.855],
-[0.086, 0.032, 0.028],
-[0.118, 0.037, 0.037],
-[1.102, 0.962, 0.942],
-[1.353, 1.070, 1.045],
-[0.487, 0.260, 0.263],
-[0.663, 0.291, 0.286],
-[1.114, 0.893, 0.901],
-[2.596, 1.410, 1.360],
-[1.133, 0.860, 0.854],
-[1.132, 1.020, 1.001],
-[2.668, 1.835, 1.866],
-[2.557, 1.694, 1.704],
-[5.337, 3.714, 3.794],
-[0.263, 0.082, 0.082],
-[9.891, 1.109, 1.125],
-[11.284, 1.331, 1.348],
-[21.820, 2.617, 2.631],
-[55.448, 9.609, 9.630],
-[2.687, 0.452, 0.453],
-[0.804, 0.368, 0.364],
-[2.704, 0.517, 0.520],
-[9.662, 1.553, 1.507],
-[9.988, 9.801, 9.769],
-[0.526, 0.421, 0.403],
-[2.371, 0.802, 0.812],
-[5.944, 0.904, 0.903],
-[4.827, 3.645, 3.565],
-[10.196, 3.767, 3.792],
-[10.234, 3.823, 3.844],
-[1.397, 1.270, 1.303],
-[0.328, 0.146, 0.147],
-[0.196, 0.085, 0.105],
-[0.328, 0.147, 0.150],
-[0.482, 0.220, 0.219],
-[0.198, 0.076, 0.076],
-[0.189, 0.088, 0.076],
-[0.179, 0.064, 0.075]
+[0.062, 0.021, 0.021],
+[0.120, 0.037, 0.036],
+[0.212, 0.084, 0.084],
+[0.388, 0.089, 0.082],
+[1.037, 0.875, 0.879],
+[1.002, 0.863, 0.864],
+[0.082, 0.033, 0.032],
+[0.117, 0.039, 0.038],
+[1.120, 0.950, 0.961],
+[1.319, 1.056, 1.058],
+[0.509, 0.267, 0.261],
+[0.612, 0.290, 0.289],
+[1.105, 0.919, 0.922],
+[2.552, 1.418, 1.415],
+[1.100, 0.859, 0.877],
+[1.155, 1.016, 1.018],
+[2.650, 1.852, 1.855],
+[2.553, 1.718, 1.738],
+[5.359, 3.667, 3.813],
+[0.263, 0.084, 0.085],
+[9.892, 1.113, 1.136],
+[11.274, 1.352, 1.349],
+[21.836, 2.640, 2.605],
+[55.533, 9.673, 9.740],
+[2.690, 0.462, 0.462],
+[0.812, 0.381, 0.368],
+[2.697, 0.539, 0.533],
+[9.607, 1.547, 1.526],
+[10.160, 9.670, 9.757],
+[0.530, 0.432, 0.430],
+[2.388, 0.810, 0.836],
+[5.962, 0.921, 0.946],
+[4.766, 3.673, 3.741],
+[10.197, 3.789, 3.811],
+[10.207, 3.851, 3.917],
+[1.392, 1.259, 1.251],
+[0.331, 0.148, 0.150],
+[0.229, 0.085, 0.085],
+[0.323, 0.146, 0.162],
+[0.479, 0.222, 0.233],
+[0.215, 0.079, 0.079],
+[0.201, 0.073, 0.074],
+[0.182, 0.065, 0.064]
 ]
 }
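
Each row of `result` holds three timings in seconds for one query. The sketch below is one way to eyeball a run, assuming the first timing is the cold run and taking the faster of the other two as the hot time. That interpretation, the helper name `summarize_runs`, and the summary format are assumptions for illustration and not the benchmark's official scoring. The sample rows are the first three updated rows from `partitioned.json` above.

```shell
# Hypothetical helper: read result rows such as "[0.062, 0.021, 0.021],"
# from stdin and print the query count, the sum of the first timings, and
# the sum of the faster of the second and third timings.
summarize_runs() {
  awk '
    /^[ \t]*\[[0-9]/ {
      gsub(/\[|\]|,/, " ")                       # strip brackets and commas
      cold += $1                                 # first run of each query
      hot += ($2 + 0 < $3 + 0) ? $2 : $3         # faster of runs two and three
      n++
    }
    END { printf "queries=%d cold=%.3f hot=%.3f\n", n, cold, hot }
  '
}

# First three updated rows from partitioned.json in this commit.
sample_rows='[0.062, 0.021, 0.021],
[0.120, 0.037, 0.036],
[0.212, 0.084, 0.084],'

summary=$(printf '%s\n' "$sample_rows" | summarize_runs)
printf '%s\n' "$summary"
```

Piping a whole results file through `summarize_runs` should work the same way, since lines that do not start with a bracketed number are ignored.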

datafusion/results/single.json

Lines changed: 45 additions & 45 deletions
@@ -1,55 +1,55 @@
 {
 "system": "DataFusion (Parquet, single)",
-"date": "2024-03-29",
+"date": "2024-04-18",
 "machine": "c6a.4xlarge, 500gb gp2",
 "cluster_size": 1,
-"comment": "v45.0.0 (26058ac)",
+"comment": "v46.0.0 (d5ca830)",
 "tags": ["Rust", "column-oriented", "embedded", "stateless"],
 "load_time": 0,
 "data_size": 14779976446,
 "result": [
-[0.103, 0.063, 0.070],
-[0.136, 0.080, 0.079],
-[0.219, 0.117, 0.116],
-[0.348, 0.128, 0.128],
-[1.045, 0.913, 0.924],
-[1.102, 0.959, 0.964],
-[0.114, 0.066, 0.078],
-[0.152, 0.088, 0.087],
-[1.160, 1.021, 1.005],
-[1.324, 1.077, 1.110],
-[0.470, 0.297, 0.297],
-[0.567, 0.317, 0.312],
-[1.140, 0.984, 0.984],
-[2.681, 1.388, 1.459],
-[1.106, 0.952, 0.939],
-[1.185, 1.062, 1.067],
-[2.647, 1.943, 1.937],
-[2.524, 1.787, 1.787],
-[5.212, 3.749, 3.825],
-[0.272, 0.115, 0.122],
-[9.741, 1.205, 1.190],
-[11.298, 1.552, 1.497],
-[22.086, 3.670, 3.620],
-[55.936, 10.118, 10.120],
-[2.553, 0.572, 0.591],
-[0.792, 0.519, 0.512],
-[2.561, 0.639, 0.634],
-[9.600, 1.650, 1.682],
-[10.898, 10.343, 10.278],
-[0.556, 0.455, 0.459],
-[2.282, 0.938, 0.932],
-[5.685, 1.033, 1.025],
-[4.576, 3.773, 3.780],
-[10.309, 3.906, 3.927],
-[10.317, 3.969, 4.025],
-[1.395, 1.251, 1.253],
-[0.364, 0.202, 0.199],
-[0.284, 0.163, 0.164],
-[0.385, 0.216, 0.198],
-[0.541, 0.302, 0.295],
-[0.224, 0.115, 0.111],
-[0.215, 0.108, 0.111],
-[0.193, 0.102, 0.100]
+[0.103, 0.067, 0.060],
+[0.137, 0.083, 0.074],
+[0.228, 0.116, 0.118],
+[0.341, 0.124, 0.122],
+[1.033, 0.907, 0.936],
+[1.097, 0.981, 0.982],
+[0.118, 0.086, 0.083],
+[0.150, 0.082, 0.081],
+[1.141, 1.053, 1.024],
+[1.312, 1.103, 1.145],
+[0.483, 0.297, 0.300],
+[0.577, 0.330, 0.323],
+[1.157, 1.007, 1.004],
+[2.772, 1.420, 1.393],
+[1.115, 0.959, 0.973],
+[1.177, 1.081, 1.070],
+[2.626, 1.971, 1.971],
+[2.513, 1.802, 1.812],
+[5.277, 3.894, 3.846],
+[0.260, 0.124, 0.122],
+[9.730, 1.190, 1.210],
+[11.254, 1.453, 1.484],
+[22.102, 3.639, 3.597],
+[55.988, 10.171, 10.220],
+[2.557, 0.589, 0.577],
+[0.811, 0.536, 0.518],
+[2.574, 0.652, 0.657],
+[9.604, 1.652, 1.635],
+[10.793, 10.454, 10.628],
+[0.572, 0.448, 0.480],
+[2.283, 0.933, 0.961],
+[5.690, 1.038, 1.046],
+[4.496, 3.738, 3.841],
+[10.294, 3.946, 3.940],
+[10.197, 3.949, 4.030],
+[1.416, 1.289, 1.272],
+[0.382, 0.195, 0.212],
+[0.274, 0.184, 0.168],
+[0.369, 0.200, 0.196],
+[0.533, 0.296, 0.285],
+[0.243, 0.111, 0.111],
+[0.216, 0.112, 0.127],
+[0.193, 0.103, 0.102]
 ]
 }

datafusion/run.sh

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ cat queries.sql | while read -r query; do
 # 2. each query contains a "Query took xxx seconds", we just grep these 2 lines
 # 3. use sed to take the second line
 # 4. use awk to take the number we want
-RES=`datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }'`
+RES=$(datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }')
 [[ $RES != "" ]] && \
 echo -n "$RES" || \
 echo -n "null"
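
The `grep`, `sed`, `awk` pipeline in the changed line can be tried without `datafusion-cli` installed. The sketch below feeds it a made-up sample and is not part of `run.sh`: `extract_elapsed` is a hypothetical wrapper, and the sample text only assumes that each statement prints a line of the form `Elapsed <seconds> seconds.`, which is what the `grep "Elapsed"` filter implies. The first match would belong to the create-table statement and the second to the query.

```shell
# Hypothetical wrapper around the pipeline used in run.sh: keep the lines
# containing "Elapsed", take the second one, and print its second field.
extract_elapsed() {
  grep "Elapsed" | sed -n 2p | awk '{ print $2 }'
}

# Made-up output in the assumed shape (not captured from a real run).
sample_output='0 row(s) fetched.
Elapsed 0.105 seconds.
1 row(s) fetched.
Elapsed 0.021 seconds.'

RES=$(printf '%s\n' "$sample_output" | extract_elapsed)

# Same fallback as run.sh: print "null" when no timing was found.
if [ -n "$RES" ]; then printf '%s' "$RES"; else printf 'null'; fi
```

If the query fails and only one `Elapsed` line is printed, `sed -n 2p` yields nothing, so `RES` is empty and the script records `null`.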
