This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Consider compiling with newer CPU flags #29

Open
Dandandan opened this issue Feb 22, 2022 · 5 comments

Comments

@Dandandan
Contributor

Rust by default compiles for a very old baseline architecture, which limits performance.

We should probably update this to target a newer one. An example from Polars:

https://github.com/pola-rs/polars/blob/master/.github/deploy_manylinux.sh#L11

There are some stats on CPU feature support here:

https://store.steampowered.com/hwsurvey

SSE2: 100.00%
SSE3: 100.00%
LAHF / SAHF: 99.99%
CMPXCHG16B: 99.98%
SSSE3: 99.27%
SSE4.1: 98.89%
SSE4.2: 98.50%
FCMOV: 97.23%
NTFS: 96.06%
AES: 95.50%
AVX: 94.38%
AVX2: 86.31%

I think we could enable all features up to AVX2, plus AES. AES is used by ahash, so enabling it will improve performance in hash joins and hash aggregates. The other features improve overall performance, e.g. in kernels, the Parquet reader, and DataFusion code.
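As a rough way to confirm which of these features a given build was actually compiled with, a small Rust program can report them via `cfg!(target_feature = ...)`. This is only a sketch: the helper name `compiled_features` is hypothetical, and the feature list is just the subset discussed in this issue.

```rust
/// Hypothetical helper: returns (feature name, enabled at compile time)
/// for a few of the features discussed in this issue.
fn compiled_features() -> Vec<(&'static str, bool)> {
    vec![
        // cfg! is evaluated at compile time, so these values reflect the
        // flags the compiler was invoked with, not the CPU the binary runs on.
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("aes", cfg!(target_feature = "aes")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

fn main() {
    for (name, enabled) in compiled_features() {
        println!("{name}: {}", if enabled { "enabled" } else { "disabled" });
    }
}
```

Building this once with and once without the extra flags should show whether they reach the compiler.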

@matthewmturner
Contributor

I'll play with these flags locally and keep you posted on the impact.

@matthewmturner
Contributor

@Dandandan

I've done the following to build the wheel:

export RUSTFLAGS='-C target-feature=+fxsr,+sse,+sse2,+sse3,+ssse3,+sse4.1+sse4.2,+popcnt,+aes,+avx,+avx2' && maturin build --release

Then I reinstalled the wheel and reran the benchmark, which produced the following:

q1: 0.043521209000000116
q2: 0.4907338750000001
q3: 2.0281409170000004
q4: 0.03750329200000024
q5: 2.112818584
q6: 2.1120300420000007
q7: 2.0400456249999994
q8: 3.093032082999999
q9: 2.1041081250000016
q10: 50.334135208999996

These results were basically in line with the unoptimized build, so I'm wondering if I've done something wrong.

Any thoughts?
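One thing that may be worth checking: the feature list in the command above contains `+sse4.1+sse4.2` with no comma between the two, so that entry may not be read as two separate features. A minimal sketch of a sanity check for such a list is below. The helper `suspicious_entries` is hypothetical and only assumes that entries are comma-separated and each starts with `+` or `-`; it does not check names against the compiler's real feature table.

```rust
/// Hypothetical helper: returns entries of a comma-separated
/// target-feature list that look malformed.
fn suspicious_entries(features: &str) -> Vec<&str> {
    features
        .split(',')
        .map(str::trim)
        .filter(|entry| {
            // Each entry is assumed to start with '+' (enable) or '-' (disable).
            let name = entry.strip_prefix('+').or_else(|| entry.strip_prefix('-'));
            match name {
                // An empty name, or a second '+' inside the name, suggests
                // a missing comma or a stray character.
                Some(rest) => rest.is_empty() || rest.contains('+'),
                None => true,
            }
        })
        .collect()
}

fn main() {
    let list = "+fxsr,+sse,+sse2,+sse3,+ssse3,+sse4.1+sse4.2,+popcnt,+aes,+avx,+avx2";
    // Prints: ["+sse4.1+sse4.2"]
    println!("{:?}", suspicious_entries(list));
}
```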

@matthewmturner
Contributor

@realno FYI

@houqp
Member

houqp commented Feb 23, 2022

When I tried target-cpu=skylake for roapi, I got 10-20% speed improvements. As a quick test, do you get any performance gain with target-cpu=native?
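Since target-cpu=native targets whatever the build machine supports, it can help to see what the host CPU reports at runtime before comparing numbers. The sketch below uses the standard library's `is_x86_feature_detected!` macro. The helper name `host_features` is hypothetical, and on non-x86 hosts it simply returns an empty list.

```rust
/// Hypothetical helper: reports, at runtime, whether the host CPU
/// supports a few of the features discussed in this issue.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn host_features() -> Vec<(&'static str, bool)> {
    vec![
        ("aes", std::is_x86_feature_detected!("aes")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

/// On other architectures the x86 feature names do not apply.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn host_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    let features = host_features();
    if features.is_empty() {
        println!("not an x86 host; the x86 feature flags do not apply here");
    }
    for (name, supported) in features {
        println!("{name}: {}", if supported { "supported" } else { "not supported" });
    }
}
```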

@matthewmturner
Contributor

Below is with native and sn-malloc. Some queries are faster, some slower; roughly in line overall.

q1: 0.05099512500000003
q2: 0.3307659999999999
q3: 1.228696541
q4: 0.062102542000000316
q5: 1.2268319589999996
q6: 1.2571589580000002
q7: 1.1611415420000002
q8: 2.9696968339999996
q9: 0.6929859999999994
q10: 20.191931167
