Skip to content

Tracking Issue for algebraic floating point methods #136469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 6 tasks
calder opened this issue Feb 3, 2025 · 37 comments
Open
2 of 6 tasks

Tracking Issue for algebraic floating point methods #136469

calder opened this issue Feb 3, 2025 · 37 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@calder
Copy link
Contributor

calder commented Feb 3, 2025

Feature gate: #![feature(float_algebraic)]

This is a tracking issue for exposing core::intrinsics::f*_algebraic in stable Rust, including in const.

Public API

// core::num::f16

impl f16 {
    pub const fn algebraic_add(self, rhs: f16) -> f16;
    pub const fn algebraic_sub(self, rhs: f16) -> f16;
    pub const fn algebraic_mul(self, rhs: f16) -> f16;
    pub const fn algebraic_div(self, rhs: f16) -> f16;
    pub const fn algebraic_rem(self, rhs: f16) -> f16;
}
// core::num::f32

impl f32 {
    pub const fn algebraic_add(self, rhs: f32) -> f32;
    pub const fn algebraic_sub(self, rhs: f32) -> f32;
    pub const fn algebraic_mul(self, rhs: f32) -> f32;
    pub const fn algebraic_div(self, rhs: f32) -> f32;
    pub const fn algebraic_rem(self, rhs: f32) -> f32;
}
// core::num::f64

impl f64 {
    pub const fn algebraic_add(self, rhs: f64) -> f64;
    pub const fn algebraic_sub(self, rhs: f64) -> f64;
    pub const fn algebraic_mul(self, rhs: f64) -> f64;
    pub const fn algebraic_div(self, rhs: f64) -> f64;
    pub const fn algebraic_rem(self, rhs: f64) -> f64;
}
// core::num::f128

impl f128 {
    pub const fn algebraic_add(self, rhs: f128) -> f128;
    pub const fn algebraic_sub(self, rhs: f128) -> f128;
    pub const fn algebraic_mul(self, rhs: f128) -> f128;
    pub const fn algebraic_div(self, rhs: f128) -> f128;
    pub const fn algebraic_rem(self, rhs: f128) -> f128;
}

Steps / History

Unresolved Questions

References

cc @rust-lang/lang @rust-lang/libs-api

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@calder calder added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 3, 2025
@Noratrieb Noratrieb marked this as a duplicate of #136468 Feb 3, 2025
@jieyouxu jieyouxu added the A-floating-point Area: Floating point numbers and arithmetic label Feb 3, 2025
@tgross35 tgross35 added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Feb 5, 2025
@tgross35
Copy link
Contributor

tgross35 commented Feb 5, 2025

Added lang as requested in rust-lang/libs-team#532 (comment). That comment also mentions algebraic_mul_add, which could be added after the initial PR if it makes sense.

@RalfJung
Copy link
Member

These operations are non-deterministic. @nikic do you know if LLVM scalar evolution handles that properly? We had codegen issues in the past when SE assumed that an operation was deterministic and then actually it was not.

@RalfJung
Copy link
Member

RalfJung commented Feb 17, 2025

That comment also mentions algebraic_mul_add, which could be added after the initial PR if it makes sense.

We also have the "may or may not fuse" intrinsics added in #124874, which so far have not been exposed in any way that has a path to stabilization. Would algebraic_mul_add use those intrinsics, or would it replace them since we likely want some of the algebraic fast-math flags on that operation as well (i.e., we want to give the compiler more freedom than just "fuse or don't fuse").

@nikic
Copy link
Contributor

nikic commented Feb 17, 2025

These operations are non-deterministic. @nikic do you know if LLVM scalar evolution handles that properly? We had codegen issues in the past when SE assumed that an operation was deterministic and then actually it was not.

Are these "algebraic" in the sense that they have reassoc FMF? If so, then yes, SE should be treating them as non-deterministic already: https://github.com/llvm/llvm-project/blob/6684a5970e74b8b4c0c83361a90e25dae9646db0/llvm/lib/Analysis/ConstantFolding.cpp#L1437-L1444

@RalfJung
Copy link
Member

reassoc and a few more, yeah. Seems like that is accounted for, thanks. :)

@tgross35 tgross35 changed the title Tracking Issue for exposing algebraic floating point intrinsics Tracking Issue for algebraic floating point methods Feb 17, 2025
@tgross35
Copy link
Contributor

On that note I added an unresolved question for naming since algebraic isn't the most clear indicator of what is going on.

@tgross35
Copy link
Contributor

In theory the flags could also be represented via const generics, which would allow more fine tuned control and more flags without a method explosion. Something like:

#[derive(Clone, Copy, Debug, Default, ...)]
#[non_exhaustive]
struct FpArithOps {
    reassociate: bool,
    contract: bool,
    reciprocal: bool,
    no_signed_zeros: bool,
    ftz: bool,
    daz: bool,
    poison_nan: bool,
    poison_inf: bool,
}

impl FpArithOps {
    // Current algebraic_* flags
    const ALGEBRAIC: Self = Self { reassociate: true, contract: true, reciprocal: true, no_signed_zeros: true, ..false };
}

// Panics if `poison_*` is set
fn add_with_ops<const OPS: FpArithOps>(self, y: Self) -> Self;

// Alows `poison_*`
unsafe fn add_with_ops_unchecked<const OPS: FpArithOps>(self, y: Self) -> Self;

// same as `f32::algebraic_div`
let x = 1.0f32.div_with_ops::<FpArithOps::ALGEBRAIC>(y);

(Using a struct needs the next step of const generic support, or some way to bless types in std)

@hanna-kruppe
Copy link
Contributor

As I wrote in the ACP:

While this goes on the direction of having multiple combinations of LLVM “fast math flags” exposed, I don’t think that there’s more than two or three sets that are broadly useful enough and well-behaved enough to be worth exposing. And one of those is just “let the hardware do whatever is fast w.r.t. subnormals” which is something that really wants to be applied to entire regions of code and not to individual operations, so it may need an entirely different design. (It’s also a function attribute rather than an instruction flag in LLVM.)

I don’t think that we should expose the power set of LLVM’s flags, nor so many ad-hoc combinations that “method explosion” becomes a realistic problem. I’m not even sure if it’s a good idea to ever expose any of FMFs that can make an operation return poison (has anyone ever used those soundly while still getting a speedup?). And FTZ/DAZ shouldn’t be modeled as per-operation flags because you don’t want to toggle the CPU control registers for that constantly.

Zalathar added a commit to Zalathar/rust that referenced this issue Feb 18, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
@zroug
Copy link

zroug commented Feb 18, 2025

Should @llvm.arithmetic.fence be exposed? Sometimes it would be nice to reason about the effects of algebraic operations locally. Currently, there is no clean way to prevent reassociation, etc. with operations outside your local context, because you can't know if the caller is using algebraic operations as well.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 19, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
fmease added a commit to fmease/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

~~try-job: x86_64-gnu-nopt~~
try-job: x86_64-gnu-aux
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 11, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 11, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 12, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
@orlp
Copy link
Contributor

orlp commented Mar 16, 2025

On that note I added an unresolved question for naming since algebraic isn't the most clear indicator of what is going on.

I think it is fairly clear. The operations allow algebraically justified optimizations, as-if the arithmetic was real arithmetic.

I don't think you're going to find a clearer name, but feel free to provide suggestions. One alternative one might consider is real_add, real_sub, etc.

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 2, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 3, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 4, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
@oli-obk
Copy link
Contributor

oli-obk commented Apr 12, 2025

Seems fine to me. If you're relying on floats being unoptimized in const eval in a frequently used crate then that's on you

@traviscross traviscross added the I-lang-nominated Nominated for discussion during a lang team meeting. label Apr 13, 2025
@traviscross
Copy link
Contributor

Seems OK to me. We of course should just document what people should not rely on here.

@scottmcm
Copy link
Member

Seems trivially-fine to allow them in const in nightly, at least.

I do think Ralf's concern is at least worth a conversation in stabilization discussions, but I think that'll probably also end up arising equivalently at runtime in various ways, so I don't think there's a need to block nightly experimentation as lang.

(Obviously if the maintenance burden is too high, or something, other may still block for other reasons.)

@nikomatsakis
Copy link
Contributor

No objections.

@traviscross
Copy link
Contributor

@rustbot labels -I-lang-nominated

We discussed this in the lang call today, and we had appetite for these being const.

@rustbot rustbot removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Apr 16, 2025
Zalathar added a commit to Zalathar/rust that referenced this issue Apr 24, 2025
…r=RalfJung

Make algebraic functions into `const fn` items.

Tracking issue: rust-lang#136469

This PR makes the algebraic intrinsics and the unstable, algebraic functions of `f16`, `f32`, `f64`, and `f128` into `const fn` items:

```rust
impl f16 {
    pub const fn algebraic_add(self, rhs: f16) -> f16;
    pub const fn algebraic_sub(self, rhs: f16) -> f16;
    pub const fn algebraic_mul(self, rhs: f16) -> f16;
    pub const fn algebraic_div(self, rhs: f16) -> f16;
    pub const fn algebraic_rem(self, rhs: f16) -> f16;
}

impl f32 {
    pub const fn algebraic_add(self, rhs: f32) -> f32;
    pub const fn algebraic_sub(self, rhs: f32) -> f32;
    pub const fn algebraic_mul(self, rhs: f32) -> f32;
    pub const fn algebraic_div(self, rhs: f32) -> f32;
    pub const fn algebraic_rem(self, rhs: f32) -> f32;
}

impl f64 {
    pub const fn algebraic_add(self, rhs: f64) -> f64;
    pub const fn algebraic_sub(self, rhs: f64) -> f64;
    pub const fn algebraic_mul(self, rhs: f64) -> f64;
    pub const fn algebraic_div(self, rhs: f64) -> f64;
    pub const fn algebraic_rem(self, rhs: f64) -> f64;
}

impl f128 {
    pub const fn algebraic_add(self, rhs: f128) -> f128;
    pub const fn algebraic_sub(self, rhs: f128) -> f128;
    pub const fn algebraic_mul(self, rhs: f128) -> f128;
    pub const fn algebraic_div(self, rhs: f128) -> f128;
    pub const fn algebraic_rem(self, rhs: f128) -> f128;
}

// core::intrinsics

pub const fn fadd_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fsub_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fmul_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fdiv_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn frem_algebraic<T: Copy>(a: T, b: T) -> T;
```

This PR does not preserve the initial behaviour of these functions yielding non-deterministic output under Miri; it is most likely desired to reimplement this behaviour at some point.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 24, 2025
…r=RalfJung

Make algebraic functions into `const fn` items.

Tracking issue: rust-lang#136469

This PR makes the algebraic intrinsics and the unstable, algebraic functions of `f16`, `f32`, `f64`, and `f128` into `const fn` items:

```rust
impl f16 {
    pub const fn algebraic_add(self, rhs: f16) -> f16;
    pub const fn algebraic_sub(self, rhs: f16) -> f16;
    pub const fn algebraic_mul(self, rhs: f16) -> f16;
    pub const fn algebraic_div(self, rhs: f16) -> f16;
    pub const fn algebraic_rem(self, rhs: f16) -> f16;
}

impl f32 {
    pub const fn algebraic_add(self, rhs: f32) -> f32;
    pub const fn algebraic_sub(self, rhs: f32) -> f32;
    pub const fn algebraic_mul(self, rhs: f32) -> f32;
    pub const fn algebraic_div(self, rhs: f32) -> f32;
    pub const fn algebraic_rem(self, rhs: f32) -> f32;
}

impl f64 {
    pub const fn algebraic_add(self, rhs: f64) -> f64;
    pub const fn algebraic_sub(self, rhs: f64) -> f64;
    pub const fn algebraic_mul(self, rhs: f64) -> f64;
    pub const fn algebraic_div(self, rhs: f64) -> f64;
    pub const fn algebraic_rem(self, rhs: f64) -> f64;
}

impl f128 {
    pub const fn algebraic_add(self, rhs: f128) -> f128;
    pub const fn algebraic_sub(self, rhs: f128) -> f128;
    pub const fn algebraic_mul(self, rhs: f128) -> f128;
    pub const fn algebraic_div(self, rhs: f128) -> f128;
    pub const fn algebraic_rem(self, rhs: f128) -> f128;
}

// core::intrinsics

pub const fn fadd_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fsub_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fmul_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fdiv_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn frem_algebraic<T: Copy>(a: T, b: T) -> T;
```

This PR does not preserve the initial behaviour of these functions yielding non-deterministic output under Miri; it is most likely desired to reimplement this behaviour at some point.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Apr 24, 2025
Rollup merge of rust-lang#140172 - bjoernager:const-float-algebraic, r=RalfJung

Make algebraic functions into `const fn` items.

Tracking issue: rust-lang#136469

This PR makes the algebraic intrinsics and the unstable, algebraic functions of `f16`, `f32`, `f64`, and `f128` into `const fn` items:

```rust
impl f16 {
    pub const fn algebraic_add(self, rhs: f16) -> f16;
    pub const fn algebraic_sub(self, rhs: f16) -> f16;
    pub const fn algebraic_mul(self, rhs: f16) -> f16;
    pub const fn algebraic_div(self, rhs: f16) -> f16;
    pub const fn algebraic_rem(self, rhs: f16) -> f16;
}

impl f32 {
    pub const fn algebraic_add(self, rhs: f32) -> f32;
    pub const fn algebraic_sub(self, rhs: f32) -> f32;
    pub const fn algebraic_mul(self, rhs: f32) -> f32;
    pub const fn algebraic_div(self, rhs: f32) -> f32;
    pub const fn algebraic_rem(self, rhs: f32) -> f32;
}

impl f64 {
    pub const fn algebraic_add(self, rhs: f64) -> f64;
    pub const fn algebraic_sub(self, rhs: f64) -> f64;
    pub const fn algebraic_mul(self, rhs: f64) -> f64;
    pub const fn algebraic_div(self, rhs: f64) -> f64;
    pub const fn algebraic_rem(self, rhs: f64) -> f64;
}

impl f128 {
    pub const fn algebraic_add(self, rhs: f128) -> f128;
    pub const fn algebraic_sub(self, rhs: f128) -> f128;
    pub const fn algebraic_mul(self, rhs: f128) -> f128;
    pub const fn algebraic_div(self, rhs: f128) -> f128;
    pub const fn algebraic_rem(self, rhs: f128) -> f128;
}

// core::intrinsics

pub const fn fadd_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fsub_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fmul_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fdiv_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn frem_algebraic<T: Copy>(a: T, b: T) -> T;
```

This PR does not preserve the initial behaviour of these functions yielding non-deterministic output under Miri; it is most likely desired to reimplement this behaviour at some point.
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue May 9, 2025
…r=RalfJung

Make algebraic functions into `const fn` items.

Tracking issue: rust-lang#136469

This PR makes the algebraic intrinsics and the unstable, algebraic functions of `f16`, `f32`, `f64`, and `f128` into `const fn` items:

```rust
impl f16 {
    pub const fn algebraic_add(self, rhs: f16) -> f16;
    pub const fn algebraic_sub(self, rhs: f16) -> f16;
    pub const fn algebraic_mul(self, rhs: f16) -> f16;
    pub const fn algebraic_div(self, rhs: f16) -> f16;
    pub const fn algebraic_rem(self, rhs: f16) -> f16;
}

impl f32 {
    pub const fn algebraic_add(self, rhs: f32) -> f32;
    pub const fn algebraic_sub(self, rhs: f32) -> f32;
    pub const fn algebraic_mul(self, rhs: f32) -> f32;
    pub const fn algebraic_div(self, rhs: f32) -> f32;
    pub const fn algebraic_rem(self, rhs: f32) -> f32;
}

impl f64 {
    pub const fn algebraic_add(self, rhs: f64) -> f64;
    pub const fn algebraic_sub(self, rhs: f64) -> f64;
    pub const fn algebraic_mul(self, rhs: f64) -> f64;
    pub const fn algebraic_div(self, rhs: f64) -> f64;
    pub const fn algebraic_rem(self, rhs: f64) -> f64;
}

impl f128 {
    pub const fn algebraic_add(self, rhs: f128) -> f128;
    pub const fn algebraic_sub(self, rhs: f128) -> f128;
    pub const fn algebraic_mul(self, rhs: f128) -> f128;
    pub const fn algebraic_div(self, rhs: f128) -> f128;
    pub const fn algebraic_rem(self, rhs: f128) -> f128;
}

// core::intrinsics

pub const fn fadd_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fsub_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fmul_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn fdiv_algebraic<T: Copy>(a: T, b: T) -> T;
pub const fn frem_algebraic<T: Copy>(a: T, b: T) -> T;
```

This PR does not preserve the initial behaviour of these functions yielding non-deterministic output under Miri; it is most likely desired to reimplement this behaviour at some point.
@tczajka
Copy link

tczajka commented May 13, 2025

I don't think you're going to find a clearer name, but feel free to provide suggestions. One alternative one might consider is real_add, real_sub, etc.

These names suggest that the operations are more accurate than normal, where really they are less accurate. One might misinterpret that these are infinite-precision operations (perhaps with rounding after a whole sequence of operations).

The actual meaning isn't that these are real number operations, it's quite the opposite: they have best-effort precision with no strict guarantees.

I find "algebraic" confusing for the same reason.

How about approximate_add, approximate_sub?

@traviscross
Copy link
Contributor

traviscross commented May 13, 2025

Saying "approximate" feels imperfect, as while these operations don't promise to produce the exact IEEE result on a per-operation basis, the overall result might well be more accurate algebraically. E.g.:

let mut x = 0.0f32;
for _ in 0..100000000 {
    x = x.algebraic_add(0.00000001);
}
println!("algebraic:    {x}");
let mut x = 0.0f32;
for _ in 0..100000000 {
    x += 0.00000001;
}
println!("exact per op: {x}");

Produces (on release mode):

algebraic:    1.0061039
exact per op: 0.25

(The algebraically correct result is 1.0.)

Playground link

Of course, there's no guarantee that it will be. It's just that we've allowed "algebraically justifiable" optimizations to happen.

@hanna-kruppe
Copy link
Contributor

I’m not necessarily opposed to “approximate” but I have some quibbles with it too. While there are no guarantees about precision, in practice the most important optimizations these methods enable (reassociation, mul-add contraction) are often neutral to beneficial for precision. To me, the “approximate” name sounds more like what LLVM’s afn flag enables (substitute individual operations with faster but less precise implementations) which is not quite the same as the purpose of these methods.

@orlp
Copy link
Contributor

orlp commented May 13, 2025

How about approximate_add, approximate_sub?

I strongly disagree with this naming. With the few exceptions of exactly representable input/output pairs, almost all floating point math is already an approximation. And it would imply optimizations that aren't permissible.

These names suggest that the operations are more accurate than normal, where really they are less accurate.

Whether or not they are more or less accurate depends on the exact expression and input domain. For example, .fold(0.0, f64::algebraic_add) is more accurate than .sum::<f64>() for virtually all inputs. In fact, if we so desired Rust/LLVM could in theory incorporate an optimization pass similar to Herbie which chooses the fastest most accurate alternative, something that would be impossible with normal operations which do not allow rewriting at all.

@bjoernager

This comment has been minimized.

@orlp
Copy link
Contributor

orlp commented May 13, 2025

@bjoernager There are more optimizations involved in these algebraic ops than simple reassociation.

@bjoernager
Copy link
Contributor

bjoernager commented May 13, 2025

There had been some discussion about removing the f*_fast intrinsics in favour of these. We could merge the functions so that we use the fast name but with the current algebraic semantics. This would mostly (but not exactly) be equivalent to GCC's or Clang's -ffast-math option, and it could also be seen as more intuitive coming from those.

@tczajka
Copy link

tczajka commented May 13, 2025

To justify my "approximate" naming proposal:

IEEE-754 specifies the exact semantics of +, -, *, /, sqrt and that semantics is followed precisely to the last bit by the regular arithmetic operations.

The "algebraic" operations are "approximate" in the sense that they are not guaranteed to follow the IEEE-754 specification exactly.

@traviscross
Copy link
Contributor

traviscross commented May 13, 2025

Not in direct reply to anything specifically here, but for our reference, the flags we're turning on are (quoting from the LLVM docs):

  • nsz - No Signed Zeros - Allow optimizations to treat the sign of a zero argument or zero result as insignificant. This does not imply that -0.0 is poison and/or guaranteed to not exist in the operation.
  • arcp - Allows division to be treated as a multiplication by a reciprocal. Specifically, this permits a / b to be considered equivalent to a * (1.0 / b) (which may subsequently be susceptible to code motion), and it also permits a / (b / c) to be considered equivalent to a * (c / b). Both of these rewrites can be applied in either direction: a * (c / b) can be rewritten into a / (b / c).
  • contract - Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). This does not enable reassociation to form arbitrary contractions. For example, (a*b) + (c*d) + e can not be transformed into (a*b) + ((c*d) + e) to create two fma operations.
  • reassoc - Allow algebraically equivalent transformations for floating-point instructions such as reassociation transformations. This may dramatically change results in floating-point.

We're not turning on:

  • afn - Approximate functions - Allow substitution of approximate calculations for functions (sin, log, sqrt, etc). See floating-point intrinsic definitions for places where this can apply to LLVM’s intrinsic math functions.

Nor are we turning on these flags, which would be implied (along with nsz) by fast (in LLVM):

  • nnan - No NaNs - Allow optimizations to assume the arguments and result are not NaN. If an argument is a nan, or the result would be a nan, it produces a poison value instead.
  • ninf - No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it produces a poison value instead.

@tczajka
Copy link

tczajka commented May 13, 2025

We're not turning on:

  • afn - Approximate functions - Allow substitution of approximate calculations for functions (sin, log, sqrt, etc). See floating-point intrinsic definitions for places where this can apply to LLVM’s intrinsic math functions.

This would be a relevant consideration if there was algebraic_sin, algebraic_sqrt, etc, but seems immaterial to algebraic_add and friends.

@traviscross
Copy link
Contributor

It's mentioned just for reference, in light of the point @hanna-kruppe made in #136469 (comment).

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented May 13, 2025

@bjoernager The “fast math” terminology (also -Ofast) has proven to be a huge disservice to users and the Rust project should not propagate it further. If you call something the “fast” version, everyone’s like “of course I want my code to be fast” and that’s not something to encourage even with how relatively tame these operations are compared to the “funsafe-math” chimera.

@bjoernager
Copy link
Contributor

bjoernager commented May 13, 2025

I don't see why "fast" must inherently also equate to "exact" or "precise." The whole point of these functions is to optimise for execution speed, not accuracy. Just like I can run "fast" with a basket but I might drop some stuff from it, I can also take it slow and assure that its contents will be kept intact. Anyone that reads the docs would know these complications of fast arithmetic, and I don't see a reason to idiot-proof the language for people that won't bother... After all, there is a plethora of other pitfalls that can lead to junk logic or calculations etc. (which in and of itself does not break any safety guarantees).

@tczajka
Copy link

tczajka commented May 13, 2025

I agree that fast is a better name than algebraic.

"Algebraic" can mean many things. "Algebraic numbers" are roots of polynomials, so √2 is algebraic and π is not an algebraic number. There is also a related concept of "algebraic function". It's obviously not what is meant here.

The name "algebraic" tries to suggest what kind of optimizations are possible on this, but names shouldn't concentrate on what optimizations are allowed, they should talk about the semantics of the operation, and optimizations follow from that.

It's hard for me to interpret "algebraic" in any semantic way. real would be even worse, because that's just semantically wrong. The optimizer might be "assuming" these are exact operations on real numbers, but that's just its own internal incorrect assumption, no reason to expose that in the names.

I still think approximate is even better. fast concentrates on performance, approximate concentrates on the semantics that implicitly enables better performance.

I think the root of the naming issue is that there is no clearly defined semantics: "The exact set of optimizations is unspecified". So it's really kind of best_effort_add.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented May 13, 2025

@bjoernager Naming shouldn’t be constrained by the worst hypothetical misunderstanding someone can invent. But with this exact name, in this exact domain, we have decades of experience showing that it leads to misunderstandings and misuse by many users. And I don’t want to just blame users for it: it’s a maximally misleading name because it focuses on the aspect that needs the least attention. Sure, you can carefully interpret it as “fast as a trade-off against some other desirable property” but that’s not where most people’s minds go first. Especially not in Rust, which has “efficiency” right in its tagline and prides itself on not trading off reliability for it.

@tgross35
Copy link
Contributor

I believe fast has been discussed in a number of threads and the point that kept coming up is what Hanna mentioned, that we don't want to direct users toward thinking "if these are fast, everything else must be slow; why would I use them?".

How about approximate_add, approximate_sub?

I strongly disagree with this naming. With the few exceptions of exactly representable input/output pairs, almost all floating point math is already an approximation. And it would imply optimizations that aren't permissible.

The response in #136469 (comment) resonates with me as well here. "approximate" would be relative to the default operations, which are as exact as possible given finite precision. I don't think there is any risk of a confusing implication that default operations act with infinite precision. That said, the logic holds less well if we wind up with something like approximate_sin since libm functions like sin tend to be inexact even within the available precision.

Personally I do like inexact_* a bit more than approximate_*, since IMO it connotes a smaller tolerance. We could also go the direction of naming based on what the compiler is allowed to do, e.g. foldable_*.

@tczajka
Copy link

tczajka commented May 13, 2025

Personally I do like inexact_* a bit more than approximate_*, since IMO it connotes a smaller tolerance.

inexact is also reasonable.

But I'm not sure how much tolerance is reasonable to imply because even a simple re-association such as (a+b)+c => a+(b+c) can in some cases cause 100% precision loss. It's essentially impossible to build anything reliable on these functions without having any bounds in the spec on the re-writings that can happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-floating-point Area: Floating point numbers and arithmetic C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests