Skip to content

Performance regression for HistoryBuf(fer)::write() with heapless 0.9.1 #598

@BVollmerhaus

Description

@BVollmerhaus

Description

I've been using heapless's HistoryBuffer (now renamed to HistoryBuf) as a sliding search window with much success. More precisely, I have an iterator over a substantial number of bytes and need to find a particular sequence, so I'm doing something akin to the following:

let mut search_window: HistoryBuf<u8, 11> = HistoryBuf::new();

for byte in lots_of_bytes {
    search_window.write(byte);

    if search_window.oldest_ordered().eq(b"some-needle") {
        // Needle found; do something useful
        return;
    }
}

While I'm not using heapless in its intended embedded use case, I found its HistoryBuffer to be much faster than using a VecDeque with a manual pop_front()/push_back() or similar alternatives.

However, with heapless 0.9.1, there seems to be a significant regression in the performance of the above snippet, which I'd largely attribute to the write() call. Of course, the oldest_ordered().eq() in every iteration has a proportionately larger impact on runtime than the write() on its own, but just upgrading heapless to 0.9.1 with this snippet still results in about 400 MB/s less throughput in a real-world scenario (~1.4 → ~1.0 GB/s).

Different optimization settings like LTO or codegen units do not seem to affect the result much. You can find a Divan-based benchmark as an MRE below. This may very well be a case of me holding it wrong, but I figured it would be worth reporting nonetheless.

Environment

  • CPU: Ryzen 7 PRO 5850U

  • Kernel Version: 6.16.1

  • rustc -vV

    rustc 1.89.0 (29483883e 2025-08-04)
    binary: rustc
    commit-hash: 29483883eed69d5fb4db01964cdf2af4d86e9cb2
    commit-date: 2025-08-04
    host: x86_64-unknown-linux-gnu
    release: 1.89.0
    LLVM version: 20.1.7
    

MRE Benchmark

// benches/historybuf.rs

use std::hint::black_box;

use divan::counter::BytesCount;
use heapless::HistoryBuf;  // Replace with `HistoryBuffer` for heapless 0.8.0

const NEEDLE: &[u8] = b"needle451";
const WINDOW_SIZE: usize = NEEDLE.len();

fn historybuf_search_window_write(data: Vec<u8>) {
    let mut search_window: HistoryBuf<u8, WINDOW_SIZE> = HistoryBuf::new();

    for byte in data {
        search_window.write(byte);
    }
}

#[divan::bench]
fn bench_historybuf_search_window_write(bencher: divan::Bencher) {
    let total_bytes: usize = 8 * 1024 * 1024;
    let bytes_before_needle = total_bytes - NEEDLE.len();

    bencher
        .counter(BytesCount::new(total_bytes))
        .with_inputs(|| {
            let mut data = Vec::with_capacity(total_bytes);
            data.extend(vec![b'\0'; bytes_before_needle]);
            data.extend(NEEDLE);
            data
        })
        .bench_values(|data| {
            historybuf_search_window_write(black_box(data));
        });
}

fn main() {
    divan::main();
}
# Cargo.toml

[package]
name = "heapless-perf-mre"
edition = "2024"

[dependencies]
heapless = "0.9.1"

[dev-dependencies]
divan = "0.1.21"

[profile.release]
lto = "fat"
codegen-units = 1

[[bench]]
name = "historybuf"
harness = false

Results

cargo bench --benches

heapless 0.8.0 (HistoryBuffer)

     Running benches/historybuf.rs (target/release/deps/historybuf-2a85391490864b67)
Timer precision: 20 ns
historybuf                               fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ bench_historybuf_search_window_write  24.61 µs      │ 275.3 µs      │ 119.5 µs      │ 124.4 µs      │ 100     │ 100
                                         340.7 GB/s    │ 30.46 GB/s    │ 70.15 GB/s    │ 67.37 GB/s    │         │

heapless 0.9.1 (HistoryBuf)

     Running benches/historybuf.rs (target/release/deps/historybuf-974f73ca8d39caa1)
Timer precision: 20 ns
historybuf                               fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ bench_historybuf_search_window_write  6.823 ms      │ 7.078 ms      │ 6.87 ms       │ 6.892 ms      │ 100     │ 100
                                         1.229 GB/s    │ 1.185 GB/s    │ 1.22 GB/s     │ 1.217 GB/s    │         │

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions