v3.11.0: Checksums in AVX-512, AVX2, NEON

@ashvardanian ashvardanian released this 01 Dec 10:11
· 0 commits to d52854884c1ddf0556a5a31d7473650205c2ff8f since this release
  • 🆕 sz_checksum(char const *, size_t) C99 interface
  • 🆕 sz::str().checksum() C++11 interface
  • 🆕 sz.checksum(str) Python interface

Database and other systems engineers, you can now use StringZilla to dynamically dispatch different checksum kernels for AVX2-capable Haswell+ CPUs, AVX-512BW-capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. In AVX-512, masked loads are used extensively, resulting in a 10% improvement even on typical English words averaging 5 bytes in length, and a 20x improvement over the serial code on longer strings.

On the technical side, on x86, the kernels use the well-known SAD(text, zeros) idiom: the sum of absolute differences between each byte and zero is simply the sum of the bytes, accumulated into 64-bit words. They also traverse the input bidirectionally to saturate a core capable of performing 2 loads per CPU cycle. Moreover, on large inputs, they switch to streaming loads, handling the head and the tail separately, similar to our memcpy alternative, which also outperforms LibC on AVX-512-capable machines 😎
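The SAD(text, zeros) idiom can be sketched in a few lines of C. This is a minimal illustration and not StringZilla's actual kernel: the function names `checksum_serial` and `checksum_sad` are hypothetical, the checksum is assumed to be the plain sum of unsigned bytes, and 128-bit SSE2 is used instead of AVX2, AVX-512, or NEON so that the sketch compiles on any x86-64 target and falls back to the serial loop elsewhere. Bidirectional traversal, masked loads, and streaming loads are left out.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Serial reference (hypothetical name): sum of all bytes, read as unsigned. */
static uint64_t checksum_serial(char const *text, size_t length) {
    uint64_t sum = 0;
    for (size_t i = 0; i != length; ++i) sum += (unsigned char)text[i];
    return sum;
}

/* SAD-based sketch (hypothetical name): |byte - 0| == byte, so the SAD
 * instruction against a zero register yields per-lane byte sums. */
static uint64_t checksum_sad(char const *text, size_t length) {
#if defined(__SSE2__)
    __m128i const zeros = _mm_setzero_si128();
    __m128i sums = _mm_setzero_si128(); /* two 64-bit accumulators */
    size_t i = 0;
    for (; i + 16 <= length; i += 16) {
        __m128i block = _mm_loadu_si128((__m128i const *)(text + i));
        /* Each 64-bit lane receives the sum of its 8 bytes. */
        sums = _mm_add_epi64(sums, _mm_sad_epu8(block, zeros));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, sums);
    /* The remaining tail of fewer than 16 bytes is handled serially here;
     * the release notes describe masked loads for this on AVX-512. */
    return lanes[0] + lanes[1] + checksum_serial(text + i, length - i);
#else
    return checksum_serial(text, length); /* no SSE2: serial fallback */
#endif
}
```

Both functions should agree on every input. The 64-bit accumulators are what make the idiom attractive: summing bytes into 8-bit or 16-bit lanes would overflow quickly, while SAD widens each group of 8 bytes in a single instruction.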

Minor

Patch

  • Docs: Simpler Python doc-strings (ad5fa2c)
  • Fix: sz_checksum visibility (9bec0eb)
  • Fix: Missing _mm_cvtsi128_si64x in Clang (c8c6c7c)