Releases: ashvardanian/StringZilla

Release v4.2.3

27 Oct 16:20

Release: v4.2.3 [skip ci]

Patch

  • Fix: Missing bounds checks in Rust (#273) (5219a4d)
  • Fix: Type-casting UBs of movemask bitsets (7c42b98)
  • Fix: Handling a larger order array (32b6350)
  • Fix: head_length is pre-decremented to zero (1c5c7e8)
  • Fix: Avoid std::enable_if for non-STL builds (568d90c)
  • Fix: Lifetime of temp strings in ranges (73ce811)

Release v4.2.2

26 Oct 21:24

Release: v4.2.2 [skip ci]

Patch

  • Improve: LUTs in SVE (3d886d3)
  • Make: Linux cross-compile matching Release CI (524b0d7)
  • Fix: Check for Arm Neon support on windows (30320b7)
  • Make: Removed pyarrow from Windows Arm Python tests (eab8c3c)
  • Make: Exclude KERNEL32.dll from stringzilla_bare checks (9edb804)
  • Make: Disabled SVE when using MSVC (04c985b)
  • Make: Use correct arch on windows for stringzillas/cuda (3fcd947)
  • Make: Updated target arch for windows tests. (e6460e1)
  • Fix: Disable windows min/max macros (00e902f)
  • Fix: Replace processthreadsapi.h with windows.h (f09e4f9)
  • Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (fe09f8d)
  • Make: Revert --sysroot cross-compile commands (579c82d)
  • Fix: Accessing ARM64_CNTVCT on Windows (5e6777d)
  • Make: Avoid redefining arch=armv8.2-a in pragmas (636147d)
  • Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (1f90f6c)
  • Make: Link to libc++ in LLVM builds on MacOS (1c8b29b)
  • Make: Revert _M_ARM64=1 flags for MSVC (25311a6)
  • Make: Enable Posix extensions for Python builds (9fe4f7c)
  • Make: Missing macros for winnt.h(169) C1189 error (8ef98a9)
  • Fix: Reading mrs w/out inline Asm on MSVC (d804c9f)
  • Make: Override --sysroot for "Cross Compile" builds (d3d901d)
  • Make: Use valid arch flags on MSVC (5aba122)
  • Make: Cross compile checks now correct for MSVC (7664f67)
  • Make: Windows arm now uses the correct compiler (7c2e9a0)
  • Make: cmake set ARCHIVE_OUTPUT_DIRECTORY to binary dir (f1ec210)
  • Make: Use ninja for windows deploy builds (0af43c8)
  • Make: Fixed Windows deploy (8ff2ad7)
  • Make: Include experimental Arm cross-compilation (4d86312)

v4.2.1: SHA-256 for JS, Swift, Go 🫆

12 Oct 13:50

Exposing SHA-256 to GoLang was tricky: Clang builds worked fine, but GCC builds failed. It turned out that GCC was too reluctant to inline my code, resulting in excessive stack usage... Now, the JavaScript, Swift, and GoLang bindings all support incremental SHA-256 procedures 🥳. Thanks to @MarekKnapek for reducing the stack memory usage of the serial SHA variant!
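"Incremental" SHA-256 means a message can be fed to the hasher in several chunks and still produce the same digest as hashing it in one shot. As a minimal sketch of that contract, the snippet below uses Python's standard `hashlib` as a stand-in. It does not use StringZilla's own bindings, whose exact method names aren't shown in these notes. The helper name `sha256_incremental` and the chunk size are arbitrary, illustrative choices.

```python
import hashlib


def sha256_incremental(data: bytes, chunk_size: int = 7) -> str:
    """Hash `data` by feeding it to the hasher in fixed-size chunks (illustrative helper)."""
    hasher = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        hasher.update(data[start:start + chunk_size])
    return hasher.hexdigest()


message = b"StringZilla hashes strings in chunks " * 10
one_shot = hashlib.sha256(message).hexdigest()
# The streamed digest matches the one-shot digest regardless of chunking.
assert sha256_incremental(message) == one_shot
```

The chunk boundaries are invisible to the final digest. This is what lets bindings expose `update`-style streaming on top of a block-based compression function.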

Moreover, thanks to @laurenspriem for highlighting the SIGILL when probing ID registers on older Arm CPUs. I've now guarded the first mrs probes with signal handlers. It's an ugly solution, but it may work 😅 I've also improved the capability detection code on Arm-based Windows machines, using the OS-specific <processthreadsapi.h> functionality, so now not only pure NEON but also NEON+SHA+AES kernels should be dispatched just fine!

Thanks to @ashbob999, StringZilla is also getting more stable Windows builds and stringzilla_bare coverage in our CI 🦺

Patch

  • Make: Removed rand/free/malloc stubs when avoiding libc (0148282)
  • Make: Deploy stringzilla_bare for windows (e4ddce8)
  • Make: Added .lib file to uploaded windows archives (2dc6936)
  • Make: Add MSVC bare builds back (5cc5f01)
  • Make: Added stringzilla_bare checks (bbc5cca)
  • Fix: Avoid unused POSIX extensions on macOS (aeb06a5)
  • Make: Deprecate old cross-compilation scripts (2f34c2d)
  • Improve: Drop -pedantic for POSIX extensions (e99d557)
  • Make: Pre-define CMake properties, like -lpthread and pointer size (7722bb1)
  • Improve: serialize_capability for Ice Lake on Clang (58f8cf9)
  • Make: Skip compiler checks for cross-compilation (60988f3)
  • Fix: Unused capabilities in Arm macOS builds (511a09e)
  • Docs: Listing ./scripts and StringWars (5af84dd)
  • Make: Pass -D CMAKE_SYSROOT in cross-compiling CI (a26fc73)
  • Fix: Suppress unused alloc warnings (4868d7f)
  • Make: Reduce CMake nesting (dda024d)
  • Make: Propagate cross-compilation settings (5070321)
  • Improve: Detect NEON+SHA+AES via WinAPI (3b175f8)
  • Fix: Probe mrs to avoid SIGILL on older Arm (d2f8e97)
  • Fix: Isolate & skip SHA-256 tests in Go with GCC (0874b13)
  • Fix: Deprecate sz_checksum (97f9ecf)
  • Make: More aggressive inlining (e8f33c1)
  • Make: Uniform hardware specs logging (f826dfc)
  • Improve: Expose Capabilities to GoLang (5f2cc97)
  • Improve: Branchless serial SHA-256 block processor (fe7efe2)
  • Fix: Missing modulo in SHA #254 (5a513b7)
  • Improve: Smaller stack usage in SHA-256 (#253) (a298be0)
  • Fix: No noescape/nocallback for stateful hashes (f8d321f)
  • Fix: Violating u32/u64 aliasing (7e55e5c)
  • Fix: Missing SSE flags for SHA (403b28b)
  • Improve: io.Writer & hash.Hash64 interface for Go (05f89ca)
  • Improve: Expose sz_dispatch_table_init for Go (5ff7ba1)
  • Fix: Missing Goldmont & Ice SHA dispatch (e29bded)
  • Fix: Supporting unaligned SHA-256 states (c770e48)
  • Fix: Missing C.sz_checksum (652735d)
  • Fix: Hex formatting in Swift on Linux (fc65328)
  • Improve: SHA for Go, JS, Swift (a165322)

v4.2: Faster Hashing and SHA-256 🫆

07 Oct 19:29

User-facing updates:

  • 🆕 SHA-256 checksums
  • 🆕 Detect compilation settings

Implementation details:

  • 🆕 Intel Goldmont capabilities level
  • 🆕 Arm NEON+SHA capabilities level
  • Hardened Rust builds & capability masking
  • Faster buffer filling in sz_hash in the NEON backend
  • Fixed tail handling in sz_copy in the SVE backend

Minor

  • Add: Check comp-time capabilities (3347be4)
  • Add: sz_cap_goldmont_k capability! (f70e927)
  • Add: neon+sha new capability! (fcb68a4)
  • Add: Sha256 to bench_token (bb077da)
  • Add: hmac_sha256 APIs (bf1971e)
  • Add: Sha256 class for Python (6ae7b75)
  • Add: Initial Sha256 variant for NEON (bd35030)
  • Add: SHA256 for Arm (20672dd)
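HMAC-SHA256 wraps SHA-256 in an inner and an outer keyed pass: H((K' ⊕ opad) ‖ H((K' ⊕ ipad) ‖ m)), where K' is the key padded (or first hashed) to the 64-byte block size. The sketch below builds it from `hashlib` following RFC 2104 and checks it against Python's `hmac` module. It illustrates the construction only. It is not the signature of StringZilla's `hmac_sha256` APIs, and the helper name is hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Illustrative HMAC-SHA256 per RFC 2104, built from plain SHA-256 calls."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_key = bytes(b ^ 0x36 for b in key)  # K' xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K' xor opad
    inner_digest = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_digest).digest()


key, msg = b"secret-key", b"The quick brown fox"
assert hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```

The two nested passes are why HMAC resists length-extension attacks that a naive `sha256(key + message)` would allow.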

Patch

  • Fix: Avoid unaligned SHA loads on ArmV7 (ebf0503)
  • Fix: Sign conversion warning (3c3e5fc)
  • Make: before-all for dnf on Fedora & apt on Debian (fc74452)
  • Make: Consume env-vars for Rust backend builds (222fc39)
  • Improve: Amortize bench_unary costs (d8d19ce)
  • Fix: Init uint32x4_t on MSVC (3dce631)
  • Improve: Bring back SVE2 hash for short inputs (b1c750b)
  • Improve: More sorting tests (61e08ce)
  • Improve: Simplify SVE memory-ops (35e2236)
  • Fix: sz_copy_sve tail issue (2fa818d)
  • Fix: Avoid <arm_neon_sve_bridge.h> (e5b4496)
  • Improve: Different SHA pipeline for AArch64 (c8aafd3)
  • Improve: Try better SHA pipelining (313f71f)
  • Improve: Faster 2-block SHA256 on NEON (9425341)
  • Improve: Deprecate SVE2 hashing (9cb1588)
  • Improve: Try using non-temporal SVE loads (4572a63)
  • Fix: svlasta_u64(svpfalse_b()) UB (4833e83)
  • Improve: Westmere-like hash updates in NEON (064355f)
  • Improve: Hardening Rust builds (d6a9ba6)
  • Fix: Type-casting on Arm (03a0340)

v4.1: Intel Westmere Kernels

02 Oct 18:53

Thanks to @Algunenano and the broader ClickHouse team for help, back-porting StringZilla kernels to older CPUs 🤗
With this release:

  • Substring search and hashing on CPUs from Westmere to Haswell will become at least 2x faster.
  • Inferring Skylake capabilities in dynamic dispatch no longer requires the VAES extensions, which are only available on Ice Lake and newer.
  • MSVC will correctly detect Haswell, Ice Lake, and NEON capabilities for compile-time dispatch; other platforms can't be differentiated, as MSVC lacks predefined macros for them.
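Dynamic dispatch of this kind typically compares a bitmask of detected CPU features against the features each kernel requires, picking the most capable kernel whose requirements are all met. The sketch below illustrates why dropping VAES from the Skylake requirements matters. The capability names, kernel list, and feature sets are simplified, hypothetical stand-ins, not StringZilla's actual `sz_cap_*` definitions.

```python
from enum import IntFlag


class Cap(IntFlag):
    """Hypothetical capability bits; names and grouping are illustrative only."""
    SSE42 = 1 << 0
    AESNI = 1 << 1
    AVX2 = 1 << 2
    AVX512 = 1 << 3
    VAES = 1 << 4


# Kernels ordered from most to least capable, each with the features it requires.
# The "skylake" entry deliberately does not require VAES.
KERNELS = [
    ("ice_lake", Cap.AVX512 | Cap.VAES),
    ("skylake", Cap.AVX512),
    ("haswell", Cap.AVX2),
    ("westmere", Cap.SSE42 | Cap.AESNI),
]


def pick_kernel(available: Cap) -> str:
    # The first kernel whose required bits are all present wins.
    for name, required in KERNELS:
        if (available & required) == required:
            return name
    return "serial"  # portable fallback when no SIMD kernel qualifies


skylake_cpu = Cap.SSE42 | Cap.AESNI | Cap.AVX2 | Cap.AVX512
print(pick_kernel(skylake_cpu))  # skylake, even without VAES
```

Had the "skylake" entry also listed VAES, a real Skylake CPU would silently fall back to the "haswell" kernel.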

Minor

Patch

  • Fix: Checking for SSE and AES in MSVC (f7d95ad)
  • Docs: More links & GPU mentions (04d6ae3)
  • Improve: Reuse AES-NI since Westmere (a3b7cd5)
  • Fix: Replace Nehalem with Westmere (#249) (fe30683)
  • Fix: VAES debuted in Ice Lake (8cef111)

Release v4.0.15

30 Sep 19:24

Release: v4.0.15 [skip ci]

Patch

  • Improve: Faster unaligned loads in fingerprints (30ad812)
  • Fix: Avoid Ice Lake instructions on older CPUs (96ce576)
  • Improve: Faster streaming hashes on x86 (2ce62ba)
  • Docs: Levenshtein wave shape (8e1f70c)

Release v4.0.14

22 Sep 11:36

Release: v4.0.14 [skip ci]

Patch

Release v4.0.13

19 Sep 14:13

Release: v4.0.13 [skip ci]

Patch

  • Make: Pull CUDA within CIBW_BEFORE_ALL (6316330)

v4.0.12: Zero-Copy for Rust and Python

19 Sep 10:30

This release fixes a critical bug where non-owning Strs slices incorrectly copied the entire parent buffer during GPU memory allocation, instead of just the slice portion. The fix ensures proper handling of the Apache Arrow-compatible StringTape format, with correct offset normalization for zero-copy operations. GPU memory management is now significantly more efficient: when data already resides in GPU memory, unnecessary re-allocations are avoided by traversing the chain of parent objects.
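In Arrow-style variable-length string columns, all strings live in one contiguous byte buffer, and an offsets array with n + 1 entries marks the boundaries: string i spans `data[offsets[i]:offsets[i + 1]]`. A slice can share the parent buffer. When its bytes must be copied elsewhere (for example, to a GPU), only the range `data[offsets[start]:offsets[stop]]` needs to move, and the offsets must be rebased to start at zero. The following is a minimal sketch of that normalization in plain Python. The helper names are hypothetical, and this is not StringZilla's or StringTape's implementation.

```python
from typing import List, Tuple


def build_tape(strings: List[str]) -> Tuple[bytes, List[int]]:
    """Pack strings into one UTF-8 buffer plus n + 1 offsets (Arrow-style)."""
    offsets = [0]
    chunks = []
    for s in strings:
        encoded = s.encode("utf-8")
        chunks.append(encoded)
        offsets.append(offsets[-1] + len(encoded))
    return b"".join(chunks), offsets


def slice_view(data: bytes, offsets: List[int], start: int, stop: int):
    """Share the parent buffer; only the small offsets window is copied."""
    return memoryview(data), offsets[start:stop + 1]


def compact_copy(view: memoryview, window: List[int]) -> Tuple[bytes, List[int]]:
    """Copy only the slice's bytes and rebase its offsets so they start at zero."""
    base, end = window[0], window[-1]
    return bytes(view[base:end]), [o - base for o in window]


data, offsets = build_tape(["hello", "world", "test", "data"])
view, window = slice_view(data, offsets, 1, 3)  # covers ["world", "test"]
compact, rebased = compact_copy(view, window)
print(compact, rebased)  # b'worldtest' [0, 5, 9]
```

Copying the whole parent buffer but keeping the un-rebased offsets would also "work", but it wastes memory. This is the kind of over-copying the fix above eliminates.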

A new stringzillas.to_device() function enables explicit GPU memory pre-allocation, useful for testing and performance optimization:

```python
import stringzilla as sz
import stringzillas as szs

# Create strings and slices
strs = sz.Strs(["hello", "world", "test", "data"])
slice_view = strs[1:3]  # Non-owning view of ["world", "test"]

# Pre-allocate on GPU (if available)
gpu_strs = szs.to_device(strs)
gpu_slice = szs.to_device(slice_view)  # Correctly handles slice offsets
```

Cross-platform builds are now more stable thanks to fixes for Windows ARM64 cross-compilation, where architecture flags are now mutually exclusive to prevent header conflicts. The CI/CD pipeline now correctly generates stringzillas-cuda packages by propagating environment variables through cibuildwheel. Test coverage now includes complex Unicode scenarios with RTL text, emoji sequences, and different normalization forms. Documentation has been extended with Rust examples showcasing the zero-copy compute_into APIs using the StringTape format.

Patch

  • Make: Mutually exclusive platform flags (773d959)
  • Fix: Skip to_device tests w/out GPUs found (d701592)
  • Make: Propagate SZ_TARGET into CIBW env (b222e82)
  • Improve: Avoid realloc for on-GPU views (77e67cf)
  • Docs: Zero-copy Rust compute_into API with StringTape (f4ad81e)
  • Improve: Validate to_device(Strs) for unicode (f3c5357)
  • Improve: Pre-send to GPU with to_device (c78cd21)
  • Fix: Same Strs slicing as in StringTape (b4f8d12)

Release v4.0.11

16 Sep 21:51

Release: v4.0.11 [skip ci]

Patch

  • Improve: Striping APIs for Python (75321c5)
  • Docs: Coloring StringZilla green! (c3ccfa6)