Releases: ashvardanian/StringZilla
Release v4.2.3
Release: v4.2.3 [skip ci]
Patch
- Fix: Missing bounds checks in Rust (#273) (5219a4d)
- Fix: Type-casting UBs of `movemask` bitsets (7c42b98)
- Fix: Handling a larger `order` array (32b6350)
- Fix: `head_length` is pre-decremented to zero (1c5c7e8)
- Fix: Avoid `std::enable_if` for non-STL builds (568d90c)
- Fix: Lifetime of temp strings in ranges (73ce811)
Release v4.2.2
Release: v4.2.2 [skip ci]
Patch
- Improve: LUTs in SVE (3d886d3)
- Make: Linux cross-compile matching Release CI (524b0d7)
- Fix: Check for Arm Neon support on windows (30320b7)
- Make: Removed pyarrow from windows arms python tests (eab8c3c)
- Make: Exclude KERNEL32.dll from stringzilla_bare checks (9edb804)
- Make: Disabled SVE when using MSVC (04c985b)
- Make: Use correct arch on windows for stringzillas/cuda (3fcd947)
- Make: Updated target arch for windows tests. (e6460e1)
- Fix: Disable windows min/max macros (00e902f)
- Fix: Replace processthreadsapi.h with windows.h (f09e4f9)
- Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (fe09f8d)
- Make: Revert `--sysroot` cross-compile commands (579c82d)
- Fix: Accessing `ARM64_CNTVCT` on Windows (5e6777d)
- Make: Avoid redefining `arch=armv8.2-a` in pragmas (636147d)
- Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (1f90f6c)
- Make: Link to `libc++` in LLVM builds on MacOS (1c8b29b)
- Make: Revert `_M_ARM64=1` flags for MSVC (25311a6)
- Make: Enable Posix extensions for Python builds (9fe4f7c)
- Make: Missing macros for `winnt.h(169)` C1189 error (8ef98a9)
- Fix: Reading `mrs` w/out inline Asm on MSVC (d804c9f)
- Make: Override `--sysroot` for "Cross Compile" builds (d3d901d)
- Make: Use valid arch flags on MSVC (5aba122)
- Make: Cross compile checks now correct for MSVC (7664f67)
- Make: Windows arm now uses the correct compiler (7c2e9a0)
- Make: cmake set ARCHIVE_OUTPUT_DIRECTORY to binary dir (f1ec210)
- Make: Use ninja for windows deploy builds (0af43c8)
- Make: Fixed Windows deploy (8ff2ad7)
- Make: Include experimental Arm cross-compilation (4d86312)
v4.2.1: SHA-256 for JS, Swift, Go
Exposing SHA-256 to GoLang was tricky. Clang worked fine. GCC failed. It turned out that GCC was too shy about inlining my code, resulting in excessive stack space usage... Now, JavaScript, Swift, and GoLang bindings all support incremental SHA-256 procedures 🥳. Thanks to @MarekKnapek for reducing the stack memory usage of the serial SHA variant!
Moreover, thanks to @laurenspriem for highlighting the SIGILL when probing ID registers on older Arm CPUs. I've now guarded the first `mrs` probes with signal handlers. Ugly solution, but it may work 😅 I've also improved the capability detection code on Arm-based Windows machines, using the OS-specific `<processthreadsapi.h>` functionality, so now not only pure NEON, but also NEON+SHA+AES kernels should be dispatched just fine!
Thanks to @ashbob999, StringZilla is also getting more stable Windows builds and stringzilla_bare coverage in our CI 🦺
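For readers unfamiliar with the term, "incremental" hashing means feeding the input in pieces into one running state and reading the digest at the end. The sketch below illustrates the idea with Python's standard `hashlib` module rather than the StringZilla bindings; the helper name `sha256_incremental` is hypothetical and only serves the illustration.

```python
import hashlib


def sha256_incremental(chunks):
    """Feed an iterable of bytes chunks into a single running SHA-256 state."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # the state carries over between calls
    return hasher.hexdigest()


# Split a payload into 64-byte pieces, as a streaming reader might deliver it.
payload = b"StringZilla " * 1000
chunks = [payload[i:i + 64] for i in range(0, len(payload), 64)]

# Streaming the chunks must produce the same digest as hashing in one call.
assert sha256_incremental(chunks) == hashlib.sha256(payload).hexdigest()
```

The chunk boundaries do not affect the result, which is what makes the incremental form useful for files and network streams that do not fit in memory.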
Patch
- Make: Removed rand/free/malloc stubs when avoiding libc (0148282)
- Make: Deploy stringzilla_bare for windows (e4ddce8)
- Make: Added .lib file to uploaded windows archives (2dc6936)
- Make: Add MSVC bare builds back (5cc5f01)
- Make: Added stringzilla_bare checks (bbc5cca)
- Fix: Avoid unused POSIX extensions on macOS (aeb06a5)
- Make: Deprecate old cross-compilation scripts (2f34c2d)
- Improve: Drop `-pedantic` for POSIX extensions (e99d557)
- Make: Pre-define CMake properties, like `-lpthread` and pointer size (7722bb1)
- Improve: `serialize_capability` for Ice Lake on Clang (58f8cf9)
- Make: Skip compiler checks for cross-compilation (60988f3)
- Fix: Unused `capabilities` in Arm macOS builds (511a09e)
- Docs: Listing `./scripts` and StringWars (5af84dd)
- Make: Pass `-D CMAKE_SYSROOT` in cross-compiling CI (a26fc73)
- Fix: Suppress unused `alloc` warnings (4868d7f)
- Make: Reduce CMake nesting (dda024d)
- Make: Propagate cross-compilation settings (5070321)
- Improve: Detect NEON+SHA+AES via WinAPI (3b175f8)
- Fix: Probe `mrs` for avoid `SIGILL` on older Arm (d2f8e97)
- Fix: Isolate & skip SHA-256 tests in Go with GCC (0874b13)
- Fix: Deprecates `sz_checksum` (97f9ecf)
- Make: More aggressive inlining (e8f33c1)
- Make: Uniform hardware specs logging (f826dfc)
- Improve: Expose `Capabilities` to GoLang (5f2cc97)
- Improve: Branchless serial SHA-256 block processor (fe7efe2)
- Fix: Missing modulo in SHA #254 (5a513b7)
- Improve: Smaller stack usage in SHA-256 (#253) (a298be0)
- Fix: No `noescape`/`nocallback` for stateful hashes (f8d321f)
- Fix: Violating u32/u64 aliasing (7e55e5c)
- Fix: Missing SSE flags for SHA (403b28b)
- Improve: `io.Writer` & `hash.Hash64` interface for Go (05f89ca)
- Improve: Expose `sz_dispatch_table_init` for Go (5ff7ba1)
- Fix: Missing Goldmont & Ice SHA dispatch (e29bded)
- Fix: Supporting unaligned SHA-256 states (c770e48)
- Fix: Missing `C.sz_checksum` (652735d)
- Fix: Hex formatting in Swift on Linux (fc65328)
- Improve: SHA for Go, JS, Swift (a165322)
v4.2: Faster Hashing and SHA-256
User-facing updates:
- 🆕 SHA-256 checksums
- 🆕 Detect compilation settings
Implementation details:
- 🆕 Intel Goldmont capabilities level
- 🆕 Arm NEON+SHA capabilities level
- Hardened Rust builds & capability masking
- Faster buffer filling in `sz_hash` in NEON backend
- Fixed tail handling in `sz_copy` in SVE backend
Minor
- Add: Check comp-time capabilities (3347be4)
- Add: `sz_cap_goldmont_k` capability! (f70e927)
- Add: `neon+sha` new capability! (fcb68a4)
- Add: Sha256 to `bench_token` (bb077da)
- Add: `hmac_sha256` APIs (bf1971e)
- Add: `Sha256` class for Python (6ae7b75)
- Add: Initial Sha256 variant for NEON (bd35030)
- Add: SHA256 for Arm (20672dd)
Patch
- Fix: Avoid unaligned SHA loads on ArmV7 (ebf0503)
- Fix: Sign conversion warning (3c3e5fc)
- Make: `before-all` for `dnf` on Fedora & `apt` on Debian (fc74452)
- Make: Consume env-vars for Rust backend builds (222fc39)
- Improve: Amortize `bench_unary` costs (d8d19ce)
- Fix: Init `uint32x4_t` on MSVC (3dce631)
- Improve: Bring back SVE2 hash for short inputs (b1c750b)
- Improve: More sorting tests (61e08ce)
- Improve: Simplify SVE memory-ops (35e2236)
- Fix: `sz_copy_sve` tail issue (2fa818d)
- Fix: Avoid `<arm_neon_sve_bridge.h>` (e5b4496)
- Improve: Different SHA pipeline for AArch64 (c8aafd3)
- Improve: Try better SHA pipelining (313f71f)
- Improve: Faster 2-block SHA256 on NEON (9425341)
- Improve: Deprecate SVE2 hashing (9cb1588)
- Improve: Try using non-temporal SVE loads (4572a63)
- Fix: `svlasta_u64(svpfalse_b())` UB (4833e83)
- Improve: Westmere-like hash updates in NEON (064355f)
- Improve: Hardening Rust builds (d6a9ba6)
- Fix: Type-casting on Arm (03a0340)
v4.1: Intel Westmere Kernels
Thanks to @Algunenano and the broader ClickHouse team for help, back-porting StringZilla kernels to older CPUs 🤗
With this release:
- Substring search and hashing on CPUs from Westmere to Haswell will become at least 2x faster.
- Inferring Skylake capabilities in dynamic dispatch won't require `VAES` extensions only needed for Ice Lake and newer.
- MSVC will correctly detect Haswell, Ice Lake, and NEON capabilities for compile-time dispatch, lacking options to differentiate other platforms from macros.
Release v4.0.15
Release v4.0.14
Release v4.0.13
v4.0.12: Zero-Copy for Rust and Python
This release fixes a critical bug where non-owning Strs slices incorrectly copied entire parent data during GPU memory allocation, instead of just the slice portion. The fix ensures proper Apache Arrow-compatible StringTape format handling with correct offset normalization for zero-copy operations. GPU memory management is now significantly more efficient, eliminating unnecessary re-allocations when data already resides in GPU memory through intelligent parent chain traversal.
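To make "offset normalization" concrete, here is a minimal pure-Python sketch of the idea, assuming an Arrow-style layout of one contiguous byte buffer plus N+1 offsets. The `Tape`, `make_tape`, and `export_slice` names are hypothetical and are not part of the StringZilla API; the sketch shows only that exporting a slice should copy the bytes the slice covers and rebase its offsets to start at zero.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    """Hypothetical Arrow-style layout: one byte buffer plus N+1 offsets."""
    data: bytes
    offsets: list

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        return self.data[self.offsets[i]:self.offsets[i + 1]]


def make_tape(strings):
    """Concatenate UTF-8 strings and record where each one ends."""
    offsets, chunks = [0], []
    for s in strings:
        encoded = s.encode("utf-8")
        chunks.append(encoded)
        offsets.append(offsets[-1] + len(encoded))
    return Tape(b"".join(chunks), offsets)


def export_slice(tape, start, stop):
    """Copy only the bytes of tape[start:stop] and rebase offsets to zero."""
    first, last = tape.offsets[start], tape.offsets[stop]
    rebased = [offset - first for offset in tape.offsets[start:stop + 1]]
    return Tape(tape.data[first:last], rebased)


parent = make_tape(["hello", "world", "test", "data"])
view = export_slice(parent, 1, 3)

assert view.data == b"worldtest"   # 9 bytes, not the parent's 18
assert view.offsets == [0, 5, 9]   # rebased from [5, 10, 14]
```

Copying the whole parent buffer for such a view would still yield correct strings if the original offsets were kept, but it would transfer bytes the slice never references, which is the waste the fix above is described as removing.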
A new stringzillas.to_device() function enables explicit GPU memory pre-allocation, useful for testing and performance optimization:
```python
import stringzilla as sz
import stringzillas as szs

# Create strings and slices
strs = sz.Strs(["hello", "world", "test", "data"])
slice_view = strs[1:3]  # Non-owning view of ["world", "test"]

# Pre-allocate on GPU (if available)
gpu_strs = szs.to_device(strs)
gpu_slice = szs.to_device(slice_view)  # Correctly handles slice offsets
```

Cross-platform builds are now more stable with fixes for Windows ARM64 cross-compilation, ensuring mutually exclusive architecture flags prevent header conflicts. The CI/CD pipeline correctly generates stringzillas-cuda packages by properly propagating environment variables through cibuildwheel. Enhanced test coverage includes complex Unicode scenarios with RTL text, emoji sequences, and different normalization forms. Documentation has been extended with Rust examples showcasing zero-copy compute_into APIs using StringTape format.
Patch
- Make: Mutually exclusive platform flags (773d959)
- Fix: Skip `to_device` tests w/out GPUs found (d701592)
- Make: Propagate `SZ_TARGET` into `CIBW` env (b222e82)
- Improve: Avoid `realloc` for on-GPU views (77e67cf)
- Docs: Zero-copy Rust `compute_into` API with StringTape (f4ad81e)
- Improve: Validate `to_device(Strs)` for unicode (f3c5357)
- Improve: Pre-send to GPU with `to_device` (c78cd21)
- Fix: Same `Strs` slicing as in StringTape (b4f8d12)