Simple Bloom filter implemented in RISC-V assembly for RV32I and RV64I instruction sets. Makes optional use of the Zbb instruction set if available.
A Bloom Filter is a highly space-efficient data structure similar to a set. Unlike a set, a Bloom returns "maybe in set", and "definitely not in set" whereas a set returns "definitely in set" and "definitely not in set". The advantage of a Bloom filter is a big gain in space efficiency for very large sets.
The particular Bloom filter implemented here is designed to allow for storage of 100 items with a 1% false positive rate. It was designed with this Bloom filter calculator. It uses 66 bytes of storage to store the 100 items.
This implementation uses two hash functions, both of which are simple modulus hashes (ie, not very good, but quite fast), and as a result it works best if the entropy of the lower 9 bits of the items being inserted is high. If this is not the case, then I suggest replacing the hash_h1 and hash_h2 functions in bloom.s with something more sophisticated; MurmurHash is a commonly used hashing function for this application.
The API follows the standard RISC-V ABI; it should be easy to make it C-callable as well as callable from assembly.
rvbloom's public API consists of four routines:
Insert a string value into the Bloom filter. Input: a0 contains a pointer to a nul-terminated string.
Insert an integer value into the Bloom filter. Input: a0 contains integer to insert.
Determine if a string is possibly in the bloom filter. Input: a0 contains a pointer to a nul-terminated string. Output: a0 = 1 if string is possibly in filter, else 0.
Determine if an integer value is possibly in the bloom filter. Input: a0 contains integer to check. Output: a0 = 1 if integer is possibly in filter, else 0.
rvbloom depends upon my rvint package of RISC-V integer mathematical routines. You will need to download and build rvint prior to building rvbloom.
rvbloom is built with LLVM. It is possible to use the GNU toolchain by modifying the Makefile.
- Download and build rvint
- edit rvbloom/config.s and set the CPU_BITS and HAS_ZBB equates as appropriate.
- edit Makefile, setting TARGET, ARCH, ABI and RVINT variables as appropriate
- make. This will build the rvbloom.a library, suitable for linking with your application, as well as the bloom-tests.x example executable.
bloom-tests.x runs on systems which support Linux syscalls. Expected output from running on the RISC-V ALE emulator
$ whisper /bloom-tests.x --newlib --setreg sp=0x7FFFFF
C --isa acdfimsu
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 01000000 00000100 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
--------
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00100001 00000000 00000000 00000000 00000000 00000000
00000000 00000000 01000000 00000100 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
--------
maybe in
maybe in
not in
Target program exited with code 0
User stop
Retired 16481 instructions in 0.52s 31633 inst/s
$
- Bloom filter parameters (capacity, error likelyhood) may be changed in config.s
- Hash functions in bloom.s may be updated to provide more entropy.