WIP: docs

jorisdral · jorisdral · commit b57fac0966e3 · 2025-06-30T16:11:16.000+02:00
diff --git a/bloomfilter-blocked/README.md b/bloomfilter-blocked/README.md
@@ -62,11 +62,13 @@ differences are:
   over a `Hashable` type class, instead of having a `a -> [Hash]` typed field.
   This separation in `bloomfilter-blocked` allows clean (de-)serialisation of
   filters as the hashing scheme is static.
-* `bloomfilter-blocked` uses `XXH3` for hashing instead of the Jenkins'
-  `lookup3` that `bloomfilter` uses.
-
+* `bloomfilter-blocked` uses [`XXH3`][xxh3] for hashing instead of [Jenkins'
+  `lookup3`][lookup3:wiki], which `bloomfilter` uses.
+* TODO: salt
 
 <!-- Sources -->
 
 [bloom-filter:wiki]: https://en.wikipedia.org/wiki/Bloom_filter
-[bloomfilter:hackage]: https://hackage.haskell.org/package/bloomfilter
+[bloomfilter:hackage]: https://hackage.haskell.org/package/bloomfilter
+[xxh3]: https://xxhash.com/
+[lookup3:wiki]: https://en.wikipedia.org/wiki/Jenkins_hash_function#lookup3
diff --git a/bloomfilter-blocked/src/Data/BloomFilter.hs b/bloomfilter-blocked/src/Data/BloomFilter.hs
@@ -3,6 +3,61 @@
 -- implementation, import "Data.BloomFilter.Blocked".
 module Data.BloomFilter (
     module Data.BloomFilter.Classic
+    -- * Example: a spelling checker
+    -- $example
+
+    -- * Differences with the @bloomfilter@ package
+    -- $differences
   ) where
 
 import           Data.BloomFilter.Classic
+
+-- $example
+--
+-- This example reads a dictionary file containing one word per line,
+-- constructs a Bloom filter with a 1% false positive rate, and
+-- spellchecks the files named on the command line.  Like the Unix @spell@
+-- command, it prints each word that it does not recognize.
+--
+-- >>> import           Control.Monad (forM_)
+-- >>> import           System.Environment (getArgs)
+-- >>> import qualified Data.BloomFilter as B
+--
+-- >>> :{
+-- main :: IO ()
+-- main = do
+--     files <- getArgs
+--     dictionary <- readFile "/usr/share/dict/words"
+--     let !bloom = B.fromList (B.policyForFPR 0.01) 4 (words dictionary)
+--     forM_ files $ \file ->
+--           putStrLn . unlines . filter (`B.notElem` bloom) . words
+--       =<< readFile file
+-- :}
+
+-- $differences
+--
+-- This package is an entirely rewritten fork of the
+-- [bloomfilter](https://hackage.haskell.org/package/bloomfilter) package.
+--
+-- The main differences are:
+--
+-- * Support for both classic and \"blocked\" Bloom filters. Block-structured
+--   Bloom filters arrange all the bits for each insert or lookup into a single
+--   cache line, which greatly reduces the number of slow uncached memory reads.
+--   The cost of this performance optimisation is a slightly worse trade-off
+--   between bits per element and the FPR. In practice, for typical FPRs
+--   between @1e-3@ and @1e-4@, this requires a couple of extra bits per element.
+--
+-- * This package supports Bloom filters of arbitrary sizes (not limited to powers
+--   of two).
+--
+-- * Sizes over @2^32@ are supported up to @2^48@ for classic Bloom filters and
+--   @2^41@ for blocked Bloom filters.
+--
+-- * The 'Bloom' and 'MBloom' types are parametrised over a 'Hashable' type
+--   class, instead of having a @a -> ['Hash']@ typed field.
+--   This separation allows clean (de-)serialisation of Bloom filters in this
+--   package, as the hashing scheme is static.
+--
+-- * [@XXH3@ hash](https://xxhash.com/) is used instead of [Jenkins'
+--   @lookup3@](https://en.wikipedia.org/wiki/Jenkins_hash_function#lookup3).