|
| 1 | +# faststringmap |
| 2 | + |
| 3 | +`faststringmap` is a fast read-only string keyed map for Go (golang). |
| 4 | +For our use case it is approximately 5 times faster than using Go's |
| 5 | +built-in map type with a string key. It also has the following advantages: |
| 6 | + |
| 7 | +* look up strings and byte slices without use of the `unsafe` package |
| 8 | +* minimal impact on GC due to lack of pointers in the data structure |
| 9 | +* data structure can be trivially serialized to disk or network |
| 10 | + |
| 11 | +`faststringmap` v2 is built using Go generics, for Go 1.18 onwards. |
| 12 | + |
| 13 | +`faststringmap` is a variant of a data structure called a |
| 14 | +[Trie](https://en.wikipedia.org/wiki/Trie). |
| 15 | +At each level we use a slice to hold the possible next byte values. |
| 16 | +The length of this slice is one plus the difference between the lowest and highest |
| 17 | +possible next bytes of strings in the map, so not all the entries in the slice are |
| 18 | +valid next bytes. `faststringmap` is thus more space efficient for keys drawn from a |
| 19 | +small set of nearby byte values, for example keys consisting mostly of digits. |
| 20 | + |
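To make the layout described above more concrete, here is a minimal sketch of a pointer-free trie stored in a single slice. It is not the library's code: the names `Trie`, `node`, `NewTrie` and `Lookup` are hypothetical, values are fixed to `uint32` rather than generic, and the build step is simply one plausible way to fill the slice from sorted keys.

```go
package main

import (
	"fmt"
	"sort"
)

// node is one trie entry. It contains no pointers: children live in a
// contiguous run of the shared nodes slice and are found by index.
type node struct {
	first    uint32 // index in nodes of the child for byte lo
	count    uint16 // length of the child run: hi - lo + 1 (0 if no children)
	lo       byte   // lowest next byte below this node
	hasValue bool   // true if a key ends at this node
	value    uint32 // value for the key ending here
}

// Trie is a hypothetical, simplified stand-in for the library's Map.
type Trie struct{ nodes []node }

// NewTrie builds the flat trie from an ordinary Go map.
func NewTrie(src map[string]uint32) *Trie {
	keys := make([]string, 0, len(src))
	for k := range src {
		keys = append(keys, k)
	}
	sort.Strings(keys) // bytewise order groups keys by their next byte
	t := &Trie{nodes: make([]node, 1)} // nodes[0] is the root
	t.fill(0, keys, 0, src)
	return t
}

// fill populates nodes[idx] from the sorted keys sharing a prefix of length depth.
func (t *Trie) fill(idx int, keys []string, depth int, src map[string]uint32) {
	if len(keys) > 0 && len(keys[0]) == depth { // a key ending here sorts first
		t.nodes[idx].hasValue = true
		t.nodes[idx].value = src[keys[0]]
		keys = keys[1:]
	}
	if len(keys) == 0 {
		return
	}
	lo, hi := keys[0][depth], keys[len(keys)-1][depth]
	first := len(t.nodes)
	t.nodes[idx].lo = lo
	t.nodes[idx].count = uint16(hi-lo) + 1
	t.nodes[idx].first = uint32(first)
	// Reserve one slot per byte in [lo, hi]; unused slots stay zero-valued.
	t.nodes = append(t.nodes, make([]node, int(hi-lo)+1)...)
	for len(keys) > 0 {
		b, n := keys[0][depth], 1
		for n < len(keys) && keys[n][depth] == b {
			n++
		}
		t.fill(first+int(b-lo), keys[:n], depth+1, src)
		keys = keys[n:]
	}
}

// Lookup accepts a string or a byte slice, so no conversion is needed.
func Lookup[K string | []byte](t *Trie, key K) (uint32, bool) {
	n := t.nodes[0]
	for i := 0; i < len(key); i++ {
		off := int(key[i]) - int(n.lo)
		if off < 0 || off >= int(n.count) {
			return 0, false
		}
		n = t.nodes[int(n.first)+off]
	}
	return n.value, n.hasValue
}

func main() {
	t := NewTrie(map[string]uint32{"1": 1, "2": 2, "3": 3, "10": 10, "NA": 99})
	fmt.Println(Lookup(t, "10"))         // string key
	fmt.Println(Lookup(t, []byte("NA"))) // byte slice key
	fmt.Println(Lookup(t, "4"))          // falls in an unused slot
	fmt.Println(len(t.nodes))            // total slots, including unused ones
}
```

In this sketch the root reserves a slot for every byte from `'1'` to `'N'` although only four of them are used, which illustrates why keys from a narrow byte range are stored more compactly.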
| 21 | +There are two variants provided: |
| 22 | + |
| 23 | +* `Map` is a version using a single slice and indexes which can be directly |
| 24 | + serialized (e.g. to a file). It contains no embedded pointers so has minimal |
| 25 | + impact on GC. |
| 26 | + |
| 27 | +* `MapFaster` has improved performance by using a slice for the `next` fields. |
| 28 | + This avoids a bounds check when looking up the entry for a byte. However, it |
| 29 | + comes at the cost of easy serialization and introduces many pointers, which |
| 30 | + will have an impact on GC. The slice version cannot be constructed directly |
| 31 | + in the same way, with the whole store in one block of memory, so this code provides |
| 32 | + a function to create it from `Map`. An alternative construction might create distinct |
| 33 | + slice objects at each level. |
| 34 | + |
| 35 | +## Example |
| 36 | + |
| 37 | +Example usage can be found in the tests and also |
| 38 | +[`fast_string_map_example_test.go`](fast_string_map_example_test.go) |
| 39 | +which shows a populated data structure to aid understanding. |
| 40 | + |
| 41 | +## Motivation |
| 42 | + |
| 43 | +I created `faststringmap` in order to improve the speed of parsing CSV |
| 44 | +where the fields were category codes from survey data. The majority of these |
| 45 | +were numeric (`"1"`, `"2"`, `"3"`...) plus a distinct code for "not applicable". |
| 46 | +I was struck that in the simplest possible cases (e.g. `"1"` ... `"5"`) the map |
| 47 | +should be a single slice lookup. |
| 48 | + |
| 49 | +Our fast CSV parser provides fields as byte slices into the read buffer to |
| 50 | +avoid creating string objects. So I also wanted to facilitate key lookup from a |
| 51 | +`[]byte` rather than a string. This is not possible using a built-in Go map without |
| 52 | +use of the `unsafe` package. |
| 53 | + |
| 54 | +## Benchmarks |
| 55 | + |
| 56 | +Below are example benchmarks from my laptop. Each operation looks up every element |
| 57 | +in a map of size 1000, so approximate times are 25 ns per lookup for the Go native map |
| 58 | +and 5 ns per lookup for `faststringmap`. |
| 59 | +``` |
| 60 | +cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz |
| 61 | +BenchmarkUint32Store |
| 62 | +BenchmarkUint32Store-8 218463 4959 ns/op |
| 63 | +BenchmarkGoStringToUint32 |
| 64 | +BenchmarkGoStringToUint32-8 49279 24483 ns/op |
| 65 | +``` |