@@ -140,3 +140,50 @@ It is known to us that the `ouroboros-consensus` stack has not been updated to
https://github.com/IntersectMBO/ouroboros-network/pull/4951. We would advise
fixing this Nix-related bug rather than downgrading `lsm-tree`'s dependency on
`io-classes` to version 1.5.
+
+ # Security of hash-based data structures
+
+ Data structures based on hashing have to be considered carefully when they may
+ be used with untrusted data. If an attacker can control the keys in a hash
+ table, for example, they may be able to arrange for all their keys to collide,
+ which degrades operations on the table from expected constant time towards
+ linear time. This is why the Haskell Cardano node implementation does not use
+ hash tables, and uses ordering-based containers instead (such as `Data.Map`).
+
+ The Bloom filters in an LSM tree are hash-based data structures. For performance
+ they do not use cryptographic hashes. So in principle it would be possible for
+ an attacker to arrange that all their keys hash to a common set of bits. This
+ would be a potential problem for the UTxO and other stake-related tables in
+ Cardano, since it is the users that get to pick (with only modest grinding
+ difficulty) their UTxO keys (TxIn) and stake keys (verification key hashes). It
+ would be even more serious if an attacker could grind their set of malicious
+ keys locally, in the knowledge that the same set of keys will hash the same way
+ on all other Cardano nodes.
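
The grinding concern above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_hash` (FNV-1a with a splitmix64-style finalizer) stands in for whichever fast non-cryptographic hash the library actually uses, and the filter size, probe count and function names are hypothetical. With a publicly known hash and no secret input, an attacker can search offline for keys whose probe positions all fall inside a small set of bits.

```python
import itertools

MASK64 = (1 << 64) - 1
M_BITS = 1 << 12   # filter size in bits (hypothetical)
K = 3              # probes per key (hypothetical)


def toy_hash(salt: int, key: bytes) -> int:
    """Stand-in hash: FNV-1a seeded with `salt`, then a splitmix64-style mix."""
    h = 0xCBF29CE484222325 ^ salt
    for b in key:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    h = ((h ^ (h >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    h = ((h ^ (h >> 27)) * 0x94D049BB133111EB) & MASK64
    return h ^ (h >> 31)


def positions(h: int) -> list[int]:
    """Derive K bit positions from one 64-bit hash via double hashing."""
    h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1
    return [(h1 + i * h2) % M_BITS for i in range(K)]


def grind(salt: int, target: set[int], wanted: int) -> list[bytes]:
    """Brute-force search for keys whose probes all land inside `target`."""
    found = []
    for n in itertools.count():
        key = n.to_bytes(8, "big")
        if set(positions(toy_hash(salt, key))) <= target:
            found.append(key)
            if len(found) == wanted:
                return found


def survivors(salt: int, keys: list[bytes], target: set[int]) -> int:
    """Count how many ground keys still hit `target` under another salt."""
    return sum(set(positions(toy_hash(salt, k))) <= target for k in keys)
```

Calling `grind(0, set(range(256)), 5)` models the unsalted case, where salt `0` is known to everyone. Passing the resulting keys to `survivors` with a different salt shows that a set of keys ground against one salt is very unlikely to keep its collisions under another.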
+
+ This issue was not considered in the original project specification, but we
+ have since considered it and included a mitigation. On the initial creation of
+ an lsm-tree session, a random salt is generated (from `/dev/random`) and stored
+ persistently as part of the session. This salt is then used as part of the
+ Bloom filter hashing for all runs in all tables in the session.
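
The salt lifecycle described above might look roughly like the following sketch. It is not the lsm-tree implementation: the file name `bloom-salt`, the 8-byte salt width, and the functions `open_session` and `salted_hash` are all hypothetical, and `os.urandom` is used here as a portable stand-in for reading `/dev/random` directly.

```python
import os
from pathlib import Path

SALT_FILE = "bloom-salt"   # hypothetical file name inside the session directory
SALT_BYTES = 8             # hypothetical salt width
MASK64 = (1 << 64) - 1


def open_session(session_dir: Path) -> int:
    """Return the session salt, creating and persisting it on first use."""
    session_dir.mkdir(parents=True, exist_ok=True)
    salt_path = session_dir / SALT_FILE
    if not salt_path.exists():
        # Initial creation of the session: draw the salt from the OS entropy
        # source and store it so that later opens see the same value.
        salt_path.write_bytes(os.urandom(SALT_BYTES))
    return int.from_bytes(salt_path.read_bytes(), "big")


def salted_hash(salt: int, key: bytes) -> int:
    """Stand-in non-cryptographic hash (FNV-1a) seeded with the session salt."""
    h = 0xCBF29CE484222325 ^ salt
    for b in key:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    return h
```

Reopening the same directory yields the same salt, so filters written earlier remain valid. Two independently created sessions draw independent salts, so the same key generally maps to different filter bits on different nodes.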
+
+ The result is that while it is in principle still possible to produce hash
+ collisions in the Bloom filter, this now depends on knowing the salt, and
+ every node has a different salt. So a system-wide attack becomes impossible;
+ instead it is only plausible to target individual nodes. Discovering a node's
+ salt would also be impractically difficult. In principle there is a timing
+ side channel, in that collisions will cause more I/O and thus take longer.
+ An attacker would need to get upstream of a victim node, supply a valid block
+ and measure the timing of receiving the block downstream. There is however a
+ large amount of noise in such a measurement.
+
+ Overall, our judgement is that this mitigation is practically sufficient, but
+ it merits a security review from others who may make a different judgement. It
+ is also worth noting that this issue may occur in other LSM-trees used in other
+ Cardano and non-Cardano implementations. In particular, RocksDB does not appear
+ to use a salt at all.
+
+ Note that a per-run or per-table hash salt would incur non-trivial costs,
+ because it would reduce the sharing available in bulk Bloom filter lookups
+ (looking up N keys in M filters). The Bloom filter lookup is a
+ performance-sensitive part of the overall database implementation.
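
One plausible reading of the "sharing" mentioned above is that, with a single session-wide salt, each key is hashed once and the result is reused to probe all M filters, whereas per-filter salts would force N × M hash computations. The sketch below illustrates that reading only. The `Bloom` class, `toy_hash`, the two lookup functions and all parameters are hypothetical and not taken from lsm-tree.

```python
MASK64 = (1 << 64) - 1
M_BITS = 1 << 10   # bits per filter (hypothetical)
K = 3              # probes per key (hypothetical)


def toy_hash(salt: int, key: bytes) -> int:
    """Stand-in hash: FNV-1a seeded with `salt`, then a splitmix64-style mix."""
    h = 0xCBF29CE484222325 ^ salt
    for b in key:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    h = ((h ^ (h >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    h = ((h ^ (h >> 27)) * 0x94D049BB133111EB) & MASK64
    return h ^ (h >> 31)


def positions(h: int) -> list[int]:
    """Derive K bit positions from one 64-bit hash via double hashing."""
    h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1
    return [(h1 + i * h2) % M_BITS for i in range(K)]


class Bloom:
    """Tiny Bloom filter whose bit array is a Python integer."""

    def __init__(self, salt: int, keys=()):
        self.salt = salt
        self.bits = 0
        for key in keys:
            for p in positions(toy_hash(salt, key)):
                self.bits |= 1 << p

    def maybe_contains(self, pos: list[int]) -> bool:
        return all((self.bits >> p) & 1 for p in pos)


def bulk_lookup_shared(salt: int, keys, filters):
    """One salt for all filters: hash each key once, reuse it M times."""
    hashes, result = 0, []
    for key in keys:
        pos = positions(toy_hash(salt, key))
        hashes += 1
        result.append([f.maybe_contains(pos) for f in filters])
    return result, hashes


def bulk_lookup_per_filter(keys, filters):
    """One salt per filter: every (key, filter) pair needs its own hash."""
    hashes, result = 0, []
    for key in keys:
        row = []
        for f in filters:
            pos = positions(toy_hash(f.salt, key))
            hashes += 1
            row.append(f.maybe_contains(pos))
        result.append(row)
    return result, hashes
```

Both functions return a lookup matrix plus the number of hash computations performed, which makes the N versus N × M difference directly visible for a given batch of keys.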