Commit e23d0e6

section in integration notes on security of hash based data structures
1 parent e4ee193 commit e23d0e6

1 file changed: +47 -0 lines changed

doc/final-report/integration-notes.md

Lines changed: 47 additions & 0 deletions
@@ -140,3 +140,50 @@ It is known to us that the `ouroboros-consensus` stack has not been updated to
https://github.com/IntersectMBO/ouroboros-network/pull/4951. We would advise to
fix this Nix-related bug rather than downgrading `lsm-tree`’s dependency on
`io-classes` to version 1.5.

# Security of hash-based data structures

Data structures based on hashing have to be considered carefully when they may
be used with untrusted data. If an attacker can control the keys in a hash
table, for example, they may be able to arrange for all their keys to collide,
which can cause unexpected performance problems. This is why the Haskell
Cardano node implementation does not use hash tables and uses ordering-based
containers (such as `Data.Map`) instead.

The Bloom filters in an LSM tree are hash-based data structures. For performance
they do not use cryptographic hashes. So in principle it would be possible for
an attacker to arrange that all their keys hash to a common set of bits. This
would be a potential problem for the UTxO and other stake-related tables in
Cardano, since it is the users that get to pick (with only modest grinding
difficulty) their UTxO keys (TxIn) and stake keys (verification key hashes). It
would be even more serious if an attacker could grind their set of malicious
keys locally, in the knowledge that the same set of keys will hash the same way
on all other Cardano nodes.

This issue was not considered in the original project specification, but we
have considered it and included a mitigation. The mitigation is that on the
initial creation of an `lsm-tree` session, a random salt is generated (from
`/dev/random`) and stored persistently as part of the session. This salt is then
used as part of the Bloom filter hashing for all runs in all tables in the
session.

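The mitigation can be sketched as follows. This is a minimal illustration in
Python, not the `lsm-tree` implementation: the file name `salt`, the 64-bit salt
width, the FNV-1a-based hash with a splitmix64-style finalizer, and the
`BloomFilter` class are all assumptions made for the example, and `os.urandom`
stands in for reading `/dev/random`.

```python
import os
import tempfile
from pathlib import Path

MASK64 = (1 << 64) - 1

def load_or_create_salt(session_dir: Path) -> int:
    """Return the session salt, creating and persisting it on first use."""
    salt_file = session_dir / "salt"  # hypothetical file name
    if not salt_file.exists():
        salt_file.write_bytes(os.urandom(8))  # OS-provided randomness
    return int.from_bytes(salt_file.read_bytes(), "little")

def salted_hash(salt: int, key: bytes) -> int:
    """Non-cryptographic hash: FNV-1a seeded with the salt, then a finalizer."""
    h = (0xCBF29CE484222325 ^ salt) & MASK64
    for byte in key:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    h = ((h ^ (h >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    h = ((h ^ (h >> 27)) * 0x94D049BB133111EB) & MASK64
    return h ^ (h >> 31)

class BloomFilter:
    """Toy Bloom filter whose bit positions depend on the session salt."""

    def __init__(self, salt: int, num_bits: int = 1024, num_hashes: int = 4):
        self.salt, self.num_bits, self.num_hashes = salt, num_bits, num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: bytes) -> list:
        h = salted_hash(self.salt, key)
        h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1  # double hashing, odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(key))

with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp)
    salt = load_or_create_salt(session)          # first open: salt is created
    assert load_or_create_salt(session) == salt  # reopening yields the same salt
    # Every run in every table of the session uses the same salt.
    run_a, run_b = BloomFilter(salt), BloomFilter(salt)
    run_a.insert(b"txin-1")
    assert run_a.might_contain(b"txin-1")
```
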
The result is that while it is in principle still possible to produce hash
collisions in the Bloom filter, this now depends on knowing the salt. Since
every node now has a different salt, a system-wide attack becomes impossible;
instead it is only plausible to target individual nodes. Discovering a node's
salt would also be impractically difficult. In principle there is a timing
side channel, in that collisions will cause more I/O and thus take longer.
To exploit it, an attacker would need to get upstream of a victim node, supply
a valid block, and measure the timing of receiving the block downstream. There
is, however, a large amount of noise in such a measurement.

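To illustrate why keys ground against one salt do not transfer to a node with
another salt, here is a small simulation. It is a sketch under stated
assumptions: the hash function, the filter parameters, the two fixed salts, and
the key names are all hypothetical, and the "attack" simply searches for keys
that are false positives in node A's filter.

```python
MASK64 = (1 << 64) - 1
NUM_BITS, NUM_HASHES = 4096, 4  # illustrative filter parameters

def salted_hash(salt: int, key: bytes) -> int:
    """Non-cryptographic hash: FNV-1a seeded with the salt, then a finalizer."""
    h = (0xCBF29CE484222325 ^ salt) & MASK64
    for byte in key:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    h = ((h ^ (h >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    h = ((h ^ (h >> 27)) * 0x94D049BB133111EB) & MASK64
    return h ^ (h >> 31)

def bit_positions(salt: int, key: bytes) -> list:
    h = salted_hash(salt, key)
    h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1  # double hashing, odd stride
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]

def build_filter(salt: int, keys) -> set:
    """Represent a filter as the set of bit positions that are set."""
    return {p for k in keys for p in bit_positions(salt, k)}

def is_false_positive(filter_bits: set, salt: int, key: bytes) -> bool:
    return all(p in filter_bits for p in bit_positions(salt, key))

# Both nodes store the same honest keys, but with different salts.
honest_keys = [b"honest-%d" % i for i in range(300)]
salt_a, salt_b = 0x1111, 0x2222  # fixed here only for reproducibility
filter_a = build_filter(salt_a, honest_keys)
filter_b = build_filter(salt_b, honest_keys)

# The attacker is assumed to know salt A and grinds keys against it locally.
ground_keys, candidate = [], 0
while len(ground_keys) < 100:
    key = b"evil-%d" % candidate
    if is_false_positive(filter_a, salt_a, key):
        ground_keys.append(key)
    candidate += 1

hits_a = sum(is_false_positive(filter_a, salt_a, k) for k in ground_keys)
hits_b = sum(is_false_positive(filter_b, salt_b, k) for k in ground_keys)
print(f"false positives on node A: {hits_a}, on node B: {hits_b}")
```

Every ground key is a false positive on node A by construction, whereas on
node B the same keys should hit only at roughly the filter's ordinary false
positive rate.
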
Overall, our judgement is that this mitigation is practically sufficient, but
it merits a security review from others who may make a different judgement. It
is also worth noting that this issue may occur in other LSM-trees used in other
Cardano and non-Cardano implementations. In particular, RocksDB does not appear
to use a salt at all.

Note that a per-run or per-table hash salt would incur non-trivial costs,
because it would reduce the sharing available in bulk Bloom filter lookups
(looking up N keys in M filters). The Bloom filter lookup is a
performance-sensitive part of the overall database implementation.
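
One plausible reading of this sharing is that, with a single session-wide salt,
each key needs to be hashed only once and the result can be reused for all M
filters, whereas per-run salts force N×M hash evaluations. The sketch below
counts hash evaluations under both schemes. The hash function, filter
representation, salts, and key names are assumptions for illustration and do
not reflect the actual `lsm-tree` lookup code.

```python
MASK64 = (1 << 64) - 1
NUM_BITS, NUM_HASHES = 4096, 4  # illustrative filter parameters
hash_calls = 0  # counts hash evaluations, to compare the two schemes

def salted_hash(salt: int, key: bytes) -> int:
    """Non-cryptographic hash: FNV-1a seeded with the salt, then a finalizer."""
    global hash_calls
    hash_calls += 1
    h = (0xCBF29CE484222325 ^ salt) & MASK64
    for byte in key:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    h = ((h ^ (h >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    h = ((h ^ (h >> 27)) * 0x94D049BB133111EB) & MASK64
    return h ^ (h >> 31)

def bit_positions(h: int) -> list:
    h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1  # double hashing, odd stride
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]

def build_filter(salt: int, keys) -> set:
    """Represent a filter as the set of bit positions that are set."""
    return {p for k in keys for p in bit_positions(salted_hash(salt, k))}

def probe(filter_bits: set, h: int) -> bool:
    return all(p in filter_bits for p in bit_positions(h))

def lookup_shared_salt(filters, salt, keys):
    """Hash each key once and reuse the hash for every filter."""
    hashes = [salted_hash(salt, k) for k in keys]
    return [[probe(f, h) for f in filters] for h in hashes]

def lookup_per_run_salt(salted_filters, keys):
    """Each filter has its own salt, so every (key, filter) pair is hashed."""
    return [[probe(f, salted_hash(s, k)) for s, f in salted_filters]
            for k in keys]

# M = 5 runs of 50 keys each; N = 100 query keys.
runs = [[b"run%d-key%d" % (r, i) for i in range(50)] for r in range(5)]
queries = [b"run2-key%d" % i for i in range(100)]

session_salt = 42
shared = [build_filter(session_salt, ks) for ks in runs]
per_run = [(1000 + r, build_filter(1000 + r, ks)) for r, ks in enumerate(runs)]

hash_calls = 0
res_shared = lookup_shared_salt(shared, session_salt, queries)
shared_calls = hash_calls

hash_calls = 0
res_per_run = lookup_per_run_salt(per_run, queries)
per_run_calls = hash_calls

print(shared_calls, per_run_calls)  # prints "100 500"
```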
