Commit 7c04b3c

committed
flesh out BTree docs
1 parent 79c21d9 commit 7c04b3c

2 files changed, 48 additions and 0 deletions

src/libcollections/btree/map.rs

Lines changed: 43 additions & 0 deletions
@@ -29,6 +29,47 @@ use ringbuf::RingBuf;
 
 
 /// A map based on a B-Tree.
+///
+/// B-Trees represent a fundamental compromise between cache-efficiency and actually minimizing
+/// the amount of work performed in a search. In theory, a binary search tree (BST) is the optimal
+/// choice for a sorted map, as a perfectly balanced BST performs the theoretical minimum number of
+/// comparisons necessary to find an element (log<sub>2</sub>n). However, in practice the way this
+/// is done is *very* inefficient for modern computer architectures. In particular, every element
+/// is stored in its own individually heap-allocated node. This means that every single insertion
+/// triggers a heap-allocation, and every single comparison should be a cache-miss. Since these
+/// are both notably expensive things to do in practice, we are forced to at the very least
+/// reconsider the BST strategy.
+///
+/// A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing
+/// this, we reduce the number of allocations by a factor of B, and improve cache efficiency in
+/// searches. However, this does mean that searches will have to do *more* comparisons on average.
+/// The precise number of comparisons depends on the node search strategy used. For optimal cache
+/// efficiency, one could search the nodes linearly. For optimal comparisons, one could
+/// search the node using binary search. As a compromise, one could also perform a linear search
+/// that initially only checks every i<sup>th</sup> element for some choice of i.
+///
+/// Currently, our implementation simply performs naive linear search. This provides excellent
+/// performance on *small* nodes of elements which are cheap to compare. However, in the future we
+/// would like to further explore choosing the optimal search strategy based on the choice of B,
+/// and possibly other factors. Using linear search, searching for a random element is expected
+/// to take O(B log<sub>B</sub>n) comparisons, which is generally worse than a BST. In practice,
+/// however, performance is excellent. `BTreeMap` is able to readily outperform `TreeMap` under
+/// many workloads, and is competitive where it doesn't. `BTreeMap` also generally *scales* better
+/// than `TreeMap`, making it more appropriate for large datasets.
+///
+/// However, `TreeMap` may still be more appropriate to use in many contexts. If elements are very
+/// large or expensive to compare, `TreeMap` may be more appropriate. It won't allocate any
+/// more space than is needed, and will perform the minimal number of comparisons necessary.
+/// `TreeMap` also provides much better performance stability guarantees. Generally, very few
+/// changes need to be made to update a BST, and two updates are expected to take about the same
+/// amount of time on roughly equal sized BSTs. However, a B-Tree's performance is much more
+/// amortized. If a node is overfull, it must be split into two nodes. If a node is underfull, it
+/// may be merged with another. Both of these operations are relatively expensive to perform, and
+/// it's possible to force one to occur at every single level of the tree in a single insertion or
+/// deletion. In fact, a malicious or otherwise unlucky sequence of insertions and deletions can
+/// force this degenerate behaviour to occur on every operation. While the total amount of work
+/// done on each operation isn't *catastrophic*, and *is* still bounded by O(B log<sub>B</sub>n),
+/// it is certainly much slower when it does.
 #[deriving(Clone)]
 pub struct BTreeMap<K, V> {
     root: Node<K, V>,
@@ -93,6 +134,8 @@ impl<K: Ord, V> BTreeMap<K, V> {
     }
 
     /// Makes a new empty BTreeMap with the given B.
+    ///
+    /// B cannot be less than 2.
     pub fn with_b(b: uint) -> BTreeMap<K, V> {
         assert!(b > 1, "B must be greater than 1");
         BTreeMap {
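
Note on the node search strategies named in the new `BTreeMap` docs: the sketch below illustrates linear search, binary search, and the "every i-th element" compromise over one node's sorted key array. It is written in present-day Rust rather than the dialect in this diff, and the names `Search`, `search_linear`, `search_binary`, and `search_strided` are hypothetical; none of this is taken from the `libcollections` implementation, which the docs say uses only the naive linear scan.

```rust
use std::cmp::Ordering;

/// Outcome of searching a single node's sorted keys (hypothetical type):
/// either the key sits at an index, or the search continues at an edge index.
#[derive(Debug, PartialEq)]
enum Search {
    Found(usize),
    GoDown(usize),
}

/// Naive linear scan: walks the contiguous keys from the left.
fn search_linear<K: Ord>(keys: &[K], key: &K) -> Search {
    for (i, k) in keys.iter().enumerate() {
        match key.cmp(k) {
            Ordering::Equal => return Search::Found(i),
            Ordering::Less => return Search::GoDown(i),
            Ordering::Greater => {}
        }
    }
    Search::GoDown(keys.len())
}

/// Binary search: fewest comparisons, less predictable memory access.
fn search_binary<K: Ord>(keys: &[K], key: &K) -> Search {
    match keys.binary_search(key) {
        Ok(i) => Search::Found(i),
        Err(i) => Search::GoDown(i),
    }
}

/// Compromise: probe every `stride`-th key first, then scan linearly
/// inside the one block that can still contain `key`.
fn search_strided<K: Ord>(keys: &[K], key: &K, stride: usize) -> Search {
    assert!(stride > 0, "stride must be at least 1");
    let mut start = 0;
    let mut probe = stride - 1;
    // Skip whole blocks whose last key is still smaller than `key`.
    while probe < keys.len() && keys[probe] < *key {
        start = probe + 1;
        probe += stride;
    }
    let end = (probe + 1).min(keys.len());
    match search_linear(&keys[start..end], key) {
        Search::Found(i) => Search::Found(start + i),
        Search::GoDown(i) => Search::GoDown(start + i),
    }
}

fn main() {
    let keys = [10, 20, 30, 40, 50, 60, 70];
    for probe in [5, 10, 35, 70, 99] {
        let expected = search_linear(&keys, &probe);
        // All three strategies must agree on the result.
        assert_eq!(search_binary(&keys, &probe), expected);
        assert_eq!(search_strided(&keys, &probe, 3), expected);
        println!("{}: {:?}", probe, expected);
    }
}
```

All three functions return the same answer for a given node; they differ only in how many comparisons they make and in how they touch memory, which is the trade-off the doc comment describes.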

src/libcollections/btree/set.rs

Lines changed: 5 additions & 0 deletions
@@ -23,6 +23,9 @@ use core::fmt::Show;
 use {Mutable, Set, MutableSet, MutableMap, Map};
 
 /// A set based on a B-Tree.
+///
+/// See BTreeMap's documentation for a detailed discussion of this collection's performance
+/// benefits and drawbacks.
 #[deriving(Clone, Hash, PartialEq, Eq, Ord, PartialOrd)]
 pub struct BTreeSet<T>{
     map: BTreeMap<T, ()>,
@@ -65,6 +68,8 @@ impl<T: Ord> BTreeSet<T> {
     }
 
     /// Makes a new BTreeSet with the given B.
+    ///
+    /// B cannot be less than 2.
     pub fn with_b(b: uint) -> BTreeSet<T> {
         BTreeSet { map: BTreeMap::with_b(b) }
     }
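
Note on why the set docs can simply defer to the map: as the context lines show, `BTreeSet<T>` wraps a `BTreeMap<T, ()>`, so every set operation is a map operation on keys with unit values. The following is a minimal sketch of that pattern in present-day Rust, assuming the standard library's `std::collections::BTreeMap` as the backing map. `UnitSet` and its methods are hypothetical names for illustration, and `with_b` is not mirrored because the current standard-library map has no such constructor.

```rust
use std::collections::BTreeMap;

/// Hypothetical set that stores its elements as map keys with `()` values.
struct UnitSet<T> {
    map: BTreeMap<T, ()>,
}

impl<T: Ord> UnitSet<T> {
    fn new() -> Self {
        UnitSet { map: BTreeMap::new() }
    }

    /// Returns true if the value was not already present.
    fn insert(&mut self, value: T) -> bool {
        self.map.insert(value, ()).is_none()
    }

    fn contains(&self, value: &T) -> bool {
        self.map.contains_key(value)
    }

    fn len(&self) -> usize {
        self.map.len()
    }

    /// Iterates over the elements in sorted order, inherited from the map.
    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.map.keys()
    }
}

fn main() {
    let mut set = UnitSet::new();
    for x in [3, 1, 2, 3] {
        set.insert(x);
    }
    assert!(set.contains(&2));
    assert_eq!(set.len(), 3);
    let sorted: Vec<_> = set.iter().copied().collect();
    assert_eq!(sorted, vec![1, 2, 3]);
}
```

Because the wrapper adds no logic of its own, its performance characteristics are exactly those of the underlying map.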
