This project is an implementation of a disk-backed key-value store, specified as a "challenge problem" in a blog post written by John De Goes at Precog. It's broken down into three chunks:
com.mergeconflict.bplus
: a generic implementation of immutable B+-trees.com.mergeconflict.precog
: the original API specified in the blog post.com.mergeconflict.precog.impl
:bplus
applied to byte arrays with a disk storage policy.
Id
: a unique identifier for a node, a combination of the upper bound on keys contained or referenced by that node, and the distance of the node from the tree's leaves.Identity
: a type class capturing object identity, which I believe is needed because Array is horrible.MutableProvider
: a "policy" trait capturing how nodes are stored and the branching factor of the B+-tree.MutableTree
: a mutable wrapper forTree
, mostly satisfying theDiskStore
interface from precog (with the exception oftraverse
, since that method expects aReader
, which expects byte arrays).Node
: the meat of the B+-tree algorithm. ADataNode
contains key-value pairs and anId
referencing the nextDataNode
in the tree; aTreeNode
containsId
s referencing its children.Put
: the result of aput
operation, informally a list of which nodes were made "dirty" and should be subsequentlyflush
ed to storage.Tree
: a simple wrapper for a root node, the current list of "dirty" nodes and the tree's provider.
DiskStore
: the interface for a key-value store.DiskStoreSource
: a factory forDiskStore
s.Reader
: an iteratee over key-value byte array pairs.
Stream
: a type class to abstract operations on JavaInputStream
andOutputStream
(vaguely similar to Haskell'sBinary
type class).TreeDiskStore
: aMutableTree
specialized to keys and values of typeArray[Byte]
, implementingtraverse
.TreeDiskStoreProvider
: an implementation ofMutableProvider
for disk-backedTreeDiskStore
s.
- I hate Arrays: reference identity is for jerks.
- I hate
for
-comprehensions (compared to Haskell'sdo
sugar). I hate them so much that I threw out all my code that usedscalaz.Validation
andscalaz.ReaderWriterStateT
. On that note, I also hate "type lambdas" and I think I probably hate scalaz. - I need to tweak the tests to use much larger data sets.
- I haven't implemented Precog's consistency requirement ("If the program is
forcibly terminated at any point during writing to a disk store, then
retrieving the disk store may not fail and must preserve all information
prior to the most recent call to
flush
"). Plan is to just implement it by copying files around. - I haven't properly tested performance.