Julia Dict and Set data structures safely persisted to disk.
All collections are backed by LMDB - a super fast B-Tree based embedded KV database with ACID guarantees. As with other B-Tree based databases, reads are faster than writes. However, write performance is still decent (expect 1k-10k TPS).
Care was taken to make the data structures thread-safe. LMDB handles most of the locking well - we just have to exclusively lock the LMDB.Environment when writing, to prevent multiple threads from opening multiple write transactions (a deadlock would occur).
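The locking idea described above can be sketched in plain Julia. This is only an illustration of serializing writers with an exclusive lock, not the package's actual implementation: `GuardedStore` and `guarded_write!` are hypothetical names, and an in-memory `Dict` stands in for the LMDB environment.

```julia
# Hypothetical stand-in for an environment guarded by a write lock.
# Neither `GuardedStore` nor `guarded_write!` is part of PersistentCollections.
struct GuardedStore
    data::Dict{String,String}
    write_lock::ReentrantLock
end

GuardedStore() = GuardedStore(Dict{String,String}(), ReentrantLock())

# Hold the exclusive lock for the duration of one "write transaction",
# so that only one task writes at a time.
function guarded_write!(store::GuardedStore, key::String, value::String)
    lock(store.write_lock) do
        store.data[key] = value
    end
    return store
end

store = GuardedStore()
# Spawn many concurrent writers; the lock serializes them.
tasks = [Threads.@spawn guarded_write!(store, "key$i", "value$i") for i in 1:100]
foreach(wait, tasks)
@assert length(store.data) == 100
```

Readers are deliberately left unlocked in this sketch, mirroring the statement above that only writes need the exclusive lock.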
- Install this package:
    import Pkg
    Pkg.add(url="https://github.com/blenessy/PersistentCollections.jl.git")
- Create an LMDB.Environment in a directory called data (in your current working directory):
    using PersistentCollections
    env = LMDB.Environment("data")
- Create an AbstractDict in your LMDB environment:
    dict = PersistentDict{String,String}(env)
- Use it as any other dict:
    dict["foo"] = "bar"
    @assert dict["foo"] == "bar"
    @assert collect(keys(dict)) == ["foo"]
    @assert collect(values(dict)) == ["bar"]
- (Optional) Note the asymmetric performance characteristics of an LMDB (B-Tree) based database:
    @time dict["bar"] = "baz";  # Writes to LMDB (B-Tree) are relatively slow
    @time dict["bar"];          # Reads are very fast though :)
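Since the collection created in the steps above is described as an AbstractDict, helper code can be written against that interface rather than against a concrete type. The sketch below uses a plain `Dict` as a stand-in so that it runs without LMDB; `count_prefix` is a hypothetical helper, and whether every generic method behaves identically on a PersistentDict is an assumption that has not been verified here.

```julia
# Hypothetical helper that relies only on the AbstractDict interface
# (iteration over key/value pairs), not on any concrete dictionary type.
function count_prefix(d::AbstractDict{String,String}, prefix::AbstractString)
    n = 0
    for (k, _) in d
        startswith(k, prefix) && (n += 1)
    end
    return n
end

# A plain Dict stands in for a persistent dictionary in this sketch.
d = Dict{String,String}("foo" => "bar", "fob" => "baz", "qux" => "quux")
@assert count_prefix(d, "fo") == 2
@assert get(d, "missing", "default") == "default"
```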
It is possible to create a persistent collection of Any type, although some methods will not be able to convert the value to the correct type because no type metadata is stored in the database.
Most notably, the getindex method (e.g. dict["foo"]) will not return a converted value. To mitigate this limitation, use the get method, which takes a default value.
The type of the default value (if other than nothing) is used to convert the stored value to the desired type.
    env = LMDB.Environment("data")
    dict = PersistentDict{Any,Any}(env)
    dict["foo"] = "bar"
    dict["foo"]                    # PersistentCollections.LMDB.MDBValue{Nothing}(0x0000000000000003, Ptr{Nothing} @0x000000012c806ffd, nothing)
    get(dict, "foo", "")           # "bar"
    convert(String, dict["foo"])   # "bar"
Multiple persistent collections can share one LMDB environment if you need transactional consistency between them:
- Create your LMDB.Environment with "named database" support by specifying the number of persistent collections you want with the maxdbs keyword argument:
    env = LMDB.Environment("data", maxdbs=2)
- Instantiate your persistent collections with a unique (within the LMDB environment) id:
    dict1 = PersistentDict{String,String}(env, id="mydict1")
    dict2 = PersistentDict{String,Int}(env, id="mydict2")
Yes, you can expect a significant increase in write throughput if you are willing to risk losing your last written transactions. Please note that database integrity is not in danger here: there is no risk of corruption.
    unsafe_env = LMDB.Environment("data", flags=LMDB.MDB_NOSYNC)
    unsafe_dict = PersistentDict{String,String}(unsafe_env)
    flush(unsafe_env) do
        unsafe_dict["foo"] = "bar"
        unsafe_dict["foo"] = "baz"
    end  # <== data is flushed to disk here
This is equivalent to:
    unsafe_env = LMDB.Environment("data", flags=LMDB.MDB_NOSYNC)
    unsafe_dict = PersistentDict{String,String}(unsafe_env)
    try
        unsafe_dict["foo"] = "bar"
        unsafe_dict["foo"] = "baz"
    finally
        flush(unsafe_env)
    end
make test
make coverage
make bench
- Travis CI integration
- Coveralls integration (when public)
- All platforms supported
- Part of Julia Registry
- Optimised implementation
- Thread Safe
- MDB_NOSYNC support
- Named database support
- Manual flush (sync) to disk
- Implemented
- Thread Safe
- MDB_NOSYNC support
- Named database support
- Manual flush (sync) to disk
Lots of the LMDB wrapping magic was pinched from wildart/LMDB.jl, whose author deserves a lot of credit.