Bincode::serialize generates much bigger results on String types #89

Open
ncloudioj opened this issue Nov 23, 2018 · 2 comments

Comments

@ncloudioj
Member

ncloudioj commented Nov 23, 2018

Noticed this while investigating this TODO item. The current serialization mechanism (serializing a two-element tuple, i.e. (type, value)) seems to introduce a significant amount of overhead for String-type Values.

Here are some examples:

serialize(&(1u8, true)).len() -> 2 // actual size: 2
serialize(&(2u8, 1e+9)).len() -> 9 // actual size: 9 (1 + 8)
serialize(&(3u8, "hello world".to_string())).len() -> 20 // actual size: 12 (1 + 11)
serialize(&(4u8, "4dd69e99-07e7-c040-a514-ccde0cfd4781".to_string())).len() -> 45 // actual size: 37 (1 + 36)

I'm not sure whether this is caused by padding or by the serialization format itself, but I think it's worth further investigation.

Alternatively, we could write the Type and Value directly into a buffer and pass the result to the put function. For big Values, we can avoid the double allocation by leveraging LMDB's "MDB_RESERVE" flag, which reserves enough space for the value and returns a buffer that the caller can populate afterwards. The following snippet illustrates the basic idea:

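// self.txn.put / self.txn.reserve / Error stand in for the store's write API (sketch)
fn put(&mut self, key: &[u8], value_type: u8, value: &[u8]) -> Result<(), Error> {
    // say BIG_VALUE_THRESHOLD = 32
    let length = value.len() + 1; // value size + 1 byte for the type tag

    if length < BIG_VALUE_THRESHOLD {
        // Small value: encode into a stack buffer, then copy it into the store.
        let mut buf = [0u8; BIG_VALUE_THRESHOLD];
        buf[0] = value_type;
        buf[1..length].copy_from_slice(value);
        self.txn.put(key, &buf[..length])
    } else {
        // Big value: reserve `length` bytes in the store (MDB_RESERVE) and
        // write straight into the returned buffer, skipping the extra copy.
        let reserved_buf = self.txn.reserve(key, length)?;
        reserved_buf[0] = value_type;
        reserved_buf[1..].copy_from_slice(value);
        Ok(())
    }
}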
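The proposal can be tried out without LMDB. Below is a minimal, runnable Rust sketch that assumes a hypothetical in-memory MockTxn whose put and reserve methods stand in for an LMDB write transaction, with reserve loosely imitating MDB_RESERVE; MockTxn, put_tagged and their signatures are illustrative only and are not the lmdb crate's API. It stores a 1-byte type tag followed by the raw value bytes, with no length prefix.

```rust
use std::collections::HashMap;

// Threshold taken from the proposal above ("say BIG_VALUE_THRESHOLD = 32").
const BIG_VALUE_THRESHOLD: usize = 32;

/// Hypothetical stand-in for an LMDB write transaction.
#[derive(Default)]
struct MockTxn {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl MockTxn {
    /// Copies `value` into the store under `key`.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }

    /// Imitates MDB_RESERVE: allocates `len` zeroed bytes for `key` and
    /// returns them for the caller to fill in. In real LMDB the slice would
    /// point into the database; here it is just a Vec.
    fn reserve(&mut self, key: &[u8], len: usize) -> &mut [u8] {
        let slot = self.data.entry(key.to_vec()).or_default();
        *slot = vec![0u8; len];
        slot.as_mut_slice()
    }
}

/// Writes a 1-byte type tag followed by the raw value bytes (no length prefix).
fn put_tagged(txn: &mut MockTxn, key: &[u8], tag: u8, value: &[u8]) {
    let length = value.len() + 1; // value size + 1 byte for the tag
    if length < BIG_VALUE_THRESHOLD {
        // Small value: build the record in a stack buffer, then copy it in.
        let mut buf = [0u8; BIG_VALUE_THRESHOLD];
        buf[0] = tag;
        buf[1..length].copy_from_slice(value);
        txn.put(key, &buf[..length]);
    } else {
        // Big value: reserve the space first, then write into it in place.
        let reserved = txn.reserve(key, length);
        reserved[0] = tag;
        reserved[1..].copy_from_slice(value);
    }
}

fn main() {
    let mut txn = MockTxn::default();
    put_tagged(&mut txn, b"small", 3, b"hello world"); // 12 bytes: stack-buffer path
    put_tagged(&mut txn, b"big", 4, b"4dd69e99-07e7-c040-a514-ccde0cfd4781"); // 37 bytes: reserve path
    for (key, stored) in &txn.data {
        println!("{}: {} bytes", String::from_utf8_lossy(key), stored.len());
    }
}
```

The saving from MDB_RESERVE comes from writing directly into LMDB's memory map, which the mock's Vec cannot show; only the control flow carries over. Dropping the length prefix is workable here because LMDB already returns each value's size alongside its data on reads.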
@badboy
Member

badboy commented Apr 25, 2019

Strings have to serialize their length. The length is stored as a 64-bit integer, therefore the additional 8 bytes.
This simplifies deserialization.
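To make that arithmetic concrete, here is a minimal, hand-rolled Rust sketch of the layout described above, assuming bincode 1.x's default configuration (little-endian, fixed-width integers, u64 length prefix for strings). The function names are hypothetical and this is not bincode's own code; it only reproduces the sizes from the original report (20 and 45 bytes) and shows why the prefix helps decoding: the reader knows exactly where the string ends.

```rust
/// Lays out `(tag, value)` the way bincode 1.x's default config lays out a
/// `(u8, String)` tuple: 1 tag byte, an 8-byte little-endian u64 length,
/// then the raw UTF-8 bytes.
fn encode_tagged_string(tag: u8, value: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 8 + value.len());
    out.push(tag); // 1 byte: type tag
    out.extend_from_slice(&(value.len() as u64).to_le_bytes()); // 8 bytes: length
    out.extend_from_slice(value.as_bytes()); // N bytes: string contents
    out
}

/// Decodes the layout above. Because the length comes first, the decoder
/// knows where the string ends, and any trailing bytes are returned untouched.
fn decode_tagged_string(buf: &[u8]) -> Option<(u8, &str, &[u8])> {
    let (&tag, rest) = buf.split_first()?;
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(rest.get(..8)?);
    let len = u64::from_le_bytes(len_bytes);
    let body = &rest[8..];
    if len > body.len() as u64 {
        return None; // length prefix claims more bytes than are present
    }
    let (text, trailing) = body.split_at(len as usize);
    let text = std::str::from_utf8(text).ok()?;
    Some((tag, text, trailing))
}

fn main() {
    let uuid = "4dd69e99-07e7-c040-a514-ccde0cfd4781";
    for &(tag, s) in &[(3u8, "hello world"), (4u8, uuid)] {
        let encoded = encode_tagged_string(tag, s);
        // Prefixed size vs. tag + raw bytes (the issue's "actual size").
        println!("{:?}: {} bytes vs {} bytes", s, encoded.len(), 1 + s.len());
        let (t, text, trailing) = decode_tagged_string(&encoded).expect("round trip");
        assert_eq!((t, text, trailing.len()), (tag, s, 0));
    }
}
```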

Also see #109 which would mean the user is responsible for any serialization/deserialization.

@ncloudioj
Member Author

> Strings have to serialize their length. The length is stored as a 64-bit integer, therefore the additional 8 bytes.

👍

> Also see #109 which would mean the user is responsible for any serialization/deserialization.

Agreed. This overhead could be undesirable if the consumer only wants to store binary blobs. Even for string values, particularly short ones, the 8-byte overhead can make map size estimation trickier for consumers.
