Discard Zero Bytes Variable

Overview

Suppose you have an unsigned integer column with significant support among elements of both smaller and larger sizes (e.g. sampled from a distribution in which every element with k non-zero bytes has frequency ~ 1/|τ_D| * 1/2^k). If only one effective size k had significant support, we could use the Discard Zero Bytes scheme, drop |τ_D|-k bytes, and use patching for all the other sizes. Alternatively, we could use DZB with the largest size having significant support - but that would mean wasting a lot of space on zeros. The third alternative is to combine, or multiplex, if you will, DZBs with different element sizes: for each element there is now an indication of its size, and the non-zero bytes of all elements are concatenated together, so that the compressed data is now of variable length.
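To make the multiplexing concrete, here is a minimal host-side sketch (all names hypothetical, not taken from the repository) of compressing a column of 32-bit elements this way, assuming k_0 = 1 and sizes recorded as offsets from k_0:

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical illustration only: keep each element's low-order non-zero
// bytes, concatenate them, and record each element's size separately.
int main() {
    const unsigned k0 = 1;  // minimum element size, in bytes
    std::vector<uint32_t> column = { 0x12, 0x3456, 0x789ABC, 0x5 };

    std::vector<uint8_t>  compressed;  // all non-zero bytes, concatenated
    std::vector<unsigned> size_codes;  // per-element size, as offset from k_0

    for (uint32_t x : column) {
        unsigned k = 1;  // number of significant (non-zero) bytes
        while (k < sizeof(uint32_t) && (x >> (8 * k)) != 0) { k++; }
        if (k < k0) { k = k0; }  // never drop below the base size
        size_codes.push_back(k - k0);
        for (unsigned byte = 0; byte < k; byte++) {
            compressed.push_back((x >> (8 * byte)) & 0xFF);
        }
    }
    // 7 compressed bytes instead of 16, plus the (packed) size codes
    printf("compressed: %zu bytes, original: %zu bytes\n",
           compressed.size(), column.size() * sizeof(uint32_t));
}
```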

The size of an element is represented using b bits, interpreted as an unsigned integral value and added to a minimum size k_0 (so that the possible sizes are k_0 .. k_0+2^b-1, inclusive). These sequences of b bits are packed into size container elements, each having the native word size of the GPU.
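A minimal sketch of extracting one such size field, assuming a 32-bit native word and the no-spill-over packing described under "Possible variants" below (the function name is hypothetical):

```cuda
#include <cstdint>

// Sketch, assuming nu = 4 (32-bit native words): sigma = floor(32 / b) size
// fields fit per container, and field i lives wholly in container i / sigma.
__host__ __device__ inline unsigned extract_size(
    const uint32_t* size_containers, unsigned i, unsigned b, unsigned k0)
{
    unsigned sigma     = 32 / b;  // element sizes per container
    uint32_t container = size_containers[i / sigma];
    unsigned shift     = (i % sigma) * b;
    return k0 + ((container >> shift) & ((1u << b) - 1u));  // size in bytes
}
```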

Now, the variable length of the compressed representation of elements makes it necessary to add up all element sizes up to index i to determine the position, within the compressed data, of the i'th element. To avoid this, the scheme also includes anchoring information: the value of this sum at the start of every segment of s compressed elements. These anchors are fixed-width, of course, and are used similarly to the anchors in the Delta compression scheme and others.
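A sketch of how an anchor might be used for random access to element i, under the same 32-bit-container assumption as the extraction sketch above; it starts at the nearest preceding anchor and sums at most s-1 sizes:

```cuda
#include <cstddef>
#include <cstdint>

// Sketch: position of element i in the compressed byte stream, using the
// anchor of its segment plus the sizes of the elements preceding it there.
__host__ __device__ inline size_t compressed_offset_of(
    const uint32_t* position_anchors,  // one anchor per s elements
    const uint32_t* size_containers,   // packed b-bit size fields
    unsigned i, unsigned s, unsigned b, unsigned k0)
{
    unsigned sigma  = 32 / b;          // size fields per 32-bit container
    size_t   offset = position_anchors[i / s];
    for (unsigned j = (i / s) * s; j < i; j++) {
        uint32_t word = size_containers[j / sigma];
        offset += k0 + ((word >> ((j % sigma) * b)) & ((1u << b) - 1u));
    }
    return offset;
}
```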

The scheme's shorthand designation is DZBV, or NSV if you're used to referring to dropping zero bytes as "null suppression".

Scheme parameters

| Parameter | Aliases | Symbol used |
|-----------|---------|-------------|
| Domain | Uncompressed domain | D |
| Domain element type | Uncompressed type, uncompressed data type | τ_D, with \|τ_D\| = ceil(log₂(\|D\|)/8) bytes |
| Column length | Single bitmap length, uncompressed data length | n |
| Minimum element size | Base element size | k_0 |
| Element size representation width | Bits per element size | b |
| Segment length | Anchoring segment length | s |
| Size of size container element | Native word size | ν = ceil(log₂(\|D_nw\|)/8) |
| Element sizes per size container | | σ = floor(ν·8/b) |
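As a concrete rendering of the table, a hypothetical parameter bundle, with the derived quantity σ computed as in the last row:

```cuda
#include <cstdint>

// Hypothetical parameter bundle mirroring the table above.
struct dzbv_params {
    uint32_t n;   // column length (number of uncompressed elements)
    uint32_t s;   // anchoring segment length
    uint32_t b;   // bits representing one element's size
    uint32_t k0;  // minimum (base) element size, in bytes
};

// sigma = floor(nu * 8 / b): e.g. nu = 4, b = 3 gives 10 sizes per container.
constexpr uint32_t sizes_per_container(uint32_t nu, uint32_t b) {
    return (nu * 8) / b;
}
```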

Possible variants

More flexibility of the size range

One could use an arbitrary dictionary mapping size representations to actual element sizes, so that the range of possible sizes would not have to be contiguous.
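A minimal sketch of what such a dictionary might look like; the contents of the table here are purely illustrative:

```cuda
#include <cstdint>

// Sketch of the dictionary variant: the b-bit size code indexes an arbitrary
// table of actual sizes, so the representable sizes (here 1, 2, 4 and 8
// bytes, with b = 2) need not be contiguous.
__constant__ uint8_t size_dictionary[4] = { 1, 2, 4, 8 };

__device__ inline unsigned decode_size(unsigned size_code) {
    return size_dictionary[size_code];
}
```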

Bit-resolution element sizes

At the moment, the element size value indicates a size in bytes. It could, theoretically, indicate a size in bits, with the variable-length data being a sequence of bits rather than of bytes.

No slack in size containers

At the moment, each element size value is wholly contained in a single size container element, and there's no spill-over when b does not divide ν·8. This means that there's some slack in each container element. This slack could be forgone, improving the compression ratio somewhat (e.g. for b=3 and ν=4 - saving 2 bits for every 10 elements, i.e. 0.2 bits per element); the cost would be the need to read two size container elements rather than one when decompressing a single compressed value.
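A sketch of the two-read extraction this variant would require, again assuming 32-bit size containers (the function name is hypothetical):

```cuda
#include <cstdint>

// Sketch of the no-slack variant: size fields are packed back-to-back, so
// field i starts at absolute bit i * b and may straddle two 32-bit
// containers, hence up to two reads per size.
__host__ __device__ inline unsigned extract_size_no_slack(
    const uint32_t* size_containers, unsigned i, unsigned b, unsigned k0)
{
    unsigned bit_pos = i * b;
    unsigned word    = bit_pos / 32;
    unsigned shift   = bit_pos % 32;
    uint64_t bits    = size_containers[word];
    if (shift + b > 32) {  // the field spills into the next container
        bits |= (uint64_t)size_containers[word + 1] << 32;
    }
    return k0 + (unsigned)((bits >> shift) & ((1u << b) - 1u));
}
```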

Memory layout

  • The following scalars (not known at compile time) are passed to the kernel: n, s, b, k_0
  • The compressed data is a contiguous array of bytes; each element's lower, non-zero bytes appear immediately after all of the preceding elements'. There is no padding or marking of any kind.
  • An array of ceil(n/σ) size-container elements, each a native word of the GPU, containing σ sequences of b bits each, these being the sizes of σ contiguous elements in the uncompressed data
  • An array of ceil(n/s) position anchors: positions within the compressed data array at which the non-zero bytes for uncompressed elements 0, s, 2*s etc. begin (a decompression kernel sketch over this layout follows the list)
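To tie the layout together, a sketch of a decompression kernel with one thread per anchored segment; this is an illustration under the 32-bit-native-word assumption, not the actual kernel from the repository:

```cuda
#include <cstdint>

// Sketch: each thread decompresses one anchored segment of s elements,
// starting at its segment's position anchor and reassembling each element
// from its non-zero low-order bytes.
__global__ void decompress_dzbv(
    uint32_t*       __restrict__ decompressed,
    const uint8_t*  __restrict__ compressed_data,
    const uint32_t* __restrict__ size_containers,   // packed b-bit sizes
    const uint32_t* __restrict__ position_anchors,  // one per s elements
    uint32_t n, uint32_t s, uint32_t b, uint32_t k0)
{
    uint32_t segment = blockIdx.x * blockDim.x + threadIdx.x;
    uint32_t first   = segment * s;
    if (first >= n) { return; }

    uint32_t sigma = 32 / b;  // size fields per container
    size_t   pos   = position_anchors[segment];
    uint32_t last  = first + s < n ? first + s : n;

    for (uint32_t i = first; i < last; i++) {
        uint32_t word = size_containers[i / sigma];
        unsigned k    = k0 + ((word >> ((i % sigma) * b)) & ((1u << b) - 1u));
        uint32_t x    = 0;
        for (unsigned byte = 0; byte < k; byte++) {
            x |= (uint32_t)compressed_data[pos++] << (8 * byte);
        }
        decompressed[i] = x;
    }
}
```

A production kernel would likely read the size containers and compressed bytes more cooperatively across a warp, but the per-segment walk above matches the layout as described.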

Links to the code