Skip to content

Frame of Reference

Eyal Rozenberg edited this page May 4, 2017 · 2 revisions

Overview

The Frame of Reference compression scheme exploits the dense concentration of data within sets much smaller than the domain. If all values in the input are in the range r...r+ω-1 for some ω much smaller than the domain size (ω << |D|), then we can represent the data using r as a scalar, and enough bits (or bytes) to represent a value in the range 0..ω-1 . This is done for consecutive segments of the data, so instead of one reference value r we switch r's every s elements - and can thus exploit this concentration even when it's only local rather than global (so gradual changes which add up to much more than ω can still be represented).

The above definition of the scheme can be easily generalized to replace the use of a const 'r' added to all elements to a richer model function. For example, if the data is an affine function and a bit of noise, i.e. data[i] = a*i + b + noise[i] where the noise is always between -N/2 and N/2-1, we could set aside the fixed values a and b-N/2, and have the per-element representation compressed[i] = noise[i] - N/2 - essentially setting ω=N. When decompressing, instead of adding a constant value we would add a*i + b - N/2. The model functions may be selected from some linear space, of finite dimension m, and be represented by coefficients w.r.t. a basis for the space. An obvious example is the space of polynomials up to some degree m-1; this example is actually implemented in the library and may be instantiated as necessary. This is also done per segment, so in actuality the FOR compression schemes can be said to use "piecewise functions" as a model for the data.

This scheme's shorthand designator is FOR.

Variants

Reference model richness

  • The commonly-known FOR involves just a single constant as the reference value (= constant-value reference model)
  • Per-segment polynomials are a simple but flexible model, which does not require much complex computation for evaluation; of course, any set of basis functions can be used (e.g. sinusoidal harmonies).
  • B-splines can be used as models, and while their individual non-zero domain is not contained within a single input segment, they often model data very well in real life while exhibiting useful featores. B- is not supported.

Offset sign

Can element offsets be negative?

  • We can theoretically require the model to always under-predict the data, so that offsets are always non-negative; in this case, τ_c can represent twice as many positive offsets. one can always reduce the model by the constant value being the maxium-absolute
  • Alternatively, we could allow for signed offsets, making the modeling procedure more natural - but having to make sure and avoid large positive offsets.

Note that for linearly-ordered discrete data (specifically, integers), a model function space including constant functions, and data requiring offsets as low as -ω_0 - we could just add -ω_0 to the model function for the segment, making all the offsets positive and switching to an unsigned type for them.

Scheme parameters

Parameter Aliases Symbol used
Domain D
Domain element type Uncompressed type, uncompressed data type τ_D
Offset type Compressed type τ_c
Column length data length n
Maximum offset ω
Model space dimension Model function space dimension, number of model coefficients m
Segment length Baselining period s

Memory layout

  • An array of length ceil(n/s) holds the model coefficients. These might be a single value of type τ_D - for traditional constant-reference FOR, or a sequence of m coefficients of a function with the linear space associated with the specific instantiation of FOR. The coefficients may be floating-point even if the uncompressed and/or compressed type are discrete.
  • The offsets from the model functions are held in an array of length n, and element type τ_c (which, as mentioned above, may be either signed or unsigned).

Links to the code