-
Notifications
You must be signed in to change notification settings - Fork 9
Frame of Reference
The Frame of Reference compression scheme exploits the dense concentration of data within sets much smaller than the domain. If all values in the input are in the range r...r+ω-1
for some ω much smaller than the domain size (ω << |D|
), then we can represent the data using r
as a scalar, and enough bits (or bytes) to represent a value in the range 0..ω-1 . This is done for consecutive segments of the data, so instead of one reference value r
we switch r
's every s elements - and can thus exploit this concentration even when it's only local rather than global (so gradual changes which add up to much more than ω can still be represented).
The above definition of the scheme can be easily generalized to replace the use of a const 'r' added to all elements to a richer model function. For example, if the data is an affine function and a bit of noise, i.e. data[i] = a*i + b + noise[i]
where the noise is always between -N/2
and N/2-1
, we could set aside the fixed values a and b-N/2
, and have the per-element representation compressed[i] = noise[i] - N/2
- essentially setting ω=N
. When decompressing, instead of adding a constant value we would add a*i + b - N/2
. The model functions may be selected from some linear space, of finite dimension m, and be represented by coefficients w.r.t. a basis for the space. An obvious example is the space of polynomials up to some degree m-1; this example is actually implemented in the library and may be instantiated as necessary. This is also done per segment, so in actuality the FOR compression schemes can be said to use "piecewise functions" as a model for the data.
This scheme's shorthand designator is FOR
.
- The commonly-known FOR involves just a single constant as the reference value (= constant-value reference model)
- Per-segment polynomials are a simple but flexible model, which does not require much complex computation for evaluation; of course, any set of basis functions can be used (e.g. sinusoidal harmonies).
- B-splines can be used as models, and while their individual non-zero domain is not contained within a single input segment, they often model data very well in real life while exhibiting useful featores. B- is not supported.
Can element offsets be negative?
- We can theoretically require the model to always under-predict the data, so that offsets are always non-negative; in this case,
τ_c
can represent twice as many positive offsets. one can always reduce the model by the constant value being the maxium-absolute - Alternatively, we could allow for signed offsets, making the modeling procedure more natural - but having to make sure and avoid large positive offsets.
Note that for linearly-ordered discrete data (specifically, integers), a model function space including constant functions, and data requiring offsets as low as -ω_0
- we could just add -ω_0
to the model function for the segment, making all the offsets positive and switching to an unsigned type for them.
Parameter | Aliases | Symbol used |
---|---|---|
Domain | D | |
Domain element type | Uncompressed type, uncompressed data type | τ_D |
Offset type | Compressed type | τ_c |
Column length | data length | n |
Maximum offset | ω | |
Model space dimension | Model function space dimension, number of model coefficients | m |
Segment length | Baselining period | s |
- An array of length
ceil(n/s)
holds the model coefficients. These might be a single value of type τ_D - for traditional constant-reference FOR, or a sequence ofm
coefficients of a function with the linear space associated with the specific instantiation of FOR. The coefficients may be floating-point even if the uncompressed and/or compressed type are discrete. - The offsets from the model functions are held in an array of length
n
, and element typeτ_c
(which, as mentioned above, may be either signed or unsigned).