
Commit 94d6e27

Datseris and kahaaga authored
Encoding for binnings (#177)
* only convert to probabilities at the very last step
* put the `vec` in there
* finish `encode/decode` and linear indexing
* rework valuehistogram to store the encoding directly
* remove special clauses for input timeseries
* make histogram size part of the bin encoding
* massive simplification of all binning source
* WIP fixing tests
  Also ported explicitly to StateSpaceSets
* use next float also in the estimation of histogram size
* use encoding -1 for points outside histogram
* declare encoding api
* fix all tests!
* Use StateSpaceSets for tests, and explicit imports
* delete weird unknown "DS_Store" files
* bring back lost value histogram test file
* update api to correct order for probs
* all tests are passing

Co-authored-by: Kristian Haaga <[email protected]>
1 parent 3c705f5 commit 94d6e27

16 files changed: +262 -281 lines changed

.DS_Store

-6 KB
Binary file not shown.

Project.toml

+3-1
@@ -16,19 +16,21 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Scratch = "6c6a2e73-6563-6170-7368-637461726353"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
+StateSpaceSets = "40b095a5-5852-4c12-98c7-d43bf788e795"
 StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 Wavelets = "29a6e085-ba6d-5f35-a997-948ac2efa89a"
 
 [compat]
 Combinatorics = "1"
-DelayEmbeddings = "2.5"
+DelayEmbeddings = "2.6"
 Distances = "0.9, 0.10"
 FFTW = "1"
 Neighborhood = "0.2.4"
 QuadGK = "2"
 Scratch = "1"
 SpecialFunctions = "0.10, 1.0, 2"
+StateSpaceSets = "0.1.2, 1"
 StaticArrays = "0.12, 1.0"
 Wavelets = "0.9"
 julia = "1.5"

docs/.DS_Store

-6 KB
Binary file not shown.

docs/src/devdocs.md

+5-4
@@ -7,10 +7,11 @@ Good practices in developing a code base apply in every Pull Request. The [Good
 ### Mandatory steps
 1. Decide on the outcome space and how the estimator will map probabilities to outcomes.
 2. Define your type and make it subtype `ProbabilitiesEstimator`.
-3. Add a docstring to your type following the style of the docstrings of other estimators.
-4. Implement dispatch for [`probabilities_and_outcomes`](@ref).
-5. Implement dispatch for [`outcome_space`](@ref).
-6. Add your type to the list in the docstring of [`ProbabilitiyEstimator`](@ref).
+4. Add a docstring to your type following the style of the docstrings of other estimators.
+5. If suitable, the estimator may be able to operate based on [`Encoding`]s. If so, it is preferred to implement an `Encoding` subtype and extend the methods [`encode`](@ref) and [`decode`](@ref). This will allow your probabilities estimator to be used with a larger span of entropy and complexity methods without additional effort.
+6. Implement dispatch for [`probabilities_and_outcomes`](@ref) and your probabilities estimator type.
+7. Implement dispatch for [`outcome_space`](@ref) and your probabilities estimator type.
+8. Add your probabilities estimator type to the list in the docstring of [`ProbabilitiyEstimator`](@ref), and if you also made an encoding, add it to the [`Encoding`](@ref) docstring.
 
 ### Optional steps
 You may extend any of the following functions if there are potential performance benefits in doing so:
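
For readers following the mandatory steps, here is a minimal sketch of what a new estimator could look like. It is not code from this commit. `SignCounts` is a hypothetical estimator, the abstract type and generic functions are local stand-ins so the snippet runs without the package, and the `(est, x)` argument order is an assumption taken from the call changed in `src/probabilities.jl`. The sketch also returns a plain `Vector` instead of the package's `Probabilities` type.

```julia
# Local stand-in for the package's abstract type (assumption, not the real definition).
abstract type ProbabilitiesEstimator end

"""
    SignCounts()

Hypothetical estimator whose outcome space is the sign of each element: -1, 0 or 1.
"""
struct SignCounts <: ProbabilitiesEstimator end

# Step "outcome_space": list every outcome the estimator can produce.
outcome_space(::SignCounts) = [-1, 0, 1]

# Step "probabilities_and_outcomes": return probabilities and the outcomes
# they belong to, dropping outcomes that never occur in `x`.
function probabilities_and_outcomes(est::SignCounts, x::AbstractVector{<:Real})
    Ω = outcome_space(est)
    counts = [count(v -> sign(v) == ω, x) for ω in Ω]
    keep = counts .> 0
    return counts[keep] ./ length(x), Ω[keep]
end

probs, outs = probabilities_and_outcomes(SignCounts(), [-2.0, 3.0, 1.5, -0.1])
```

Here two of the four inputs are negative and two are positive, so `outs` contains only `-1` and `1`, each with probability one half.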

src/Entropies.jl

+8-6
@@ -9,19 +9,21 @@ To install it, run `import Pkg; Pkg.add("Entropies")`.
 """
 module Entropies
 
-using DelayEmbeddings
-using DelayEmbeddings: AbstractDataset, Dataset, dimension
-export AbstractDataset, Dataset
-export DelayEmbeddings
+using StateSpaceSets
+export Dataset, SVector
+using DelayEmbeddings: embed, genembed
+
 const Array_or_Dataset = Union{<:AbstractArray{<:Real}, <:AbstractDataset}
 const Vector_or_Dataset = Union{<:AbstractVector{<:Real}, <:AbstractDataset}
 
+# Core API types and functions
 include("probabilities.jl")
 include("entropy.jl")
-include("encoding/outcomes.jl")
+include("encodings.jl")
+# Library implementations (files include other files)
 include("probabilities_estimators/probabilities_estimators.jl")
 include("entropies/entropies.jl")
-
+include("encoding/all_encodings.jl")
 include("deprecations.jl")
 
 

src/encoding/all_encodings.jl

+3
@@ -0,0 +1,3 @@
+include("utils.jl")
+include("gaussian_cdf.jl")
+include("ordinal_pattern.jl")

src/encoding/outcomes.jl

-42
This file was deleted.

src/encodings.jl

+31
@@ -0,0 +1,31 @@
+export Encoding, encode, decode
+
+"""
+    Encoding
+
+The supertype for all encoding schemes. Encodings **always encode elements of
+input data into the positive integers**. The encoding API is defined by the
+functions [`encode`](@ref) and [`decode`](@ref).
+Some probability estimators utilize encodings internally.
+
+Current available encodings are:
+
+- [`OrdinalPatternEncoding`](@ref).
+- [`GaussianCDFEncoding`](@ref).
+- [`RectangularBinEncoding`](@ref).
+"""
+abstract type Encoding end
+
+"""
+    encode(χ, e::Encoding) -> i::Int
+Encoding an element `χ ∈ x` of input data `x` (those given to [`probabilities`](@ref))
+using encoding `e`.
+"""
+function encode end
+
+"""
+    decode(i::Int, e::Encoding) -> ω
+Decode an encoded element `i::Int` into the outcome it corresponds to `ω ∈ Ω`.
+`Ω` is the [`outcome_space`](@ref) of a probabilities estimator that uses encoding `e`.
+"""
+function decode end
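
To illustrate the `encode`/`decode` contract declared above, the following is a rough sketch of a one-dimensional binning encoding. `ToyBinEncoding` and its fields are hypothetical and do not reproduce the package's `RectangularBinEncoding`. The abstract type and generic functions are redeclared locally so the snippet runs on its own. Returning `-1` for points outside the covered range follows the convention named in the commit message ("use encoding -1 for points outside histogram").

```julia
# Local stand-ins for the API declared in src/encodings.jl.
abstract type Encoding end
function encode end
function decode end

# Hypothetical encoding: `n` equal-width bins covering the interval [lo, hi).
struct ToyBinEncoding <: Encoding
    lo::Float64
    hi::Float64
    n::Int
end

# Map a real number to its bin index in 1:n, or to -1 if it lies outside [lo, hi).
function encode(χ::Real, e::ToyBinEncoding)
    (e.lo <= χ < e.hi) || return -1
    width = (e.hi - e.lo) / e.n
    # `min` guards against rounding pushing a value just below `hi` past the last bin.
    return min(floor(Int, (χ - e.lo) / width) + 1, e.n)
end

# Map a bin index back to its outcome, here the left edge of the bin.
function decode(i::Int, e::ToyBinEncoding)
    width = (e.hi - e.lo) / e.n
    return e.lo + (i - 1) * width
end

e = ToyBinEncoding(0.0, 1.0, 4)
i = encode(0.6, e)   # third bin, which covers [0.5, 0.75)
ω = decode(i, e)     # left edge of that bin
```

Decoding does not recover the original value, only the outcome (bin) it was assigned to, so `decode(encode(χ, e), e)` is in general not equal to `χ`.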

src/probabilities.jl

+2-1
@@ -42,6 +42,7 @@ Base.IteratorSize(::Probabilities) = Base.HasLength()
 @inline Base.sum(::Probabilities{T}) where T = one(T)
 
 """
+    ProbabilitiesEstimator
 The supertype for all probabilities estimators.
 
 In Entropies.jl, probability distributions are estimated from data by defining a set of
@@ -203,5 +204,5 @@ Equivalent with `probabilities_and_outcomes(x, est)[2]`, but for some estimators
 it may be explicitly extended for better performance.
 """
 function outcomes(x, est::ProbabilitiesEstimator)
-    return probabilities_and_outcomes(x, est)[2]
+    return probabilities_and_outcomes(est, x)[2]
 end
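
The docstring above notes that `outcomes` may be extended explicitly for better performance. As a hedged sketch of that pattern, the snippet below pairs a generic fallback (mirroring the added line) with a specialized method that skips the probability computation. `UniqueValues` is a hypothetical estimator, and the abstract type and functions are local stand-ins, not the package's definitions.

```julia
# Local stand-ins so the sketch runs without the package.
abstract type ProbabilitiesEstimator end
function probabilities_and_outcomes end

# Generic fallback: compute everything, keep only the outcomes.
outcomes(x, est::ProbabilitiesEstimator) = probabilities_and_outcomes(est, x)[2]

# Hypothetical estimator: each distinct value in `x` is its own outcome.
struct UniqueValues <: ProbabilitiesEstimator end

function probabilities_and_outcomes(::UniqueValues, x)
    outs = sort(unique(x))
    probs = [count(==(ω), x) / length(x) for ω in outs]
    return probs, outs
end

# Specialized method: the outcomes are known without counting anything.
outcomes(x, ::UniqueValues) = sort(unique(x))

x = [3, 1, 3]
fast = outcomes(x, UniqueValues())
# Force the generic fallback to check that both paths agree.
slow = invoke(outcomes, Tuple{Any, ProbabilitiesEstimator}, x, UniqueValues())
```

Both paths should return the same outcomes. The specialized method only avoids the work of computing probabilities that are then discarded.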
