
Commit a6a5d02

Finish API (#187)
* Finish new API declaration in probabilities.jl
* actually finish better
* finish new API for spectral entropy
* fix a bunch more tests
* finish all minus TE and PE
* alter docstring of `entropy`
* fix value hist tests
* more fixes
* change FixedRectangularBinning to really be fixed always
* fix valuehistogram tests
* wavelet overlap needs input for well defined outcome
* fix diversity
* fix wavelet again
* fix convenience
* reverse order in encode/decode
* apply encode decode for rect binning
1 parent 8c368a4 commit a6a5d02

20 files changed (+209 −219 lines)

src/encoding/ordinal_pattern.jl (+3 −2)

@@ -1,12 +1,13 @@
 export OrdinalPatternEncoding
+#TODO: The docstring here, and probably the source code, needs a full re-write
+# based on new `encode` interface.
 
 """
     OrdinalPatternEncoding <: Encoding
     OrdinalPatternEncoding(m = 3, τ = 1; lt = est.lt)
 
 A encoding scheme that converts the input time series to ordinal patterns, which are
-then encoded to integers using [`encode_motif`](@ref), used with
-[`outcomes`](@ref).
+then encoded to integers using [`encode`](@ref).
 
 !!! note
     `OrdinalPatternEncoding` is intended for symbolizing *time series*. If providing a short vector,

src/encodings.jl (+8 −7)

@@ -3,8 +3,8 @@ export Encoding, encode, decode
 """
     Encoding
 
-The supertype for all encoding schemes. Encodings **always encode elements of
-input data into the positive integers**. The encoding API is defined by the
+The supertype for all encoding schemes. Encodings always encode elements of
+input data into the positive integers. The encoding API is defined by the
 functions [`encode`](@ref) and [`decode`](@ref).
 Some probability estimators utilize encodings internally.

@@ -17,15 +17,16 @@ Current available encodings are:
 abstract type Encoding end
 
 """
-    encode(χ, e::Encoding) -> i::Int
+    encode(c::Encoding, χ) -> i::Int
 Encoding an element `χ ∈ x` of input data `x` (those given to [`probabilities`](@ref))
-using encoding `e`.
+using encoding `c`. The special value of `-1` is reserved as a return value for
+inappropriate elements `χ` that cannot be encoded according to `c`.
 """
 function encode end
 
 """
-    decode(i::Int, e::Encoding) -> ω
-Decode an encoded element `i::Int` into the outcome it corresponds to `ω ∈ Ω`.
-`Ω` is the [`outcome_space`](@ref) of a probabilities estimator that uses encoding `e`.
+    decode(c::Encoding, i::Int) -> ω
+Decode an encoded element `i` into the outcome it corresponds to `ω ∈ Ω`.
+`Ω` is the [`outcome_space`](@ref) of a probabilities estimator that uses encoding `c`.
 """
 function decode end
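The reversed argument order (encoding first, element second) and the reserved `-1` return value can be sketched with a toy encoding. `SignEncoding` below is hypothetical, not part of Entropies.jl; it only illustrates the `encode`/`decode` contract described in the diff above.

```julia
# Toy illustration of the encode/decode contract.
# `SignEncoding` is a hypothetical encoding, not a package type.
abstract type Encoding end

struct SignEncoding <: Encoding end

# Encode an element into a positive integer; the reserved value -1 is
# returned for elements that cannot be encoded according to the encoding.
function encode(::SignEncoding, χ)
    χ isa Real || return -1
    return χ < 0 ? 1 : 2
end

# Decode an integer back into the outcome ω ∈ Ω it corresponds to.
decode(::SignEncoding, i::Int) = (:negative, :nonnegative)[i]
```

Note that `decode(c, encode(c, χ))` recovers the outcome bin of `χ`, mirroring the argument order applied consistently throughout this commit.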

src/entropies/convenience_definitions.jl (+1 −1)

@@ -58,7 +58,7 @@ entropy(Shannon(base), est, x)
 See [`WaveletOverlap`](@ref) for more info.
 """
 function entropy_wavelet(x; wavelet = Wavelets.WT.Daubechies{12}(), base = 2)
-    est = WaveletOverlap(wavelet)
+    est = WaveletOverlap(x, wavelet)
     entropy(Shannon(; base), est, x)
 end

src/entropy.jl (+21 −23)

@@ -21,12 +21,11 @@ These entropy types are given as inputs to [`entropy`](@ref) and [`entropy_norma
 
 Mathematically speaking, generalized entropies are just nonnegative functions of
 probability distributions that verify certain (entropy-type-dependent) axioms.
-Amigó et al., 2018's
-[summary paper](https://www.mdpi.com/1099-4300/20/11/813) gives a nice overview.
+Amigó et al.[^Amigó2018] summary paper gives a nice overview.
 
 [Amigó2018]:
     Amigó, J. M., Balogh, S. G., & Hernández, S. (2018). A brief review of
-    generalized entropies. Entropy, 20(11), 813.
+    generalized entropies. [Entropy, 20(11), 813.](https://www.mdpi.com/1099-4300/20/11/813)
 """
 abstract type Entropy <: AbstractEntropy end

@@ -57,21 +56,23 @@ abstract type EntropyEstimator <: AbstractEntropy end
 ###########################################################################################
 # Notice that StatsBase.jl exports `entropy` and Wavelets.jl exports `Entropy`.
 """
-    entropy([e::Entropy,] probs::Probabilities) → h::Real ∈ [0, ∞)
-    entropy([e::Entropy,] est::ProbabilitiesEstimator, x) → h::Real ∈ [0, ∞)
-    entropy([e::Entropy,] est::EntropyEstimator, x) → h::Real ∈ [0, ∞)
+    entropy([e::Entropy,] probs::Probabilities)
+    entropy([e::Entropy,] est::ProbabilitiesEstimator, x)
+    entropy([e::Entropy,] est::EntropyEstimator, x)
 
-Compute `h`, a (generalized) [`Entropy`](@ref) of type `e`, in one of three ways:
+Compute `h::Real`, which is
+a (generalized) entropy defined by `e`, in one of three ways:
 
 1. Directly from existing [`Probabilities`](@ref) `probs`.
 2. From input data `x`, by first estimating a probability distribution using the provided
-[`ProbabilitiesEstimator`](@ref), then computing entropy from that distribution.
-In fact, the second method is just a 2-lines-of-code wrapper that calls
-[`probabilities`](@ref) and gives the result to the first method.
+   [`ProbabilitiesEstimator`](@ref), then computing entropy from that distribution.
+   In fact, the second method is just a 2-lines-of-code wrapper that calls
+   [`probabilities`](@ref) and gives the result to the first method.
 3. From input data `x`, by using a dedicated [`EntropyEstimator`](@ref) that computes
-entropy in a way that doesn't involve explicitly computing probabilities first.
+   entropy in a way that doesn't involve explicitly computing probabilities first.
 
-The entropy (first argument) is optional. When `est` is a probability estimator,
+The entropy definition (first argument) is optional.
+When `est` is a probability estimator,
 `Shannon()` is used by default. When `est` is a dedicated entropy estimator,
 the default entropy type is inferred from the estimator (e.g. [`Kraskov`](@ref)
 estimates the [`Shannon`](@ref) entropy).

@@ -123,16 +124,17 @@ function entropy!(s::AbstractVector{Int}, e::Entropy, est::ProbabilitiesEstimato
     entropy(e, probs)
 end
 
-entropy!(s::AbstractVector{Int}, est::ProbabilitiesEstimator, x) =
+function entropy!(s::AbstractVector{Int}, est::ProbabilitiesEstimator, x)
     entropy!(s, Shannon(), est, x)
+end
 
 ###########################################################################################
 # API: entropy from entropy estimators
 ###########################################################################################
 # Dispatch for these functions is implemented in individual estimator files in
 # `entropies/estimators/`.
 function entropy(e::Entropy, est::EntropyEstimator, x)
-    t = string(typeof(e).name.name)
+    t = string(nameof(typeof(e)))
     throw(ArgumentError("$t entropy not implemented for $(typeof(est)) estimator"))
 end

@@ -150,20 +152,16 @@ entropy(est::EntropyEstimator, x; base = 2) = entropy(Shannon(; base), est, x)
 # Normalize API
 ###########################################################################################
 """
-    entropy_maximum(e::Entropy, est::ProbabilitiesEstimator, x) → m::Real
-
-Return the maximum value `m` of the given entropy type based on the given estimator
-and the given input `x` (whose values are not important, but layout and type are).
+    entropy_maximum(e::Entropy, est::ProbabilitiesEstimator) → m::Real
 
-This function only works if the maximum value is deducable, which is possible only
-when the estimator has a known [`total_outcomes`](@ref).
+Return the maximum value `m` of the given entropy definition based on the given estimator.
 
     entropy_maximum(e::Entropy, L::Int) → m::Real
 
 Alternatively, compute the maximum entropy from the number of total outcomes `L` directly.
 """
-function entropy_maximum(e::Entropy, est::ProbabilitiesEstimator, x)
-    L = total_outcomes(x, est)
+function entropy_maximum(e::Entropy, est::ProbabilitiesEstimator)
+    L = total_outcomes(est)
     return entropy_maximum(e, L)
 end
 function entropy_maximum(e::Entropy, ::Int)

@@ -182,7 +180,7 @@ Notice that unlike for [`entropy`](@ref), here there is no method
 the amount of _possible_ events (i.e., the [`total_outcomes`](@ref)) from `probs`.
 """
 function entropy_normalized(e::Entropy, est::ProbabilitiesEstimator, x)
-    return entropy(e, est, x) / entropy_maximum(e, est, x)
+    return entropy(e, est, x) / entropy_maximum(e, est)
 end
 function entropy_normalized(est::ProbabilitiesEstimator, x::Array_or_Dataset)
     return entropy_normalized(Shannon(), est, x)
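The relationship changed above, normalized entropy as entropy divided by the maximum attainable for the estimator's outcome space, can be sketched with plain vectors. `shannon`, `shannon_maximum`, and `shannon_normalized` are hypothetical stand-ins for `entropy(Shannon(), ...)`, `entropy_maximum`, and `entropy_normalized`; they are not the package's API.

```julia
# Minimal sketch of the entropy-normalization logic, with plain probability
# vectors instead of the package's `Probabilities` type.

# Shannon entropy of a probability vector (zero terms contribute nothing).
shannon(probs; base = 2) = -sum(p -> p > 0 ? p * log(base, p) : 0.0, probs)

# Maximum Shannon entropy for L total outcomes (uniform distribution).
shannon_maximum(L::Int; base = 2) = log(base, L)

# Normalized entropy lies in [0, 1]. With the new API the maximum depends
# only on the estimator's outcome space, represented here by `L`.
shannon_normalized(probs, L; base = 2) = shannon(probs; base) / shannon_maximum(L; base)
```

This mirrors why `entropy_maximum(e, est)` no longer needs `x`: once the estimator fixes its outcome space, `L` is known without any input data.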

src/probabilities.jl (+26 −50)

@@ -32,17 +32,18 @@ function Probabilities(x::AbstractVector{<:Integer})
     return Probabilities(x ./ s, true)
 end
 
-
 # extend base Vector interface:
-for f in (:length, :size, :eachindex, :eltype,
+for f in (:length, :size, :eachindex, :eltype, :parent,
     :lastindex, :firstindex, :vec, :getindex, :iterate)
     @eval Base.$(f)(d::Probabilities, args...) = $(f)(d.p, args...)
 end
 Base.IteratorSize(::Probabilities) = Base.HasLength()
+# Special extension due to the rules of the API
 @inline Base.sum(::Probabilities{T}) where T = one(T)
 
 """
-ProbabilitiesEstimator
+    ProbabilitiesEstimator
+
 The supertype for all probabilities estimators.
 
 In Entropies.jl, probability distributions are estimated from data by defining a set of

@@ -66,6 +67,11 @@ across experimental realizations, by using the outcome as a dictionary key and t
 probability as the value for that key (or, alternatively, the key remains the outcome
 and one has a vector of probabilities, one for each experimental realization).
 
+We have made the design decision that all probabilities estimators have a well defined
+outcome space when instantiated. For some estimators this means that the input data
+`x` must be provided both when instantiating the estimator, but also when computing
+the probabilities.
+
 All currently implemented probability estimators are:
 
 - [`CountOccurrences`](@ref).

@@ -110,13 +116,14 @@ function probabilities(est::ProbabilitiesEstimator, x)
 end
 
 """
-    probabilities_and_outcomes(x, est) → (probs, Ω::Vector)
+    probabilities_and_outcomes(est, x)
 
-Return `probs, Ω`, where `probs = probabilities(x, est)` and
-`Ω[i]` is the outcome with probability `probs[i]`.
-The element type of `Ω` depends on the estimator.
+Return `probs, outs`, where `probs = probabilities(x, est)` and
+`outs[i]` is the outcome with probability `probs[i]`.
+The element type of `outs` depends on the estimator.
+`outs` is a subset of the [`outcome_space`](@ref) of `est`.
 
-See also [`outcomes`](@ref), [`total_outcomes`](@ref), and [`outcome_space`](@ref).
+See also [`outcomes`](@ref), [`total_outcomes`](@ref).
 """
 function probabilities_and_outcomes(est::ProbabilitiesEstimator, x)
     error("`probabilities_and_outcomes` not implemented for estimator $(typeof(est)).")

@@ -136,73 +143,42 @@ function probabilities! end
 # Outcome space
 ###########################################################################################
 """
-    outcome_space([x,] est::ProbabilitiesEstimator) → Ω
+    outcome_space(est::ProbabilitiesEstimator) → Ω
 
-Return a container (typically `Vector`) containing all _possible_ outcomes of `est`,
-i.e., the outcome space `Ω`.
-Only possible for estimators that implement [`total_outcomes`](@ref),
-and similarly, for some estimators `x` is not needed. The _values_ of `x` are never needed;
-but some times the type and dimensional layout of `x` is.
+Return a container containing all _possible_ outcomes of `est`.
 """
-function outcome_space(x, est::ProbabilitiesEstimator)
-    outcome_space(est)
-end
 function outcome_space(est::ProbabilitiesEstimator)
-    error(
-        "`outcome_space(est)` not known/implemented for estimator $(typeof(est))."*
-        "Try providing some input data, e.g. `outcomes_space(x, est)`."*
-        "In some cases, this gives the dimensional layout/type information needed "*
-        "to define the outcome space."
-    )
+    error("`outcome_space` not implemented for estimator $(typeof(est)).")
 end
 
 """
-    total_outcomes([x::Array_or_Dataset,] est::ProbabilitiesEstimator) → Int
-
-Return the size/cardinality of the outcome space ``\Omega`` defined by the probabilities
-estimator `est` imposed on the input data `x`.
+    total_outcomes(est::ProbabilitiesEstimator)
 
-For some estimators, the total number of outcomes is independent of `x`, in which case
-the input `x` is ignored and the method `total_outcomes(est)` is called. If the total
-number of states cannot be known a priori, an error is thrown. Primarily used in
-[`entropy_normalized`](@ref).
-
-## Examples
-
-```jldoctest setup = :(using Entropies)
-julia> est = SymbolicPermutation(m = 4);
-
-julia> total_outcomes(rand(42), est) # same as `factorial(m)` for any `x`
-24
-```
+Return the length (cardinality) of the outcome space ``\Omega`` of `est`.
 """
-function total_outcomes(x, est::ProbabilitiesEstimator)
-    return length(outcome_space(x, est))
-end
+total_outcomes(est::ProbabilitiesEstimator) = length(outcome_space(est))
 
 """
-    missing_outcomes(x, est::ProbabilitiesEstimator) → n_missing::Int
+    missing_outcomes(est::ProbabilitiesEstimator, x) → n_missing::Int
 
 Estimate a probability distribution for `x` using the given estimator, then count the number
 of missing (i.e. zero-probability) outcomes.
 
-Works for estimators that implement [`total_outcomes`](@ref).
-
 See also: [`MissingDispersionPatterns`](@ref).
 """
-function missing_outcomes(x::Array_or_Dataset, est::ProbabilitiesEstimator)
+function missing_outcomes(est::ProbabilitiesEstimator, x::Array_or_Dataset)
     probs = probabilities(x, est)
-    L = total_outcomes(x, est)
+    L = total_outcomes(est)
     O = count(!iszero, probs)
     return L - O
 end
 
 """
-    outcomes(x, est::ProbabilitiesEstimator)
+    outcomes(est::ProbabilitiesEstimator, x)
 Return all (unique) outcomes contained in `x` according to the given estimator.
 Equivalent with `probabilities_and_outcomes(x, est)[2]`, but for some estimators
 it may be explicitly extended for better performance.
 """
-function outcomes(x, est::ProbabilitiesEstimator)
+function outcomes(est::ProbabilitiesEstimator, x)
    return probabilities_and_outcomes(est, x)[2]
 end
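The design decision stated in this diff, that the estimator alone defines its outcome space, is what lets `total_outcomes` and `missing_outcomes` drop the data argument from the space computation. A sketch with a hypothetical estimator (`CoinEstimator` is invented for illustration and is not a package type):

```julia
# Sketch of the reworked outcome-space API: the estimator alone defines Ω.
abstract type ProbabilitiesEstimator end

struct CoinEstimator <: ProbabilitiesEstimator end

# The outcome space is fixed when the estimator is instantiated.
outcome_space(::CoinEstimator) = [:heads, :tails]

# Generic fallback, as in the commit: cardinality of Ω.
total_outcomes(est::ProbabilitiesEstimator) = length(outcome_space(est))

# Relative frequency of each possible outcome in the data.
probabilities(est::CoinEstimator, x) =
    [count(==(ω), x) / length(x) for ω in outcome_space(est)]

# Count zero-probability outcomes, mirroring `missing_outcomes(est, x)`.
function missing_outcomes(est::ProbabilitiesEstimator, x)
    probs = probabilities(est, x)
    return total_outcomes(est) - count(!iszero, probs)
end
```

Only `probabilities` and `missing_outcomes` touch `x`; the outcome space itself is data-free, which is exactly the contract the new `outcome_space(est)` signature expresses.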

src/probabilities_estimators/counting/count_occurences.jl (+6 −3)

@@ -1,16 +1,19 @@
 export CountOccurrences
 
 """
-    CountOccurrences()
+    CountOccurrences(x)
 
 A probabilities/entropy estimator based on straight-forward counting of distinct elements in
 a univariate time series or multivariate dataset. This is the same as giving no
 estimator to [`probabilities`](@ref).
 
 ## Outcome space
 The outcome space is the unique sorted values of the input.
+Hence, input `x` is needed for a well-defined outcome space.
 """
-struct CountOccurrences <: ProbabilitiesEstimator end
+struct CountOccurrences{X} <: ProbabilitiesEstimator
+    x::X
+end
 
 function probabilities_and_outcomes(::CountOccurrences, x::Array_or_Dataset)
     z = copy(x)

@@ -19,7 +22,7 @@ function probabilities_and_outcomes(::CountOccurrences, x::Array_or_Dataset)
     return probs, unique!(z)
 end
 
-outcome_space(x, ::CountOccurrences) = sort!(unique(x))
+outcome_space(est::CountOccurrences) = sort!(unique(est.x))
 
 probabilities(::CountOccurrences, x::Array_or_Dataset) = probabilities(x)
 function probabilities(x::Array_or_Dataset)
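The pattern introduced here, storing the input data in the estimator so that `outcome_space` is well defined at construction time, can be sketched in a simplified, self-contained form. `CountSketch` is a hypothetical stand-in for `CountOccurrences`, not the package type.

```julia
# Simplified sketch of the new `CountOccurrences(x)` pattern: the data
# lives inside the estimator, so the outcome space needs no extra input.
struct CountSketch{X}
    x::X
end

# Outcome space: the unique sorted values of the stored input.
outcome_space(est::CountSketch) = sort!(unique(est.x))

# Probabilities by straightforward counting over the outcome space.
function probabilities_and_outcomes(est::CountSketch)
    outs = outcome_space(est)
    probs = [count(==(ω), est.x) / length(est.x) for ω in outs]
    return probs, outs
end
```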

src/probabilities_estimators/dispersion/dispersion.jl (+2 −2)

@@ -118,11 +118,11 @@ function probabilities_and_outcomes(est::Dispersion, x::AbstractVector{<:Real})
     return Probabilities(hist), dispersion_patterns
 end
 
-total_outcomes(est::Dispersion)::Int = est.encoding.c ^ est.m
-
 function outcome_space(est::Dispersion)
     c, m = 1:est.encoding.c, est.m
     cart = CartesianIndices(ntuple(i -> c, m))
     V = SVector{m, Int}
     return map(i -> V(Tuple(i)), vec(cart))
 end
+# Performance extension
+total_outcomes(est::Dispersion)::Int = total_outcomes(est.encoding) ^ est.m
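The "performance extension" above works because the dispersion outcome space is every length-`m` pattern over `c` symbols, so its cardinality is `c^m` without enumerating anything. A sketch of that enumeration (the values of `c` and `m` are arbitrary, and plain tuples stand in for the package's `SVector`s):

```julia
# Why total_outcomes(est::Dispersion) == c^m: the outcome space is all
# length-m patterns over c symbols, enumerated here via CartesianIndices
# as in `outcome_space(est::Dispersion)`.
c, m = 3, 2

cart = CartesianIndices(ntuple(_ -> 1:c, m))
patterns = map(i -> Tuple(i), vec(cart))
```

The specialized `total_outcomes` method simply skips building `patterns` and returns the closed-form count.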
