Complexity measure API, v2 #133


Closed
kahaaga opened this issue Oct 18, 2022 · 11 comments
Labels
discussion-design Discussion or design matters

Comments


kahaaga commented Oct 18, 2022

The need for a common complexity interface

This issue supersedes #81.

I think we should have a common interface for computing various entropy-like complexity measures, such as reverse_dispersion (already implemented), sample_entropy (#71), approximate_entropy (#72), missing_dispersion_patterns (#124), and statistical_complexity (#131), which are currently open PRs.

It is fine to have the methods sample_entropy, reverse_dispersion, and statistical_complexity, but I really like the generic entropy interface, where we just dispatch on different Entropy methods and ProbabilitiesEstimators, and I would like something similar for the other complexity measures.

Deciding on an interface like this is important because:

Suggested interface

I propose the following:

abstract type ComplexityMeasure end

function complexity(c::ComplexityMeasure, x, args...; kwargs...)
# potentially also
function complexity_normalized(c::ComplexityMeasure, x, args...; kwargs...)

Every complexity measure can then just have its own complexity type that we can dispatch on, e.g.

struct StatisticalComplexity{A1, A2} <: ComplexityMeasure
    a1::A1
    a2::A2
    ...
end

function complexity(c::StatisticalComplexity, x)
    ....
end
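To make the dispatch pattern concrete, here is a minimal, self-contained sketch of how it could work in practice. ExampleMeasure, its fields, and the computation inside it are hypothetical placeholders invented for illustration, not actual package types:

```julia
# Hedged sketch of the proposed API; all names below are illustrative only.
abstract type ComplexityMeasure end

# Hypothetical measure with two tuning parameters.
struct ExampleMeasure <: ComplexityMeasure
    m::Int      # embedding-like parameter (made up for this sketch)
    r::Float64  # tolerance-like parameter (made up for this sketch)
end

# Dispatch on the measure type; the body is a stand-in computation.
function complexity(c::ExampleMeasure, x)
    return c.r * sum(abs, diff(x)) / (length(x) - c.m)
end

# A normalized variant, here simply dividing out the tolerance.
function complexity_normalized(c::ExampleMeasure, x)
    return complexity(c, x) / c.r
end
```

Each new measure then only needs a struct and a complexity method; callers never learn a new function name.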

What do you think, @ikottlarz and @Datseris ?

kahaaga added the discussion-design label Oct 18, 2022
@Datseris

How does having a complexity function simplify the multiscale PR?


kahaaga commented Oct 18, 2022

How does having a complexity function simplify the multiscale PR?

With a complexity function, the exported methods for multiscale would be:

multiscale(e::Entropy, alg::MultiScaleAlgorithm, x::AbstractVector, est::ProbabilitiesEstimator; kwargs...)
multiscale(c::ComplexityMeasure, alg::MultiScaleAlgorithm, x; kwargs...)

Without it, the exported functions would be

multiscale(e::Entropy, alg::MultiScaleAlgorithm, x::AbstractVector, est::ProbabilitiesEstimator; kwargs...)

multiscale_sampleentropy(alg::MultiScaleAlgorithm, x; kwargs...)
multiscale_approx_entropy(alg::MultiScaleAlgorithm, x; kwargs...)
multiscale_reverse_dispersion(alg::MultiScaleAlgorithm, x; kwargs...)
multiscale_statistical_complexity(alg::MultiScaleAlgorithm, x; kwargs...)
multiscale_missing_disp_patterns(alg::MultiScaleAlgorithm, x; kwargs...)
...
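The contrast can be sketched in code: with a generic complexity function, the second multiscale method above is written once for all measures. Everything in this sketch is a made-up stand-in (the Regular coarse-graining scheme, the MeanAbsDiff demo measure, and the windowed-mean downsampling), intended only to show the dispatch pattern, not the real package internals:

```julia
# Sketch only: every type and function here is a hypothetical stand-in.
abstract type ComplexityMeasure end
abstract type MultiScaleAlgorithm end

struct Regular <: MultiScaleAlgorithm end  # made-up coarse-graining scheme

# Illustrative coarse-graining: non-overlapping window means at scale s.
coarse_grain(::Regular, x, s) = [sum(x[i:i+s-1]) / s for i in 1:s:length(x)-s+1]

# One generic method covers every measure that implements `complexity`:
function multiscale(c::ComplexityMeasure, alg::MultiScaleAlgorithm, x; maxscale = 2)
    return [complexity(c, coarse_grain(alg, x, s)) for s in 1:maxscale]
end

# Tiny stand-in measure, used only to demonstrate the dispatch.
struct MeanAbsDiff <: ComplexityMeasure end
complexity(::MeanAbsDiff, x) = sum(abs, diff(x)) / (length(x) - 1)
```

For example, multiscale(MeanAbsDiff(), Regular(), collect(1.0:16.0)) evaluates the measure at each scale, with no per-measure multiscale_* function required.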


kahaaga commented Oct 18, 2022

Moreover, there are multiple variations on, for example, multiscale sample entropy. With this approach, we can capture all variations of it using parameters of struct SampleEntropy <: ComplexityMeasure end, not needing to define custom methods with custom names and different keywords for each variation of the method.
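For instance, a keyword-based struct can encode every variation through its parameters. The field names and defaults below are assumptions for illustration, not the package's actual definition:

```julia
# Illustrative sketch: fields and defaults are assumptions, not the real SampleEntropy.
abstract type ComplexityMeasure end

Base.@kwdef struct SampleEntropy <: ComplexityMeasure
    m::Int = 2        # embedding dimension
    τ::Int = 1        # embedding lag
    r::Float64 = 0.2  # tolerance radius
end

# A single complexity(c::SampleEntropy, x) method would then read c.m, c.τ, c.r,
# covering all variations without new function names or keyword sets.
```

A caller selects a variation with, e.g., SampleEntropy(m = 3), while untouched parameters keep their documented defaults.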

@Datseris

Okay, I agree with the proposed API. I don't agree with having both the complexity API and individual functions for each complexity measure, as we do with entropies_.... I'm kind of done with having all these "convenience" methods which don't really offer any convenience and just allow an alternative syntax.


kahaaga commented Oct 18, 2022

Okay I agree with the proposed API.

Ok, good!

I don't agree with having both the complexity API and individual functions for each complexity measure, as we do with entropies_.... I'm kind of done with having all these "convenience" methods which don't really offer any convenience and just allow an alternative syntax.

I get the point. However, the issues previously raised about search-engine visibility etc. still apply. We've been insisting on keeping convenience methods for e.g. entropy_permutation. But looking at the number of citations for some of these complexity measures, they are actually used more than the permutation entropy:

  • Approximate entropy (Pincus, 1991): ~6200 citations.
  • Sample entropy (Richman et al., 2000): ~7100 citations.
  • Permutation entropy (Bandt & Pompe, 2002): ~3800 citations.

I'd argue that if we want to use the visibility argument for specific entropies (probabilities), we should also apply it to complexity measures. If not, we should drop the convenience methods completely.

@Datseris

However, the issues previously raised about search engine visibility etc still apply.

I don't get your point. Search engines work with keywords, and the docstring of whatever method you define will contain the keywords "approximate entropy". Search engines won't find function names, because they don't contain spaces and hence don't qualify as the keywords the engines search for. We definitely have the keywords "permutation entropy" in the library.

If not, we should just drop the convenience methods completely.

From my perspective, the reason to keep a few of these convenience methods is (1) for educational purposes, so that users learn that the permutation entropy is just a subclass of something more general, and (2) for the future paper, so that we can highlight that, if we were using the approach of alternative software, we would offer thousands upon thousands of functions, but instead we manage the same amount of content with far fewer function definitions.


kahaaga commented Oct 18, 2022

From my perspective, the reason to keep a few of these convenience methods is (1) for educational purposes, so that users learn that the permutation entropy is just a subclass of something more general, and (2) for the future paper, so that we can highlight that, if we were using the approach of alternative software, we would offer thousands upon thousands of functions, but instead we manage the same amount of content with far fewer function definitions.

Excellent point.

Shall we keep the current entropy convenience methods, and perhaps add one or two convenience methods (for educational purposes) for complexity measures too? sample_entropy and approximate_entropy are good candidates, since they are so widely used.

With only two functions (entropy and entropy_normalized), we provide N = (2 * nP * nE * nM) + 2 * nD ways of estimating entropy, where nP is the number of ProbabilitiesEstimators, nE is the number of Entropys, nM is the number of multiscale sampling schemes, and nD is the number of direct (Shannon) entropy estimation methods. We currently have 12 probabilities estimators, 5 entropy types, 2 multiscale sampling schemes, and 2 direct entropy estimators, yielding N = (2 * 5 * 12 * 2) + 2 * 2 = 244 currently possible ways of estimating entropy (not counting parameter variations), if all methods have normalised versions.
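The counting argument above can be checked directly:

```julia
# Counting the combinations described in the comment above.
nP, nE, nM, nD = 12, 5, 2, 2   # probabilities estimators, entropy types,
                               # multiscale schemes, direct estimators
N = (2 * nP * nE * nM) + 2 * nD
# N == 244
```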

A simple (nonexhaustive) table in the documentation could replace many of the convenience methods, and would be a good way of ensuring that all common names appear somewhere in the documentation. It may also serve as a starting point for a table listing common literature methods in our paper, e.g.

| Method | Syntax | Reference |
| --- | --- | --- |
| Permutation entropy | entropy(x, SymbolicPermutation()) | Bandt & Pompe (2002) |
| Dispersion entropy | entropy(x, Dispersion()) | Rostaghi et al. (2016) |
| Composite multiscale dispersion entropy | multiscale(Dispersion(), Composite(), x) | Azami et al. (2017) |

We'll probably reach similar numbers of possible combinations for the complexity measures in the future, too. A similar table could be useful there:

| Method | Syntax | Reference |
| --- | --- | --- |
| Reverse dispersion entropy | complexity(ReverseDispersion(), x) | Gao & Wang (2002) |
| Statistical complexity | complexity(StatisticalComplexity(), x) | Rosso et al. (2016) |

@Datseris

I like this a lot! The table is great! The only change I would make is to replace SymbolicPermutation() with SymbolicPermutation(...), so that it hints that there are options.

Okay, let's keep sample_entropy and approximate_entropy as the only "convenience complexity functions".


kahaaga commented Oct 18, 2022

I like this a lot! The table is great! The only change I would make is to replace SymbolicPermutation() with SymbolicPermutation(...), so that it hints that there are options.
Okay, let's keep sample_entropy and approximate_entropy as the only "convenience complexity functions".

Great!

@ikottlarz this will affect your PR #131. Do the designs of complexity and complexity_normalized functions above sound reasonable to you?

I'll submit a PR within a few hours, where I redo the reverse dispersion method with the new API, so you can see how it'll look in the end.

@ikottlarz

@kahaaga this sounds good to me. I'm currently working on your reviews, and will reorganize my code accordingly!


kahaaga commented Oct 19, 2022

Closed by #134

3 participants