IndirectEntropy should not subtype Entropy. #167


Closed
kahaaga opened this issue Nov 8, 2022 · 6 comments
Labels
discussion-design Discussion or design matters
Milestone

Comments

@kahaaga
Member

kahaaga commented Nov 8, 2022

While getting CausalityTools ready for Entropies v2, I encountered the following issue:

There are now two ways of computing entropies: either

  • entropy(e::Entropy, x, est::ProbabilitiesEstimator)
  • entropy(e::IndirectEntropy, x).

I have two problems with this:

1. The indirect entropies are not entropies.

They are types indicating that a certain kind of entropy is being estimated using a certain numerical procedure. A much more natural choice would be to have just one signature entropy(e::Entropy, x, est::Union{ProbabilitiesEstimator, EntropyEstimator}), where EntropyEstimator is what we now call IndirectEntropy.

The API should be predictable, and the types should be named in a way that reflects what is actually going on. I argue that IndirectEntropy is a bit misleading, because it can be confused with Entropy, which is something else entirely. Actually, now we equate (through the type hierarchy) an entropic quantity with its estimator. But the former is used to estimate the latter. They should be different things.

We already make this distinction for probabilities. Probabilities is the quantity being computed, while ProbabilitiesEstimator is the procedure used to estimate it. Entropy should have a corresponding EntropyEstimator that computes it.

Thus:

  • entropy(::Kraskov, x) should be entropy(::Shannon, x, ::Kraskov).
  • For example, if the Kraskov estimator can be used to estimate some other entropy, then dispatch must be implemented for that particular entropy, i.e. entropy(::SomeOtherEntropy, x, ::Kraskov).
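To make the proposed dispatch pattern concrete, here is a minimal, self-contained sketch. All types here are hypothetical stand-ins, not the actual Entropies.jl definitions:

```julia
# Hypothetical stand-ins for the proposed hierarchy (not the real
# Entropies.jl types).
abstract type Entropy end
abstract type EntropyEstimator end

struct Shannon <: Entropy end
struct Tsallis <: Entropy end
struct Kraskov <: EntropyEstimator end

# Dispatch is implemented only for combinations that make sense:
entropy(::Shannon, x, ::Kraskov) = 0.0  # placeholder for the real computation

# Any other (entropy, estimator) pair hits an informative fallback:
entropy(e::Entropy, x, est::EntropyEstimator) =
    error("$(typeof(e)) cannot be estimated with a $(typeof(est)) estimator")
```

Here entropy(Shannon(), x, Kraskov()) works, while entropy(Tsallis(), x, Kraskov()) throws the fallback error, which is exactly the "dispatch controls what is allowed" behavior described below.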

2. Two distinct signatures lead to code duplication

With the current design, I have to duplicate source code using entropy wherever I want to accept both probability estimators and entropy estimators. I have to write:

  • One method for ProbabilitiesEstimator (which has one order of arguments) and,
  • Another method for IndirectEntropy.

I can, of course, write wrapper functions to circumvent this issue, but that shouldn't be necessary.
@kahaaga kahaaga added the discussion-design Discussion or design matters label Nov 8, 2022
@kahaaga kahaaga added this to the 2.0 milestone Nov 8, 2022
@Datseris
Member

Datseris commented Nov 8, 2022

For example, if the Kraskov estimator can be used to estimate some other entropy, then dispatch must be implemented for that particular entropy, i.e. entropy(::SomeOtherEntropy, x, ::Kraskov).

Well, can it though? Or are we generalizing something that we shouldn't care to generalize?

entropy(::Shannon, x, ::Kraskov) misleads one into thinking it could be used with other entropies, which is not the case as far as I can tell.

Personally I am happy with the current API, but it also seems to me that simply swapping the order of arguments into entropy(x, ::IndirectEntropy) solves all your problems. Anyways, I am also okay with the plan of renaming IndirectEntropy to EntropyEstimator, but then you would have to resolve the problem of not misleading people into thinking this could work for any entropy. And the name ShannonEstimator seems weird to me.

I can, of course, write wrapper functions and circumvent this issue. But that shouldn't be necessary.

You'd have only two wrapper functions, entropy(x, e::IndirectEntropy) = entropy(e, x) and entropy(::NotShannon, x, y::IndirectEntropy) = error. It would take you 2 lines of code, and all of your problems are solved for all of CausalityTools.jl. How many lines of code and how much time would it take to implement the proposed change instead? Maybe we can just add these two lines of code here instead.
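Spelled out as runnable code (with hypothetical stub types standing in for the real ones; "NotShannon" above is shorthand for any Entropy other than Shannon), the two wrappers would look roughly like:

```julia
# Hypothetical stubs, not the actual Entropies.jl type hierarchy.
abstract type Entropy end
abstract type IndirectEntropy end

struct Shannon <: Entropy end
struct Renyi <: Entropy end
struct Kraskov <: IndirectEntropy end

entropy(e::IndirectEntropy, x) = 0.0  # stands in for the existing two-argument method

# Wrapper 1: the three-argument Shannon form forwards to the existing method.
entropy(::Shannon, x, est::IndirectEntropy) = entropy(est, x)

# Wrapper 2: any non-Shannon entropy is rejected explicitly.
entropy(e::Entropy, x, ::IndirectEntropy) =
    error("indirect estimators only compute Shannon entropy, got $(typeof(e))")
```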

@kahaaga
Member Author

kahaaga commented Nov 8, 2022

Anyways I am also okay with the plan of renaming InderectEntropy to EntropyEstimator

Ok, just for the sake of the argument here, let's say we do that.

I really want to follow the same style in CausalityTools too (i.e. entropy/mutualinfo/transfer entropy type first, then estimator then input data).

... but then you would have to resolve the problem of not missleading people that this could work for any entropy. And the name ShannonEstimator seems weird to me.

We'd just keep the names as they are. Each docstring should clearly state what type of entropy the estimator computes (all of them do already). With the change I propose, there is just a single supertype EntropyEstimator, and dispatch controls what is allowed. Example:

"""
    Kraskov <: EntropyEstimator
    
A Shannon entropy estimator that computes entropy directly without first computing probabilities, by doing ....
"""

"""
    Zhu <: EntropyEstimator
    
A Shannon entropy estimator that computes entropy directly without first computing probabilities, by doing ....
"""

"""
    ClassicEstimator <: EntropyEstimator
    
A Tsallos entropy estimator that computes entropy directly without first computing probabilities, by doing ....
"""

function entropy(e::Renyi, est::Kraskov, args...)
    e.q == 1 || error("$est can't be used to estimate Renyi entropy with q = $(e.q)")
    ... # implementation
end

function entropy(e::Renyi, est::Zhu,  args...)
    e.q == 1 || error("$est can't be used to estimate Renyi entropy with q = $(e.q)")
    ... # implementation
end

function entropy(e::Tsallis, est::ClassicalEstimator, args...)
    ... # implementation
end

# Generic error message in all other cases.
entropy(e::Entropy, est::EntropyEstimator, args...) = error("$e entropy can't be estimated using a $est estimator"

Here, args is just a set of time series or datasets. With this argument order, we can also just feed args... on to the Dataset constructor, if we want the joint entropy of multiple datasets/time series.

To me, this seems a rather obvious way to streamline the API, without any obvious downsides. Any confusion would be immediately resolved when the user reads the docs, and we will supplement them with tables of available methods and their syntax.

but it also seems to me that simply swapping the order of arguments into entropy(x, ::IndirectEntropy) solves all your problems

Simply swapping the order of the arguments doesn't solve my issue, because entropy(e::Shannon, x, est::ProbabilitiesEstimator) has three arguments, while entropy(e::IndirectEntropy, x) has two. I therefore can't write generic code which accepts e::Entropy as an argument.
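The arity mismatch can be made concrete with a minimal sketch, again using hypothetical stub types mirroring the current hierarchy (where IndirectEntropy subtypes Entropy):

```julia
# Hypothetical stubs, not the real Entropies.jl definitions.
abstract type Entropy end
abstract type IndirectEntropy <: Entropy end   # current subtyping
abstract type ProbabilitiesEstimator end

struct Shannon <: Entropy end
struct Kraskov <: IndirectEntropy end
struct SomeProbEst <: ProbabilitiesEstimator end  # hypothetical estimator

entropy(::Shannon, x, ::ProbabilitiesEstimator) = 0.0  # three-argument form
entropy(::Kraskov, x) = 0.0                            # two-argument form

# Generic downstream code cannot make one uniform call; it must branch on
# which kind of "entropy" it was handed:
function downstream(e::Entropy, x, est = nothing)
    e isa IndirectEntropy ? entropy(e, x) : entropy(e, x, est)
end
```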

Overall, this is a low-effort change that will 1) make at least my life easier and lead to better, non-duplicated code in CausalityTools, and 2) be more intuitive for the user, because the only thing that changes between using direct probabilities and an indirect estimator is the estimator argument (no need to use two different signatures and wonder why there are two signatures for doing the same thing).

@Datseris
Member

Datseris commented Nov 8, 2022

Simply swapping the order of the arguments doesn't solve my issue, because entropy(e::Shannon, x, est::ProbabilitiesEstimator) has three arguments, while entropy(e::IndirectEntropy, x) has two.

But entropy(x, est::Probabilities) is another entropy method that defaults to Shannon, which seems to be exactly what you want anyway, as all indirect entropies only work with Shannon to begin with.

@Datseris
Member

Datseris commented Nov 8, 2022

In any case I am ok with the proposed change.

@kahaaga
Member Author

kahaaga commented Nov 8, 2022

But entropy(x, est::Probabilities) is another entropy method that defaults to Shannon, which seems to be exactly what you want anyway, as all indirect entropies only work with Shannon to begin with.

In CausalityTools, I will always feed an e::Entropy as the first argument, which may or may not be Shannon, so the three-argument method is needed.

in any case I am ok with the proposed change

Excellent.

@kahaaga
Member Author

kahaaga commented Dec 19, 2022

Closed by #168.

@kahaaga kahaaga closed this as completed Dec 19, 2022