Permutation entropy in higher dimensions (2D) #74
Add random to deps Separate probabilities and entropy estimators
Codecov Report
@@ Coverage Diff @@
## main #74 +/- ##
==========================================
- Coverage 79.32% 77.48% -1.85%
==========================================
Files 21 27 +6
Lines 624 724 +100
==========================================
+ Hits 495 561 +66
- Misses 129 163 +34
Thanks for your effort, @kahaaga. If it is okay with you, I'd like to implement a different solution to #73 that is both more performant and, in my opinion, a simpler interface with less source code. I've written code that iterates over "neighborhoods" of multidimensional arrays several times, and refined it over three different versions, starting from TimeseriesPrediction.jl, into the fastest form that currently exists, in Agents.jl. Here is my code outline, and I show below an example of the API as seen by a user.
Here's what it looks like. The following defines a 2x2 "square" stencil that extends from a pixel of the array to the bottom and right.

    stencil = PermutationStencil([CartesianIndex(1,0), CartesianIndex(0,1), CartesianIndex(1,1)]; periodic = true)
    # stencils always include the 0,0 cartesian index.
    data = [rand(50,50) for _ in 1:50]
    perment = genentropy(data, stencil)

So, given my familiarity with the codebase of Agents.jl, I'll open a second PR where I implement this interface. The amount of new API is definitely much, much smaller. Whether it is much, much faster as well, we will have to see once I open it. I have not read any other PRs yet, like #70, so I don't know yet about this generator interface you want. It will take some time to get to the other stuff.
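To make the stencil idea concrete, here is a minimal sketch of how a permutation entropy over a 2D stencil could be computed. It is not the implementation proposed for the second PR: the function name `stencil_permentropy` and its keyword arguments are hypothetical, and only Base Julia is used. It assumes the convention that `CartesianIndex(1, 0)` means "one row down" and `CartesianIndex(0, 1)` means "one column right".

```julia
# Hypothetical helper, NOT the Entropies.jl API: entropy of ordinal patterns
# collected by sliding a stencil over every pixel of a matrix.
function stencil_permentropy(x::AbstractMatrix, offsets::Vector{CartesianIndex{2}};
                             periodic::Bool = true, base = 2)
    # The stencil always includes the pixel itself, i.e. the (0, 0) offset.
    stencil = vcat(CartesianIndex(0, 0), offsets)
    nrows, ncols = size(x)
    counts = Dict{Vector{Int}, Int}()
    window = Vector{eltype(x)}(undef, length(stencil))
    total = 0
    for I in CartesianIndices(x)
        inside = true
        for (k, off) in enumerate(stencil)
            i, j = Tuple(I + off)
            if periodic
                i, j = mod1(i, nrows), mod1(j, ncols)   # wrap around the edges
            elseif !(1 <= i <= nrows && 1 <= j <= ncols)
                inside = false                          # stencil sticks out: skip pixel
                break
            end
            window[k] = x[i, j]
        end
        inside || continue
        pattern = sortperm(window)                      # ordinal pattern of the window
        counts[pattern] = get(counts, pattern, 0) + 1
        total += 1
    end
    probs = [c / total for c in values(counts)]
    return -sum(p * log(base, p) for p in probs)
end

# Usage on a sequence of random 50x50 "images", mirroring the example above.
offsets = [CartesianIndex(1, 0), CartesianIndex(0, 1), CartesianIndex(1, 1)]
data = [rand(50, 50) for _ in 1:50]
entropies = [stencil_permentropy(x, offsets) for x in data]
```

This sketch allocates a new pattern vector per pixel, so it says nothing about the performance of the approach discussed here; it only illustrates what the stencil and the `periodic` flag mean.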
Hey @Datseris! Faster, less-code approaches are always welcome. I feel no particular urge to stick with my implementation if you've got something more efficient. I recently started using Agents.jl myself and was amazed by both the performance and simplicity of the code/api, so more of that, please 😁 I suggest we leave this PR open until you've submitted your version, and then compare performance.
For the permutation entropy-related stuff here, a generator approach may not be necessary if your implementation is just as efficient. However, it is definitely a viable approach to increase performance for other methods.
My main take-away from working on the currently open PRs is that there is a lot to be gained from the pre-allocation approach using a generator. A slow, allocation-heavy approach can be the difference between waiting hours and waiting weeks for numerical experiments to finish. The pre-allocation/generator approach becomes highly relevant in the context of null hypothesis testing for the methods in CausalityTools and friends. A user might trigger hundreds of thousands or millions of entropy calculations under the hood for a single experiment. For such use cases, pre-allocating things like histogram containers will benefit performance greatly. Again, perhaps this is not relevant for permutation-based methods, but for other methods, the performance gain can be pretty substantial. However, I do like the spirit of keeping everything
Let's revisit this when you've had time to look at the other PRs too, and your solution to #73 is ready.
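The pre-allocation idea can be sketched as follows. This is an illustration only, under the assumption of a simple fixed-range histogram estimator: the type `HistogramEntropyGenerator` and its fields are hypothetical names, not the generators proposed in #70 or in this PR. The point is that the histogram container is allocated once and reused on every call.

```julia
# Hypothetical names; a sketch of the pre-allocation idea, not the package's generators.
struct HistogramEntropyGenerator
    edges::Vector{Float64}   # bin edges, fixed at construction
    counts::Vector{Int}      # histogram container, allocated once and reused
end

function HistogramEntropyGenerator(lo::Real, hi::Real, nbins::Integer)
    edges = collect(range(lo, hi; length = nbins + 1))
    return HistogramEntropyGenerator(edges, zeros(Int, nbins))
end

# Calling the generator refills the existing container in place and returns the entropy.
function (g::HistogramEntropyGenerator)(x::AbstractVector{<:Real}; base = 2)
    fill!(g.counts, 0)
    nbins = length(g.counts)
    lo, hi = first(g.edges), last(g.edges)
    n = 0
    for v in x
        lo <= v <= hi || continue                       # ignore out-of-range values
        b = min(nbins, 1 + floor(Int, (v - lo) / (hi - lo) * nbins))
        g.counts[b] += 1
        n += 1
    end
    h = 0.0
    for c in g.counts
        c == 0 && continue                              # empty bins contribute nothing
        p = c / n
        h -= p * log(base, p)
    end
    return h
end

# Usage: one generator, many repeated computations (e.g. over surrogate series).
g = HistogramEntropyGenerator(0, 1, 10)
hs = [g(rand(100)) for _ in 1:1000]
```

In a surrogate-testing loop, only the input data changes between calls, so the per-call allocation for the histogram disappears.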
No worries. I've got plenty of other stuff to do in the meantime that doesn't hinge on any of my currently open PRs :D
Shouldn't we close this in favor of #78 ...?
What is this PR?
A permutation entropy estimator for 2D arrays with arbitrary element type (matrices of numbers, strings, or any other custom type that can be sorted will work).
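The reason any sortable element type works is that an ordinal pattern depends only on the ordering of the values in a window, not on the values themselves. A small sketch using only Base Julia (the `Grade` type is a made-up example, not part of this PR):

```julia
# Ordinal pattern of a 2x2 window of strings, taken in column-major order.
words = ["pear" "apple"; "fig" "banana"]
window = vec(words)                 # ["pear", "fig", "apple", "banana"]
pattern = sortperm(window)          # indices that sort the window alphabetically

# A custom type only needs an `isless` method to be usable the same way.
struct Grade
    level::Int
end
Base.isless(a::Grade, b::Grade) = isless(a.level, b.level)

grade_pattern = sortperm([Grade(3), Grade(1), Grade(2)])
```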
Interface
I use a flexible stencil-based approach, as discussed in #73. The new estimator has two signatures:
where the first one takes any (D-dimensional) (hyper)rectangular stencil as input, and the second one is a convenience constructor for square stencils of size m*m in D dimensions. I've only implemented the estimator for D == 2 (for use on images, matrices).
The 3D case requires a bit more thought, and I will submit that in a separate PR once we've decided on the final API design (see below).
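As an illustration of how a square-stencil convenience constructor can reduce to the general rectangular case, here is a minimal sketch. The helper names `rectangular_stencil` and `square_stencil` are hypothetical and do not reflect the signatures in this PR; the sketch simply represents a stencil as a vector of `CartesianIndex` offsets starting at the origin.

```julia
# Hypothetical helpers, named for illustration only.
# A (hyper)rectangular stencil: all offsets 0:d-1 along each of the D dimensions.
rectangular_stencil(dims::NTuple{D, Int}) where {D} =
    vec(collect(CartesianIndices(map(d -> 0:d-1, dims))))

# Convenience wrapper: a square stencil with side length m in D dimensions.
square_stencil(m::Int, D::Int) = rectangular_stencil(ntuple(_ -> m, D))

s2 = square_stencil(2, 2)     # 4 offsets, covering a 2x2 patch
s3 = square_stencil(3, 3)     # 27 offsets, covering a 3x3x3 cube
```

Because the offsets are generic in the number of dimensions, the same representation would carry over to the 3D case, although the estimator itself may need more than this.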
Basic usage
The interface is the same as for most other estimators. Below is a reproduction of the example in Ribeiro et al. (2012).
Generators for repeated application
Again, I find that the generator approach discussed in #70 is necessary for efficient repeated computations.
Here I use both probabilitygenerator/ProbabilityGenerator and entropygenerator/EntropyGenerator. I'm favoring the approach in this PR, where probability generation and entropy computation are done separately.
Tests
I have implemented generic tests, and successfully reproduce the example from Ribeiro et al. (2012).
What needs to be discussed before merging?
We need to decide whether to have both ProbabilityGenerators and EntropyGenerators, or just EntropyGenerators.
I definitely favor having both ProbabilityGenerators and EntropyGenerators, because only probabilities (and not entropies) are needed for #75. By separating them, we can compute both entropy and other measures that require probabilities without doubling the computations. The generators are also trivially extendable to other methods, if wanted/needed.
References
Ribeiro, H. V., Zunino, L., Lenzi, E. K., Santoro, P. A., & Mendes, R. S. (2012). Complexity-entropy causality plane as a complexity measure for two-dimensional patterns.