
Encoding for binnings #177


Merged
merged 19 commits into from
Dec 16, 2022


Datseris
Member

@Datseris Datseris commented Nov 17, 2022

This PR achieves:

  • Declares the encoding API. Closes #171 (Encoding API).
  • Implements the encoding API for binnings.
  • Refactors and shortens the source code for histograms. The binnings are now nothing more than declarative structs that provide instructions on how to make a binning. The encoding is instantiated immediately.
  • ValueHistogram now requires that input x is provided if the binning is not fixed, so that outcome space is always well defined.
  • Moves Entropies.jl to StateSpaceSets.jl. This is the core, lowest dependency of JuliaDynamics that basically defines Dataset, dimension.

I have not written explicit tests for encodings yet, but the current tests pass. To consider also: should we expose the n_eps keyword to the API? If so, maybe it's time to give it a better name.

@Datseris Datseris added binning Related with binning: histograms, transfer operator enhancement New feature or request that is non-breaking labels Nov 17, 2022
@Datseris
Member Author

cc @kahaaga in this PR I will change ValueHistogram to have the encoding directly as its field. This means that the rectangular binnings are now only used as "instructions". An instance of a binning is never stored; it is only used to generate the RectangularBinEncoding. This will simplify the source code even more. In fact, ValueHistogram will become a three-line call to encode and decode.
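
A minimal sketch of the "binning as instructions, encoding as the working object" idea described above. Every name here (ToyFixedBinning, ToyBinEncoding, toy_encode, toy_decode, toy_histogram) is hypothetical and one-dimensional for brevity; it is not the package's RectangularBinEncoding, and returning -1 for out-of-range values is simply a choice made for this sketch.

```julia
# Toy sketch; all names are hypothetical, not the package's API.

# A binning is only a set of instructions: a range and a number of bins.
struct ToyFixedBinning
    lo::Float64   # left edge of the first bin
    hi::Float64   # right edge of the last bin (excluded)
    nbins::Int    # number of equally sized bins
end

# The encoding holds what is needed to map values to integers and back.
struct ToyBinEncoding
    lo::Float64
    width::Float64
    nbins::Int
end

# Build the encoding from the instructions; the binning itself is not stored.
ToyBinEncoding(b::ToyFixedBinning) =
    ToyBinEncoding(b.lo, (b.hi - b.lo) / b.nbins, b.nbins)

# Map a value to its bin index in 1:nbins, or -1 if it is outside the range.
function toy_encode(e::ToyBinEncoding, x::Real)
    i = floor(Int, (x - e.lo) / e.width) + 1
    return 1 <= i <= e.nbins ? i : -1
end

# Map a bin index back to the left edge of that bin.
toy_decode(e::ToyBinEncoding, i::Int) = e.lo + (i - 1) * e.width

# A histogram is then a short loop over encoded values.
function toy_histogram(e::ToyBinEncoding, xs)
    counts = zeros(Int, e.nbins)
    for x in xs
        i = toy_encode(e, x)
        i == -1 || (counts[i] += 1)   # skip values outside the range
    end
    return counts
end

enc = ToyBinEncoding(ToyFixedBinning(0.0, 1.0, 4))
counts = toy_histogram(enc, [0.1, 0.3, 0.35, 0.9, 1.5])
```

Because the range is fixed up front, the set of possible outcomes (here the four left edges given by toy_decode) does not depend on the data passed to toy_histogram.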

@kahaaga
Member

kahaaga commented Nov 17, 2022

cc @kahaaga in this PR I will change ValueHistogram to have the encoding directly as its field. This means that the rectangular binnings are now only used as "instructions". An instance of a binning is never stored; it is only used to generate the RectangularBinEncoding. This will simplify the source code even more. In fact, ValueHistogram will become a three-line call to encode and decode.

Excellent.

@Datseris
Member Author

@kahaaga this PR will also remove all near-duplicate code that explicitly handles input x that is a timeseries Vector{<:Real}. It will treat it as a 1-dimensional dataset. All related outcome spaces will consist of 1-element static vectors. You can reinterpret them as a vector of numbers like so:

using StaticArrays
y = rand(SVector{1, Float64}, 100)
x = reinterpret(Float64, y) # a view of y that behaves like a Vector{Float64}

@Datseris
Member Author

Datseris commented Nov 17, 2022

This PR will also make outcome_space work for any binning. It will make the histogram size part of the bin encoding struct.

At the encoding level we have full knowledge: we know the limits of the histogram, as we have already extracted them from the data. I'll also put a notice in the docstrings that it is strongly preferred to use fixed binnings, as their outcome space does not depend on the input data.
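
A toy illustration of why a binning that is not fixed needs the input data before its outcome space is defined. The helper toy_edges is hypothetical and only assumes that such a binning specifies a number of bins and takes its limits from the extrema of the data; the package's actual rule for computing limits may differ.

```julia
# Toy sketch; `toy_edges` is a hypothetical helper, not part of any package.

# With only a bin count given, the bin edges have to come from the data.
function toy_edges(x::AbstractVector{<:Real}, nbins::Int)
    lo, hi = extrema(x)
    return range(lo, hi; length = nbins + 1)
end

# Same instructions (2 bins), different data, different outcome space.
a = toy_edges([0.0, 0.5, 1.0], 2)
b = toy_edges([0.0, 1.0, 2.0], 2)
```

A fixed binning avoids this: its edges are part of the instructions, so two datasets always share the same bins.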

@kahaaga
Member

kahaaga commented Nov 17, 2022

@kahaaga this PR will also remove all near-duplicate code that explicitly handles input x that is a timeseries Vector{<:Real}. It will treat it as a 1-dimensional dataset. All related outcome spaces will consist of 1-element static vectors. You can reinterpret them as a vector of numbers like so:

That makes sense. Then, before merging this, we need to make sure that

  • Input dimension is checked explicitly for estimators that only work on 1D data. This is the case for some of the direct entropy estimators.
  • Update signatures in docstrings for all probabilities and entropy estimators that previously had Vector_or_Dataset

@Datseris
Member Author

No, I think you misunderstood: you can still use timeseries as input. We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset still do so.

@kahaaga
Member

kahaaga commented Nov 17, 2022

We just don't need duplicate source code. The outcome space for timeseries input will be a vector of 1-length svectors instead of a vector of Floats. From the user input perspective nothing changes. So functions that accepted Vector_or_Dataset, still do so.

Ah, yes, I misunderstood. Then these sound like reasonable changes.

@kahaaga
Member

kahaaga commented Nov 17, 2022

This PR will also make outcome_space work for any binning. It will make the histogram size part of the bin encoding struct.
At the encoding level we have full knowledge: we know the limits of the histogram as we have already extracted them from data.

Just to ensure that I understood this comment:

  • The outcome space for any RectangularBinning cannot be known in advance without providing input data. I.e. outcome_space(::RectangularBinning) is not defined.
  • The outcome space for any RectangularBinning is known when providing input data, because it is precisely the set of bins visited by the input data. I.e. outcome_space(x, ::RectangularBinning) is well-defined.

Is that what you meant?

I'll also put a notice in the docstrings that it is strongly preferred to use fixed binnings, as their outcome space does not depend on the input data.

I agree on recommending this.

@Datseris
Member Author

Now the ValueHistogram() signature requires x if the binning isn't a FixedBinning, so that all outcome spaces are known. It's similar to how we do it for e.g. the spectral entropy.

@Datseris Datseris changed the title [wip] encoding for binnings Encoding for binnings Nov 17, 2022
@Datseris Datseris marked this pull request as ready for review November 17, 2022 16:00
@Datseris Datseris requested a review from kahaaga November 17, 2022 16:02
@Datseris
Member Author

@kahaaga I've updated the PR top-level comment. This is ready for review and merge. I'll now be gone for a fair bit of time, so go ahead and merge/modify as you see fit.

@kahaaga
Member

kahaaga commented Nov 17, 2022

I've updated the PR top-level comment. This is ready for review and merge. I'll now be gone for a fair bit of time, so go ahead and merge/modify as you see fit.

Will do! Thanks for the effort.

@Datseris
Member Author

@kahaaga I'll slowly try to catch up in the coming days. What's the status here? This wasn't merged in, but are there any blockers?

@kahaaga
Member

kahaaga commented Nov 30, 2022

@Datseris Welcome back!

I've been majorly busy too, so I just haven't had time to look at this PR properly yet. Will try to do so tonight or tomorrow morning.

The status here is:

  • I need to review this PR properly to make sure it doesn't interfere with anything transfer operator related. Will try to do that tonight or tomorrow morning.
  • The entropy_signature branch (Entropy signature and folder structure cleanup #168) (just switching argument orders of probabilities and entropy + friends) is ready-ish. I think we both decided on the design there, so probably it can just be merged? I'm basing everything CausalityTools v2 on that branch, so would be nice to merge it sooner rather than later.
  • To complete the move to Complexity.jl, we need to finally decide on the design and merge
  • If you agree on the design part of the two PRs above, I will do the work of bringing them up to speed with main. There's been quite a few changes to main since I first made those PRs.

In general:

  • I've implemented many more literature entropy/mutual information/conditional mutual information/relative entropy/divergence estimators, including the most recent nearest-neighbor based developments.
  • I'm preparing educational videos/animations for several of the entropy estimators (think workshop/documentation)
  • I've completely redesigned the information measures for CausalityTools, so that any lower-level estimator like EntropyEstimator and ProbabilitiesEstimator can be used to estimate higher-level things like (conditional) mutual information.

However, I haven't prepared any of the two points above for review yet, because I want us to decide on the argument order, multiscale and spatiotemporal API before finalizing anything.

TLDR: 1) I will try to look at this PR tonight. 2) The outstanding PRs need to be resolved to ready the other packages.

@kahaaga
Member

kahaaga commented Nov 30, 2022

should we expose the n_eps keyword to the API? If so, maybe it's time to give it a better name.

Maybe just call it tolerance or tol, and explain in the docstring what it means?

@kahaaga
Member

kahaaga commented Nov 30, 2022

Okay, I've tried to have a look, but I'll need some more time to wrap my head around these changes. The tests do not pass with the current version of main. This happens because encode_as_bin and decode_as_bin can't straight-forwardly be replaced with the new encode (which gives an integer) in the source code for TransferOperator. I think.

My mind is a bit clouded at the moment. I'll have another go at it tomorrow again with a blank slate.
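
One way to picture the mismatch mentioned above, assuming (this is not confirmed in the thread) that the older encode_as_bin/decode_as_bin worked with per-dimension bin indices while the new encode returns a single integer. Base Julia's LinearIndices and CartesianIndices convert between the two representations; the 3×4 grid size and the helper names to_int and to_bin are hypothetical.

```julia
# Hypothetical 3×4 grid of bins; the size is chosen only for illustration.
const dims = (3, 4)

# Per-dimension bin indices -> single integer code (column-major order).
to_int(bin::NTuple{2,Int}) = LinearIndices(dims)[bin...]

# Integer code -> per-dimension bin indices.
to_bin(code::Int) = Tuple(CartesianIndices(dims)[code])

code = to_int((2, 3))   # bin 2 along the first axis, bin 3 along the second
bin = to_bin(code)      # round trip back to the per-dimension indices
```

Code that previously consumed per-dimension indices could, under this assumption, wrap the integer code in such a conversion instead of being rewritten.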

@Datseris
Member Author

Datseris commented Dec 1, 2022

@kahaaga would you be up for a video call as I still can't read many letters in one sitting and the reply is large here

@kahaaga
Member

kahaaga commented Dec 1, 2022

@Datseris 13:30 CET, today?

@Datseris
Member Author

Datseris commented Dec 1, 2022

just saw this now; I'm free from 14:30 CET

@kahaaga
Member

kahaaga commented Dec 1, 2022

just saw this now; I'm free from 14:30 CET

@Datseris I'm busy with other things from 14:30 CET and onwards today. I can either do a call after 20 CET tonight, or any time from 10-17 CET tomorrow. Do any of those alternatives work for you?

@Datseris
Member Author

Datseris commented Dec 1, 2022

Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!

@kahaaga
Member

kahaaga commented Dec 1, 2022

Tomorrow is fine for me, I'll post a message here when I'm free, probably around 2-3pm!

Sounds good. Just tag me explicitly here when you're ready, so I'll get an e-mail notification.

@kahaaga
Member

kahaaga commented Dec 2, 2022

@Datseris I'm available now. Just post the link when you're ready.

@Datseris
Member Author

Datseris commented Dec 2, 2022

@kahaaga me too. I don't have any pro zoom stuff. I can post a google meets link.

@kahaaga
Member

kahaaga commented Dec 2, 2022

@Datseris I have the pro version. Here's a Zoom meeting:

https://uib.zoom.us/j/66488837829?pwd=c3FCUFhZdjFjN0V5SkZ4UWJSd0o5dz09

@Datseris
Member Author

@kahaaga shall we merge this in? Shall we first merge in #168 and then this? The transfer operator update can be done in a separate PR. I have about 3 hours of work time per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year (but there is a very low likelihood that this will happen).

@Datseris
Member Author

Ah lol, #168 is already merged. Okay. I will update this branch to master by changing the signatures for the value histogram stuff and merge. Transfer operator fixes should be done in a different PR.

@kahaaga
Member

kahaaga commented Dec 16, 2022

@kahaaga shall we merge this in? Shall we first merge in #168 and then this? The transfer operator update can be done in a separate PR. I have about 3 hours of work time per day and I want to finish as many tickboxes as possible and (maybe) get a stable Entropies release out before the end of the year (but there is a very low likelihood that this will happen).

Yes, we should try to get a stable release out before the end of the year. I aim to do the same for CausalityTools (which I'm currently working on during my available work hours), so that we'll have it ready to prepare workshop material.

Ah lol, #168 is already merged. Okay. I will update this branch to master by changing the signatures for the value histogram stuff and merge. Transfer operator fixes should be done in a different PR.

Yes, do that! I have started fixing the transfer operator in a separate branch, but it isn't pushed to remote yet.

@Datseris
Member Author

Tests are passing here (for ValueHistogram). I am merging this in before my brain explodes due to the number of open PRs with concurrent changes!

@Datseris Datseris merged commit 94d6e27 into main Dec 16, 2022
@Datseris Datseris deleted the binning_encoding branch December 16, 2022 18:49
Labels
binning Related with binning: histograms, transfer operator enhancement New feature or request that is non-breaking
Development

Successfully merging this pull request may close these issues.

Encoding API
2 participants