Double Stochasticity #4

Closed

theogf opened this issue Mar 18, 2020 · 5 comments

theogf commented Mar 18, 2020

ADVI and other methods (SVGD, etc.) can support double stochasticity (stochastic estimation of the expectation via samples, and stochastic estimation of the log joint via mini-batches):

M. Titsias and M. Lázaro-Gredilla. Doubly stochastic variational Bayes for non-conjugate inference.
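
For concreteness, here is a minimal sketch (in Julia, with purely illustrative names, not any existing package API) of such a doubly stochastic ELBO estimate for a diagonal Gaussian q, with one source of noise from sampling q and one from subsampling the data:

```julia
function doubly_stochastic_elbo(logprior, loglik, data, μ, σ; batchsize = 32, nsamples = 1)
    n, b = length(data), batchsize
    idx = rand(1:n, b)                                  # stochasticity 1: random mini-batch
    energy = 0.0
    for _ in 1:nsamples
        z = μ .+ σ .* randn(length(μ))                  # stochasticity 2: sample from q = N(μ, σ²)
        energy += logprior(z) + (n / b) * sum(loglik(z, data[i]) for i in idx)
    end
    entropy = sum(log.(σ)) + length(μ) * (1 + log(2π)) / 2  # closed-form entropy of a diagonal Gaussian
    return energy / nsamples + entropy                  # unbiased estimate of the ELBO
end
```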

torfjelde commented Mar 23, 2020

This relates to a more general question of how to support different approaches to gradient estimation. At the moment the way to do that is to overload grad!(vo, alg::VariationalInference, q, model::Model, θ, out, args...), dispatching differently on (see the sketch after the list):

  • vo, e.g. vo::ELBO where you might have different estimation techniques for different objectives
  • alg, e.g. alg::ADVI where you might have different estimation techniques for different algorithms, e.g. Blackbox-VI where you use control variates
  • q, e.g. TuringDiagMvNormal, where you might have different estimation techniques for different variational families, e.g. for normal distribution you can compute the entropy term of the ELBO in closed form
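
For illustration, a self-contained sketch of this dispatch pattern could look like the following; the names here are placeholders standing in for the package's own ELBO, ADVI, TuringDiagMvNormal, etc., not the actual AdvancedVI definitions:

```julia
using ForwardDiff

struct DiagNormal{T}          # stand-in for e.g. TuringDiagMvNormal
    μ::Vector{T}
    logσ::Vector{T}
end

# Specialising the estimate on the variational family: for a diagonal Gaussian the
# entropy term of the ELBO is available in closed form, so only the energy term is sampled.
function elbo_estimate(q::DiagNormal, logπ; nsamples = 10)
    d = length(q.μ)
    energy = sum(logπ(q.μ .+ exp.(q.logσ) .* randn(d)) for _ in 1:nsamples) / nsamples
    entropy = sum(q.logσ) + d * (1 + log(2π)) / 2
    return energy + entropy
end

# A grad!-style method in the spirit of the signature quoted above: dispatch picks the
# estimator, AD (here ForwardDiff) provides the gradient of the negative objective.
function grad!(q_from_θ, logπ, θ, out)
    ForwardDiff.gradient!(out, θ_ -> -elbo_estimate(q_from_θ(θ_), logπ), θ)
    return out
end

# Hypothetical usage: θ packs (μ, logσ), logπ is the target log density.
# θ = zeros(4); out = similar(θ)
# grad!(θ_ -> DiagNormal(θ_[1:2], θ_[3:4]), z -> -sum(abs2, z) / 2, θ, out)
```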

I think we could do something better than this though. Something like defining an expectation "operator" or something, e.g. MCEstimator which simply samples from a "proposal" distribution and evaluates the function at the sampled points, like we do for the ELBO. For example, I recently discovered https://github.com/QuantEcon/Expectations.jl and so I'm curious if we can either extract some ideas from that or even do what we want in Expectations.jl. Not sure. I'm going to write an issue on the topic so we can have a proper discussion about this.
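
As a rough illustration of the idea (hypothetical names, modelled loosely on the Expectations.jl interface rather than taken from it):

```julia
using Distributions, Statistics

# An MCEstimator in the spirit of Expectations.jl's expectation(dist): construct it from
# a "proposal" distribution, then call it on any function to get a plain Monte Carlo
# estimate of the expectation under that distribution.
struct MCEstimator{D}
    q::D
    nsamples::Int
end

(est::MCEstimator)(f) = mean(f(rand(est.q)) for _ in 1:est.nsamples)

E = MCEstimator(Normal(0, 1), 10_000)
E(x -> x^2)   # ≈ 1.0, the second moment of a standard normal
```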

theogf commented Apr 8, 2020

From my experience with AGP.jl, I created two different approaches for VI: one with sampling (the actual ADVI) and one with quadrature. For the quadrature I use FastGaussQuadrature directly, as Expectations.jl is mostly a wrapper around it.
Quadrature is the way to go when the input domain of the likelihood is univariate (and roughly Gaussian); otherwise sampling (ADVI) is the way to go.
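
As a minimal sketch of the quadrature route, assuming a univariate Gaussian input, one could call FastGaussQuadrature's Gauss–Hermite rule directly:

```julia
using FastGaussQuadrature

# Approximate E_{x ~ N(μ, σ²)}[f(x)] with an n-point Gauss–Hermite rule: the change of
# variables x = μ + √2 σ t maps the Gaussian expectation onto ∫ f(t) e^{-t²} dt, which
# the rule integrates exactly for polynomials up to degree 2n - 1.
function gausshermite_expectation(f, μ, σ; n = 20)
    t, w = gausshermite(n)
    return sum(w .* f.(μ .+ sqrt(2) * σ .* t)) / sqrt(π)
end

gausshermite_expectation(x -> x^2, 0.0, 1.0)   # ≈ 1.0
```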

Replacing vo::ELBO with a stochastic version is definitely a good idea. I would rather call it StochasticELBO, but that's up for discussion.

torfjelde commented
> Expectations.jl is mostly a wrapper around it.

> Replacing vo::ELBO with a stochastic version is definitely a good idea. I would rather call it StochasticELBO, but that's up for discussion.

I didn't mean using the existing Expectations.jl for the expectation estimation (other than in the univariate case, for exactly the reason you mention). I just meant that I think the interface is nice and that it would be great to have something similar for this package. A lot of different VI algorithms differ only in how they estimate the objective, e.g. MC, importance-weighted, semi-implicit, so I would love to have some way to only implement a new estimator and then plug it into, say, ADVI and have it just work. This would be in contrast to having to implement an ADVIWithSpecificEstimator or whatever. Does that make sense?

And this is why it's taking me so bloody long (sorry about that, again) to do a proper write-up of my thoughts on the topic, since I'll need to do a proper review of existing VI methods to get a good understanding of what functionality we need to cover.

> ADVI and other methods (SVGD, etc.) can support double stochasticity (stochastic estimation of the expectation via samples, and stochastic estimation of the log joint via mini-batches).

One thing I forgot to mention is the "stochastic" or mini-batch VI in your initial comment.
The way I'm planning on doing "stochastic" (read: mini-batch) VI is to implement MinibatchVI <: VariationalInference, which just wraps any inference algorithm, e.g. ADVI, to produce a "stochastic" version. IMO this is a nice way of doing it 👍 I've done this in TuringLang/Turing.jl#903 (though that implementation is now outdated, and is a bit ugly due to being built on an old version of Turing).
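
A rough sketch of what such a wrapper could look like (placeholder types, not the actual Turing/AdvancedVI definitions):

```julia
abstract type VariationalInference end

struct ADVI <: VariationalInference
    samples_per_step::Int
    max_iters::Int
end

# MinibatchVI wraps any algorithm and only changes how the log joint is built: each step
# sees a random mini-batch with the likelihood rescaled by n/b, so the mini-batch log
# joint is an unbiased estimate of the full-data one.
struct MinibatchVI{A<:VariationalInference} <: VariationalInference
    inner::A          # the wrapped algorithm, e.g. ADVI
    batchsize::Int
end

# Hypothetical helper: build the rescaled mini-batch log joint for one optimisation step.
function minibatch_logjoint(logprior, loglik, data, batchsize)
    n = length(data)
    idx = rand(1:n, batchsize)
    return z -> logprior(z) + (n / batchsize) * sum(loglik(z, data[i]) for i in idx)
end
```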

Red-Portal commented
Related discussion in LogDensityProblems.jl

Red-Portal commented
Closed in favor of #38
