This all sounds good. We definitely need to clarify which objects we're dealing with and how we need to manipulate and model them, and sampler and kernel types sound like a suitable set of abstractions. As always, we need to consider how we could represent such things within the graph and rewrite frameworks, when possible/relevant. When we can, we're generally able to do more.
## Current interface for `construct_sampler`
To understand the kind of changes that having several possible samplers (including parametrized samplers) will require, let’s take a non-trivial example of building sampling functions for the Horseshoe prior, taken from AeMCMC’s test suite:
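The model in that test looks roughly like the following sketch (illustrative, loosely based on AeMCMC's README example; the exact test code may differ): a negative-binomial regression whose coefficients get a Horseshoe prior.

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)

X = at.matrix("X")

# Horseshoe prior on the regression coefficients `beta_rv`
tau_rv = srng.halfcauchy(0, 1, name="tau")
lmbda_rv = srng.halfcauchy(0, 1, size=X.shape[1], name="lambda")
beta_rv = srng.normal(0, lmbda_rv * tau_rv, size=X.shape[1], name="beta")

# Prior on the dispersion parameter of the observation model
a = at.scalar("a")
b = at.scalar("b")
h_rv = srng.gamma(a, b, name="h")

# Negative-binomial regression likelihood
eta = X @ beta_rv
p = at.sigmoid(-eta)
Y_rv = srng.nbinom(h_rv, p, name="Y")
```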
We observe `Y_rv`, and we want to sample from the posterior distribution of `tau_rv`, `lmbda_rv`, `beta_rv`, and `h_rv`. AeMCMC currently provides a `construct_sampler` function:
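A sketch of the call, assuming the model above, an observation value variable `y_vv`, and the three return values described below (the exact signature may differ from the current code):

```python
import aemcmc

# Value variable standing in for the observed data
y_vv = Y_rv.clone()
y_vv.name = "y"

sample_steps, updates, initial_values = aemcmc.construct_sampler({Y_rv: y_vv}, srng)
```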
The `sample_steps` dictionary maps the random variables to the sampling step that was assigned to them. We can print the graph of the sampling step assigned to `lmbda_rv`:
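For example (output elided):

```python
import aesara

# Print the Aesara graph of the sampling step assigned to `lmbda_rv`
aesara.dprint(sample_steps[lmbda_rv])
```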
Samplers update the rng state, and the caller will need to pass these updates to the compiler later, so we return them as well. They are returned as a dictionary that contains the updates to the state of the random number generator that we passed via `srng`:
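Schematically:

```python
print(updates)
# {RandomGeneratorSharedVariable(...): <next rng state>, ...}
# (shared rng variables mapped to their updated states; the repr is schematic)
```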
And finally we get the initial value variables of the random variables we wish to sample from, which we will pass to the compiled sampler:
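Again schematically:

```python
print(initial_values)
# {tau: tau_vv, lambda: lambda_vv, beta: beta_vv, h: h_vv}
# (each RV mapped to the value variable holding its current value; names are schematic)
```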
We can now easily build the graph for the sampler:
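A sketch of the compilation step, assuming the model and return values above:

```python
import aesara

to_sample_rvs = [tau_rv, lmbda_rv, beta_rv, h_rv]

sampler = aesara.function(
    [a, b, X, y_vv] + [initial_values[rv] for rv in to_sample_rvs],
    [sample_steps[rv] for rv in to_sample_rvs],
    updates=updates,
    # in case some inputs are not used by every sampling step
    on_unused_input="ignore",
)
```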
And we can run the `sampler` function in a Python loop.

## Issues with streams of samplers and parametrized samplers
Although the current interface works perfectly for Gibbs samplers, the downstream caller has no high-level information about which transformations were applied to the graph, or which samplers were assigned to the variables. They would have to reverse-engineer this information from the graph they receive. This becomes problematic the day we return a stream of samplers: how are humans (or machines) supposed to reason about what AeMCMC returns?
Other issues related to information arise with NUTS and parametrized kernels in general. It is useful to look at this from two perspectives: first from that of a caller who does not care about the details of the sampler and just wants it to "work", and then from that of a statistician who would like to inspect the sampler AeMCMC returns.
### If you just want to sample
We can simply create sampler types. Imagine we pass a complex model to AeMCMC but have no idea what the output sampling steps may be; all we can see is the graphs of the sampling steps we get back. If at least one of the RVs is assigned a parametrized sampler, we will run into an issue with the previous workflow (see the sketch below):
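A sketch of the failure mode (hypothetical; assuming one sampling step now contains a free NUTS parameter variable, e.g. a step size, that we don't know about):

```python
# Re-using the compilation call from before...
sampler = aesara.function(
    [a, b, X, y_vv] + [initial_values[rv] for rv in to_sample_rvs],
    [sample_steps[rv] for rv in to_sample_rvs],
    updates=updates,
)
# ...now fails with something like a MissingInputError, because the graph of
# the parametrized step references parameter variables that are not listed
# among the inputs -- and nothing tells the caller what they are.
```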
Indeed, compilation will fail with an unhelpful error message, since the variables representing the parameters are missing. Thus we need to make it explicit that sampling steps might be parametrized. The simplest way to do that is to change the API slightly and have `construct_sampler` always return a `parameters` variable:
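That is, something along these lines (a sketch of the proposed change, not the current API):

```python
sample_steps, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)

# `parameters` maps each parametrized kernel (or RV) to the symbolic
# variables standing for its parameters, e.g. a NUTS step size or inverse
# mass matrix; it would be empty for purely closed-form Gibbs steps.
```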
But that is not enough: at the very least, one needs to know how to provide a value for these parameters. To set the value manually we need to know the type of the parameter and its shape. This information can be passed by setting the type and shape of the `TensorVariable`s when we initialize them. This is simple for models where random variables are built with concrete shape values, but it immediately becomes problematic when shapes are symbolic:
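In the model above, for instance, the size of `beta_rv` depends on the symbolic `X.shape[1]`, so there is no concrete shape to read off the variable's type (illustrative):

```python
# The static shape of `beta_rv` is unknown: it depends on `X.shape[1]`
print(beta_rv.type.shape)  # (None,)
# So a parameter whose value must match this shape (e.g. an inverse mass
# matrix over `beta`) cannot be initialized from type information alone.
```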
We thus need to provide shape information in a *user-friendly* way. We can even provide a function that returns the shape based on the model's parameters, or one that provides an array of ones with this shape given the model parameters:
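For example, helpers of this kind (hypothetical, names made up):

```python
import numpy as np


def inverse_mass_matrix_shape(X_val: np.ndarray) -> tuple:
    # The (diagonal) inverse mass matrix must match the number of sampled
    # regression coefficients, i.e. the number of columns of `X`.
    return (X_val.shape[1],)


def default_inverse_mass_matrix(X_val: np.ndarray) -> np.ndarray:
    # An array of ones with the right shape, usable as a default value
    return np.ones(inverse_mass_matrix_shape(X_val))
```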
We thus need a parameter type to convey this information. For instance, for the inverse mass matrix parameter of the NUTS sampler:
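A minimal sketch of what such a parameter type could look like (entirely hypothetical; none of these names exist in AeMCMC):

```python
from dataclasses import dataclass
from typing import Callable, Tuple

from aesara.tensor.var import TensorVariable


@dataclass(frozen=True)
class InverseMassMatrix:
    """Inverse mass matrix parameter of a NUTS kernel."""

    # Symbolic variable that appears in the sampling step's graph
    var: TensorVariable
    # Returns the concrete shape given the model's concrete inputs
    shape_fn: Callable[..., Tuple[int, ...]]
    # Human-readable description for users inspecting the sampler
    description: str = "Diagonal inverse mass matrix of the NUTS kernel"
```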
What if we don't want to provide values for the parameters and just want it to work? Then we need to bring in parameter adaptation.
### Parameter adaptation
We could provide a `build_adaptation_step` function that is dispatched on the parameter type, but not only would this require information about the previous sampler step, in many adaptation schemes it is not possible to decouple the updates of the parameters. The solution thus seems to be to provide a new high-level function:
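For instance (a hypothetical signature):

```python
def construct_adaptation(sampler, srng):
    """Build adaptation steps for every parametrized kernel in `sampler`.

    Proposed API, not implemented: returns the adaptation update graphs and
    the associated rng/state updates.
    """
    ...
```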
where `sampler` is akin to `sample_steps` above, but with extra information about the kernels that produced the sampling steps. With the current notation you would build an adaptation step in the following way:
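Roughly (hypothetical, reusing the names above):

```python
sample_steps, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)

# Build the adaptation steps from the sampling steps (and the kernel
# information attached to them) found by `construct_sampler`.
adaptation_steps, adaptation_updates = construct_adaptation(sample_steps, srng)
```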
`construct_adaptation` uses the sampling steps found by `construct_sampler`. Now it works!

### If you want to understand
But what if you not only want it to work, but also to understand AeMCMC's output? As a statistician, I would like to get some textual information about the sampling steps, for instance for a Gibbs sampling kernel:
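For instance, something in this spirit (purely illustrative; `describe` is a made-up helper):

```python
print(describe(sample_steps[lmbda_rv]))
# "lambda: Gibbs step, sampling from the closed-form full conditional
#  implied by the horseshoe prior"
```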
But AeMCMC can also return parametrized sampling steps. If NUTS were assigned, I would like (need) to know:

So as not to burden the API too much (and not bother those not interested in the details), I suggest we still return the same number of return values for `construct_sampler`:
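i.e. keep the same shape for the API (sketch):

```python
sampler, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)
```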
and where `sampler[rv]` still returns the sampling step for the variable `rv`. The difference is that `sampler` is not a dictionary but a class:
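A minimal sketch of such a class, mirroring the attributes described below (hypothetical names):

```python
from dataclasses import dataclass, field
from typing import Dict, List

from aesara.graph.fg import FunctionGraph
from aesara.tensor.var import TensorVariable


@dataclass
class Kernel:
    # The RVs this kernel updates jointly, mapped to their sampling steps
    sample_steps: Dict[TensorVariable, TensorVariable]
    # Free parameter variables of the kernel (empty for Gibbs steps)
    parameters: Dict[str, TensorVariable] = field(default_factory=dict)


@dataclass
class Sampler:
    # Kernels combined in this sampler; one kernel may update several RVs
    kernels: List[Kernel]
    # Maps each RV to the kernel that updates its value
    rvs_to_kernels: Dict[TensorVariable, Kernel]
    # Graph the sampler was built from, inspectable with Aesara/AePPL tools
    model_graph: FunctionGraph

    def __getitem__(self, rv: TensorVariable) -> TensorVariable:
        # `sampler[rv]` still returns the sampling step for `rv`
        return self.rvs_to_kernels[rv].sample_steps[rv]
```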
where `kernels` is a list of the kernels that are combined in the sampler. We need the notion of a kernel since some algorithms (NUTS) update the values of several variables at once (we could call it `sampling_unit` as well, which, unlike "kernel", is not overused). `rvs_to_kernels` maps the RVs to the kernel that updates their values. `model_graph` is the `FunctionGraph` that was used to build the sampler (which the user can inspect using the tools provided by Aesara/AePPL).

By the way, the need to access the graph representation that is used by the samplers means that the transforms used by NUTS will need to be applied to `RandomVariable`s in AePPL.

## Representation within the graph / rewrite framework
TODO
This was originally a comment in #68 (comment)