This all sounds good. We definitely need to clarify which objects we're dealing with and how we need to manipulate and model them, and sampler and kernel types sound like a suitable set of abstractions. As always, we need to consider how we could represent such things within the graph and rewrite frameworks, when possible/relevant. When we can, we're generally able to do more.
## Current interface for `construct_sampler`
To understand the kind of changes that having several possible samplers (including parametrized samplers) will require, let’s take a non-trivial example of building sampling functions for the Horseshoe prior, taken from AeMCMC’s test suite:
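The model in that test looks roughly like the following sketch (illustrative, loosely based on AeMCMC's README example; the exact test code may differ): a negative-binomial regression whose coefficients get a Horseshoe prior.

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)

X = at.matrix("X")

# Horseshoe prior on the regression coefficients `beta_rv`
tau_rv = srng.halfcauchy(0, 1, name="tau")
lmbda_rv = srng.halfcauchy(0, 1, size=X.shape[1], name="lambda")
beta_rv = srng.normal(0, lmbda_rv * tau_rv, size=X.shape[1], name="beta")

# Prior on the dispersion parameter of the observation model
a = at.scalar("a")
b = at.scalar("b")
h_rv = srng.gamma(a, b, name="h")

# Negative-binomial regression likelihood
eta = X @ beta_rv
p = at.sigmoid(-eta)
Y_rv = srng.nbinom(h_rv, p, name="Y")
```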
We observe `Y_rv`, and we want to sample from the posterior distribution of `tau_rv`, `lmbda_rv`, `beta_rv`, and `h_rv`. AeMCMC currently provides a `construct_sampler` function:
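A sketch of the call, assuming the model above, an observation value variable `y_vv`, and the three return values described below (the exact signature may differ from the current code):

```python
import aemcmc

# Value variable standing in for the observed data
y_vv = Y_rv.clone()
y_vv.name = "y"

sample_steps, updates, initial_values = aemcmc.construct_sampler({Y_rv: y_vv}, srng)
```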
The `sample_steps` dictionary maps the random variables to the sampling step that was assigned to them. We can print the graph of the sampling step assigned to `lmbda_rv`:
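For example (output elided):

```python
import aesara

# Print the Aesara graph of the sampling step assigned to `lmbda_rv`
aesara.dprint(sample_steps[lmbda_rv])
```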
Samplers update the rng state, and the caller will need to pass these updates to the compiler later, so we return them as well. They are returned as a dictionary that contains the updates to the state of the random number generator that we passed via `srng`:
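Schematically:

```python
print(updates)
# {RandomGeneratorSharedVariable(...): <next rng state>, ...}
# (shared rng variables mapped to their updated states; the repr is schematic)
```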
And finally we get the initial value variables of the random variables we wish to sample from, which we will pass to the compiled sampler:
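Again schematically:

```python
print(initial_values)
# {tau: tau_vv, lambda: lambda_vv, beta: beta_vv, h: h_vv}
# (each RV mapped to the value variable holding its current value; names are schematic)
```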
We can now easily build the graph for the sampler:
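A sketch of the compilation step, assuming the model and return values above:

```python
import aesara

to_sample_rvs = [tau_rv, lmbda_rv, beta_rv, h_rv]

sampler = aesara.function(
    [a, b, X, y_vv] + [initial_values[rv] for rv in to_sample_rvs],
    [sample_steps[rv] for rv in to_sample_rvs],
    updates=updates,
    # in case some inputs are not used by every sampling step
    on_unused_input="ignore",
)
```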
And we can run the `sampler` function in a Python loop.

## Issues with streams of samplers and parametrized samplers
Although the current interface works perfectly for Gibbs samplers, the downstream caller has no high-level information about which transformations were applied to the graph, or which samplers were assigned to the variables. They would have to reverse-engineer this information from the graph they receive. This becomes problematic the day we return a stream of samplers: how are humans (or machines) supposed to reason about what AeMCMC returns?
Other issues related to information arise with NUTS and parametrized kernels in general. It is useful to look at this from two perspectives: first from that of a caller who does not care about the details of the sampler and just wants it to "work", and then from that of a statistician who would like to inspect the sampler AeMCMC returns.
### If you just want to sample
We can simply create sampler types. Imagine we pass a complex model to AeMCMC but have no idea what the output sampling steps may be; all we can see is the graphs of the sampling steps we get back. If at least one of the RVs is assigned a parametrized sampler, we will run into an issue with the previous workflow (see the sketch below):
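A sketch of the failure mode (hypothetical; assuming one sampling step now contains a free NUTS parameter variable, e.g. a step size, that we don't know about):

```python
# Re-using the compilation call from before...
sampler = aesara.function(
    [a, b, X, y_vv] + [initial_values[rv] for rv in to_sample_rvs],
    [sample_steps[rv] for rv in to_sample_rvs],
    updates=updates,
)
# ...now fails with something like a MissingInputError, because the graph of
# the parametrized step references parameter variables that are not listed
# among the inputs -- and nothing tells the caller what they are.
```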
Indeed, compilation will fail with an unhelpful error message, since the variables representing the parameters are missing. Thus we need to make it explicit that sampling steps might be parametrized. The simplest way to do that is to change the API slightly and have `construct_sampler` always return a `parameters` variable:
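That is, something along these lines (a sketch of the proposed change, not the current API):

```python
sample_steps, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)

# `parameters` maps each parametrized kernel (or RV) to the symbolic
# variables standing for its parameters, e.g. a NUTS step size or inverse
# mass matrix; it would be empty for purely closed-form Gibbs steps.
```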
But that is not enough: at the very least, one needs to know how to provide a value for these parameters. To set the value manually we need to know the type of the parameter and its shape. This information can be passed by setting the type and shape of the `TensorVariable`s when we initialize them. This is simple for models where random variables are built with concrete shape values, but it immediately becomes problematic when shapes are symbolic:
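In the model above, for instance, the size of `beta_rv` depends on the symbolic `X.shape[1]`, so there is no concrete shape to read off the variable's type (illustrative):

```python
# The static shape of `beta_rv` is unknown: it depends on `X.shape[1]`
print(beta_rv.type.shape)  # (None,)
# So a parameter whose value must match this shape (e.g. an inverse mass
# matrix over `beta`) cannot be initialized from type information alone.
```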
We thus need to provide shape information in a *user-friendly* way. We can even provide a function that returns the shape based on the model's parameters, or one that provides an array of ones with this shape given the model parameters:
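For example, helpers of this kind (hypothetical, names made up):

```python
import numpy as np


def inverse_mass_matrix_shape(X_val: np.ndarray) -> tuple:
    # The (diagonal) inverse mass matrix must match the number of sampled
    # regression coefficients, i.e. the number of columns of `X`.
    return (X_val.shape[1],)


def default_inverse_mass_matrix(X_val: np.ndarray) -> np.ndarray:
    # An array of ones with the right shape, usable as a default value
    return np.ones(inverse_mass_matrix_shape(X_val))
```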
We thus need a parameter type to convey this information. For instance, for the inverse mass matrix parameter of the NUTS sampler:
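A minimal sketch of what such a parameter type could look like (entirely hypothetical; none of these names exist in AeMCMC):

```python
from dataclasses import dataclass
from typing import Callable, Tuple

from aesara.tensor.var import TensorVariable


@dataclass(frozen=True)
class InverseMassMatrix:
    """Inverse mass matrix parameter of a NUTS kernel."""

    # Symbolic variable that appears in the sampling step's graph
    var: TensorVariable
    # Returns the concrete shape given the model's concrete inputs
    shape_fn: Callable[..., Tuple[int, ...]]
    # Human-readable description for users inspecting the sampler
    description: str = "Diagonal inverse mass matrix of the NUTS kernel"
```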
What if we don't want to provide values for the parameters and just want it to work? Then we need to bring in parameter adaptation.
### Parameter adaptation
We could provide a `build_adaptation_step` function that is dispatched on the parameter type, but not only would this require information about the previous sampler step, in many adaptation schemes it is not possible to decouple the updates of the parameters. The solution thus seems to be to provide a new high-level function:
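For instance (a hypothetical signature):

```python
def construct_adaptation(sampler, srng):
    """Build adaptation steps for every parametrized kernel in `sampler`.

    Proposed API, not implemented: returns the adaptation update graphs and
    the associated rng/state updates.
    """
    ...
```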
where `sampler` is akin to `sample_steps` above, but with extra information about the kernels that produced the sampling steps. With the current notation you would build an adaptation step in the following way:
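Roughly (hypothetical, reusing the names above):

```python
sample_steps, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)

# Build the adaptation steps from the sampling steps (and the kernel
# information attached to them) found by `construct_sampler`.
adaptation_steps, adaptation_updates = construct_adaptation(sample_steps, srng)
```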
`construct_adaptation` uses the sampling steps found by `construct_sampler`. Now it works!

### If you want to understand
But what if you not only want it to work, but also to understand AeMCMC's output? As a statistician, I would like to get some textual information about the sampling steps, for instance for a Gibbs sampling kernel:
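For instance, something in this spirit (purely illustrative; `describe` is a made-up helper):

```python
print(describe(sample_steps[lmbda_rv]))
# "lambda: Gibbs step, sampling from the closed-form full conditional
#  implied by the horseshoe prior"
```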
But AeMCMC can also return parametrized sampling steps. If NUTS were assigned, I would like (need) to know:

So as not to burden the API too much (and not bother those not interested in the details), I suggest we still return the same number of return values for `construct_sampler`:
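i.e. keep the same shape for the API (sketch):

```python
sampler, updates, initial_values, parameters = aemcmc.construct_sampler(
    {Y_rv: y_vv}, srng
)
```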
and where `sampler[rv]` still returns the sampling step for the variable `rv`. The difference is that `sampler` is not a dictionary but a class:
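A minimal sketch of such a class, mirroring the attributes described below (hypothetical names):

```python
from dataclasses import dataclass, field
from typing import Dict, List

from aesara.graph.fg import FunctionGraph
from aesara.tensor.var import TensorVariable


@dataclass
class Kernel:
    # The RVs this kernel updates jointly, mapped to their sampling steps
    sample_steps: Dict[TensorVariable, TensorVariable]
    # Free parameter variables of the kernel (empty for Gibbs steps)
    parameters: Dict[str, TensorVariable] = field(default_factory=dict)


@dataclass
class Sampler:
    # Kernels combined in this sampler; one kernel may update several RVs
    kernels: List[Kernel]
    # Maps each RV to the kernel that updates its value
    rvs_to_kernels: Dict[TensorVariable, Kernel]
    # Graph the sampler was built from, inspectable with Aesara/AePPL tools
    model_graph: FunctionGraph

    def __getitem__(self, rv: TensorVariable) -> TensorVariable:
        # `sampler[rv]` still returns the sampling step for `rv`
        return self.rvs_to_kernels[rv].sample_steps[rv]
```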
where `kernels` is a list of the kernels that are combined in the sampler. We need the notion of a kernel since some algorithms (NUTS) update the values of several variables at once (we could call it `sampling_unit` as well, which, unlike "kernel", is not overused). `rvs_to_kernels` maps the RVs to the kernel that updates their values. `model_graph` is the `FunctionGraph` that was used to build the sampler (which the user can inspect using the tools provided by Aesara/AePPL).

By the way, the need to access the graph representation that is used by the samplers means that the transforms used by NUTS will need to be applied to `RandomVariable`s in AePPL.

## Representation within the graph / rewrite framework
TODO
This was originally a comment in #68 (comment)