Update distribution shape inference to handle independent dims #402
Conversation
# The arguments to _infer_value_domain are the .output shapes of parameters,
# so any extra batch dimensions that aren't part of the instance event_shape
# must be broadcasted output dimensions by construction.
out_shape = instance.batch_shape + instance.event_shape
This change to _infer_value_domain is the conceptual meat of the PR.
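To illustrate the idea as a hedged sketch (not the actual Funsor code; the helper name is mine), the inferred value shape is just the concatenation of the instantiated distribution's batch and event shapes, so parameter dims that broadcast into batch_shape become extra output dims of the value:

```python
# Sketch only: mimics out_shape = instance.batch_shape + instance.event_shape.
# Any parameter dims that broadcast into batch_shape cannot be instance event
# dims, so they are treated as extra output dims of the value.

def infer_value_shape(batch_shape, event_shape):
    """Concatenate batch and event shapes into the value's output shape."""
    return tuple(batch_shape) + tuple(event_shape)

# e.g. a Normal whose loc broadcasts to batch_shape (3,) with scalar
# event_shape () yields a 3-dimensional value domain:
assert infer_value_shape((3,), ()) == (3,)
```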
params, value = params[:-1], params[-1]
params = params + (Variable("value", value.output),)
instance = reflect(cls, *params)
raw_dist, value_name, value_output, dim_to_name = instance._get_raw_dist()
I had to refactor eager_log_prob to use Distribution._get_raw_dist() to get the new tests to pass.
domains[k] = domain if domain is not None else to_funsor(v).output

# broadcast individual param domains with Funsor inputs
# this avoids .expand-ing underlying parameter tensors
What is the expected domain of scale for Normal(Reals[2], 1.) and Normal(Reals[2], torch.ones(2))? Currently, domains["scale"] will be Real in both cases. The second case will trigger an error at to_funsor(v, output=domains[k]) below.
In either case, I guess we need to rewrite eager_normal or eager_mvn to handle a Reals[2] loc. Maybe there is some trick to avoid doing so. cc @fritzo
What is the expected domain of scale for Normal(Reals[2], 1.) and Normal(Reals[2], torch.ones(2))?
In the first case, it's Real, and in the second, it's Reals[2]. I guess I should add a second broadcasting condition below to handle the case where the parameter is a raw tensor:
if ops.is_numeric_array(v):  # at this point we know all of v's dims are output dims
    domains[k] = Reals[broadcast_shape(v.shape, domains[k].shape)]
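The shape arithmetic behind that condition can be sketched in plain Python. The broadcast_shape below is a stand-in I wrote for illustration, mirroring NumPy-style broadcasting; it is not necessarily Funsor's own helper:

```python
import itertools

def broadcast_shape(*shapes):
    """NumPy-style broadcasting of shape tuples, aligned from the right."""
    result = []
    for dims in itertools.zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_unit = {d for d in dims if d != 1}
        if len(non_unit) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(non_unit.pop() if non_unit else 1)
    return tuple(reversed(result))

# Normal(Reals[2], 1.): scalar scale broadcasts against the (2,) loc shape
assert broadcast_shape((), (2,)) == (2,)
# Normal(Reals[2], torch.ones(2)): raw-tensor scale already has shape (2,)
assert broadcast_shape((2,), (2,)) == (2,)
```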
It looks like you just need to relax precision of
@fehiepsi there are a ton of unrelated new failures on the last JAX build, any idea what's going on? Seems like a mix of weird new numerical errors and
fehiepsi
left a comment
The logic to get expected outputs looks reasonable to me, pending further comments by @fritzo.
I'll try to address the eager_normal and eager_mvn issues in a follow-up PR. Currently, those eager rules assume Real args, but we now have Reals or a mix of Real and Reals.
Hmm, I wonder if we'll have to do this for all the other eager distribution patterns in
fritzo
left a comment
(thanks for your patience @eb8680)
funsor_event_shape = funsor_dist.value.output.shape

# attempt to generically infer the independent output dimensions
instance = funsor_dist.dist_class(**{
Beyond the scope of this PR, I'm concerned about the increasing overhead of shape computations that need to do tensor ops. I like @fehiepsi's recent suggestion of implementing .forward_event_shape() for transforms. I think it would be worthwhile to discuss extensions to the Distribution interface that could replace all this need to create and throw away dummy distributions.
(Indeed in theory an optimizing compiler could remove all this overhead, but in practice our tensor backends either incur super-linear compile time cost, or fail to cover the wide range of probabilistic models we would like to handle. And while these dummy tensor ops are cheap, they add noise to debugging efforts.)
Yes, I agree the repeated creation of distribution instances here is not ideal. Perhaps we could add counterparts of some of the shape inference methods from TFP (e.g. event_shape_tensor, param_shapes) upstream in torch.distributions.
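As a purely hypothetical illustration of what static, instance-free shape inference could look like (the table and function here are mine, not the torch.distributions or TFP API), trailing event dims of a parameter can be split off from a per-distribution event-dim table without ever constructing a distribution:

```python
# Hypothetical table-driven shape inference; avoids building dummy instances.
# Names and entries are illustrative, not an actual library API.
EVENT_DIMS = {"Normal": 0, "MultivariateNormal": 1, "Dirichlet": 1}

def split_shapes(dist_name, param_shape):
    """Statically split a parameter shape into (batch_shape, event_shape)."""
    n = EVENT_DIMS[dist_name]
    cut = len(param_shape) - n
    return param_shape[:cut], param_shape[cut:]

# univariate Normal: all param dims are batch dims
assert split_shapes("Normal", (5,)) == ((5,), ())
# MultivariateNormal loc of shape (5, 3): one trailing event dim
assert split_shapes("MultivariateNormal", (5, 3)) == ((5,), (3,))
```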
dist.Independent.has_rsample = property(lambda self: self.base_dist.has_rsample)
dist.Independent.rsample = dist.Independent.sample
dist.MaskedDistribution.has_rsample = property(lambda self: self.base_dist.has_rsample)
dist.MaskedDistribution.rsample = dist.MaskedDistribution.sample
dist.TransformedDistribution.has_rsample = property(lambda self: self.base_dist.has_rsample)
dist.TransformedDistribution.rsample = dist.TransformedDistribution.sample
Can you add a TODO pointing to a NumPyro issue to fix this bug, so we can delete this workaround once the bug is fixed? cc @fehiepsi
@neerajprad Should we add those new attributes to NumPyro distributions? We can make default behaviors for them so that there would be only a few changes in the code.
Sounds good. In NumPyro, for distributions that have reparametrized samplers available, both will be the same, so we can just add a default Distribution.rsample method which delegates to sample and raises a NotImplementedError when not available.
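That default could be sketched like this (class names are mine for illustration, not NumPyro's actual code):

```python
# Sketch of a default rsample that delegates to sample when a reparametrized
# sampler is available and raises NotImplementedError otherwise.

class Distribution:
    has_rsample = False

    def sample(self, key):
        raise NotImplementedError

    def rsample(self, key):
        if not self.has_rsample:
            raise NotImplementedError(
                f"{type(self).__name__} has no reparametrized sampler")
        return self.sample(key)

class ReparamNormal(Distribution):
    """Toy distribution whose sampler is reparametrized."""
    has_rsample = True

    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

    def sample(self, key):
        # stand-in for loc + scale * noise(key); deterministic for illustration
        return self.loc
```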
@fritzo any other comments? I'd like to bump the Funsor dependency version in Pyro to the latest

Thanks for the reminder and thanks for addressing nits! LGTM.
Addresses #386, pyro-ppl/pyro#2702
Consider the following snippet of Pyro code simplified from a test in pyro-ppl/pyro#2702:

Attempting to wrap this in a poutine.collapse() context fails when dist.Normal(y, 1) is incorrectly converted to a Funsor. In particular, there is a mismatch between y.output, which is Reals[3], and the .output value Real expected for the loc param of funsor.torch.distributions.Normal given by funsor.distribution.Distribution._infer_value_domain.

In cases like the above, however, it is always possible to infer the correct parameter and value .output shapes generically using the broadcasting logic in the underlying backend distribution. This PR implements this behavior in funsor.distribution.DistributionMeta.__call__ and distribution_to_data.

As a result, it is possible after this PR to represent Independent distributions without an intermediary funsor.Independent by passing a Variable with extra output dimensions as the value:

This should considerably simplify pattern-matching over Independent distributions.

To make this work, converting a funsor.distribution.Distribution to data via funsor.to_data will have to include a step handling nontrivial event shapes by calling .to_event and unsqueezing parameters.

More ambitiously, we could also attempt to handle to_funsor conversion of backend Independent distributions by substituting an appropriately shaped Variable for their value rather than resorting to lazy application of funsor.terms.Independent, which would make pattern-matching for collapse-ing multivariate distributions much easier, but it's not clear yet whether this can be done generically, especially in the presence of transforms. I have not attempted to do this in this PR.

Tasks:
- funsor.distribution.DistributionMeta
- distribution_to_data
- DirichletMultinomial broadcasting errors