Bayesian model dimensionality #37
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##            master      #37   +/-  ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            6        6
  Lines          546      633   +87
=========================================
+ Hits           546      633   +87
```

☔ View full report in Codecov by Sentry.
Testing numeric approximations of bmd vs the analytic call:
My suggestion would be that we may be starting to need a guide for some of the more esoteric concepts we introduce, though that is probably something to strategize independently of this PR. My comments on this PR are mostly about the cases where we have to resort to MC estimates for bmd: can we get some kind of error on these, as bmd seems a very sensitive quantity?
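A crude first pass at such an error is simply to repeat the estimate and quote the scatter. A minimal sketch, using a stand-in model with the same shapes as the consistency test later in this thread (the repeat count of 50 is an arbitrary choice):

```python
import numpy as np
from lsbi.model import LinearModel

# Stand-in model: shapes mirror the consistency test further down.
rng = np.random.RandomState(0)
model = LinearModel(M=rng.normal(size=(10, 100)))
data = model.evidence().rvs()

# Repeat the Monte Carlo bmd estimate and quote its mean and scatter.
estimates = np.array([model.bmd(data, N=500) for _ in range(50)])
print(f"bmd ~ {estimates.mean():.2f} +/- {estimates.std():.2f}")
```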
```python
        Parameters
        ----------
        D : array_like, shape (..., d)
            Data to form the posterior
        n : int, optional
            Number of samples for a monte carlo estimate, defaults to 0
        """
        return dkl(self.posterior(D), self.prior(), n)
        return bmd(self.posterior(D), self.prior(), N)
```
Should `mcerror` be accessible from the model?
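If the answer is yes, one possible shape for that, sketched rather than taken from this diff (it assumes `lsbi.stats.bmd` grows a matching `mcerror` keyword, as `dimensionality` below already has):

```python
def bmd(self, D, N=0, mcerror=False):
    """Bayesian model dimensionality, optionally with its Monte Carlo error.

    Sketch only: forwards `mcerror` to lsbi.stats.bmd, assuming that
    function accepts the keyword.
    """
    return bmd(self.posterior(D), self.prior(), N, mcerror=mcerror)
```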
```python
        p = self.posterior(D)
        q = self.prior()
        x = p.rvs(size=(N, *self.shape[:-1]), broadcast=True)
        return (p.logpdf(x, broadcast=True) - q.logpdf(x, broadcast=True)).var(axis=0)
```
Is an "mcerror" also possible on these numeric estimates? I guess resampling some fraction of the N draws could generate some kind of error, but perhaps there is something more principled.
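One slightly more principled option is batching: split the N draws into batches and use the batch-to-batch spread of the per-batch variances as the error. A self-contained sketch, with scipy Gaussians standing in for the posterior `p` and prior `q` of the excerpt above (the batch count of 10 is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Stand-ins for the posterior p and prior q from the excerpt above.
p = multivariate_normal(mean=np.ones(5))
q = multivariate_normal(mean=np.zeros(5))

N, n_batches = 1000, 10
x = p.rvs(size=N, random_state=np.random.RandomState(0))
logr = p.logpdf(x) - q.logpdf(x)

# The estimate is the variance of the log-ratio over all draws; the error
# is the batch-to-batch scatter of per-batch variances.
per_batch = logr.reshape(n_batches, N // n_batches).var(axis=1)
estimate = logr.var()
error = per_batch.std() / np.sqrt(n_batches)
print(f"var(log p/q) = {estimate:.2f} +/- {error:.2f}")
```

This neglects the small-sample bias of the per-batch variances, but it should at least flag when the estimate is as sensitive as it looks.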
```python
        C = self._C
        return np.broadcast_to(logdet(C + MΣM) / 2 - logdet(C) / 2, self.shape)

    def dimensionality(self, N=0, mcerror=False):
```
This may need a more informative, and I would suggest even a more colloquial, comment (or just a write-up of this somewhere!), as it took me a minute to remind myself that the bmd is averaged over the evidence here. The same goes for the mutual information.
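For whatever write-up emerges, here is my reading of the definitions, inferred from the code in this diff rather than from any canonical source (I may be off by the conventional factor of 2, which the code may absorb elsewhere):

```latex
% bmd: the posterior variance of the log posterior-to-prior ratio
\frac{d_G(D)}{2} = \mathrm{Var}_{\mathcal{P}(\theta|D)}\!\left[\ln\frac{\mathcal{P}(\theta|D)}{\pi(\theta)}\right]

% KL divergence: the posterior mean of the same log ratio
\mathcal{D}_\mathrm{KL}(D) = \left\langle \ln\frac{\mathcal{P}(\theta|D)}{\pi(\theta)} \right\rangle_{\mathcal{P}(\theta|D)}

% dimensionality and mutual information: the same quantities averaged over the evidence
d = \left\langle d_G(D) \right\rangle_{\mathcal{P}(D)}, \qquad
I = \left\langle \mathcal{D}_\mathrm{KL}(D) \right\rangle_{\mathcal{P}(D)}
```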
I would expect the bmd estimates from the two models to be consistent:

```python
from lsbi.model import LinearModel, MixtureModel
import numpy as np
from matplotlib import pyplot as plt

d = 100
t = 10
k = 1
rng = np.random.RandomState(0)
model_matrix = rng.normal(size=(t, d))

mixture_model = MixtureModel(
    M=model_matrix[None, ...],
)
linear_model = LinearModel(
    M=model_matrix,
)

true_data = mixture_model.evidence().rvs()

bmds = []
for i in range(100):
    bmds.append(mixture_model.bmd(true_data, N=500))

plt.hist(bmds, density=True, label="Mixture estimate BMD")
plt.vlines(linear_model.bmd(true_data), 0, 2, color="black", label="Analytic BMD")
plt.ylim(0, 1.1)
plt.legend()
plt.show()
```
Description
Implementation of Bayesian model dimensionality calculations (a usage sketch follows the list):
- `lsbi.stats.bmd`
- `lsbi.model.LinearModel.bmd`
- `lsbi.model.LinearModel.mutual_information`
- `lsbi.model.LinearModel.dimensionality`
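A minimal usage sketch of the new entry points; the signatures are inferred from the diff above, so treat the keywords as provisional:

```python
import numpy as np
from lsbi.model import LinearModel

# Arbitrary model matrix, same shape as the consistency test above.
rng = np.random.RandomState(0)
model = LinearModel(M=rng.normal(size=(10, 100)))
D = model.evidence().rvs()

analytic = model.bmd(D)           # N defaults to 0: analytic result
numeric = model.bmd(D, N=1000)    # Monte Carlo estimate from 1000 draws
dim = model.dimensionality()      # bmd averaged over the evidence
mi = model.mutual_information()   # dkl averaged over the evidence
```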
After some effort, I was able to derive the KL divergence, mutual information, Bayesian model dimensionality and average bmd for our general case. This gives accurate and reliable estimates.
@ngm29 may be interested in eqs 10-14 of this cheat sheet.
Feedback on names would probably be helpful here. If we're being consistent with anesthetic we should probably move `bmd` -> `d_G` and `dkl` -> `D_KL`. Not sure what we should call the mutual information (average DKL over data) or the dimensionality (average d_G over data).

Checklist:
- `black . --check`
- `isort . --profile black --filter-files`
- `pydocstyle --convention=numpy lsbi`
- `python -m pytest`