forked from fkrauer/BayesFitR_Halle
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtutorial_BT.Rmd
637 lines (417 loc) · 24.2 KB
/
tutorial_BT.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
---
title: "Bayesian workflow for fitting SEIR models using the BayesianTools package"
author: "Fabienne Krauer"
date: "2022-09-03"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Intro
### The Bayesian workflow
The [Bayesian workflow](https://arxiv.org/abs/2011.01808) was proposed by Andrew Gelman, Aki Vehtari and others, and describes the process of model building, inference and model checking/improvement. The Bayesian workflow is not a straight line but an iterative process of checking and refining or discarding components of the model.
Why should I follow the workflow? Bayesian inference for IDD models is hard, and most models are too complex to be fitted without any problems. The workflow helps to narrow down where the problem is and may help to improve inference.
The process can be summarised as follows (for a graphical representation, see Gelman's publication):
1. Define a model
2. Define the prior and perform a prior predictive check, modify the prior if necessary
3. Fit the model
4. Validate the computation (diagnostics)
5. Address potential issues with computation (MCMC)
6. Evaluate the model fit: posterior predictive intervals
7. Modify the model and/or the prior if necessary, or add/change the data
8. Compare models
We will go through some of these steps with an easy example below.
### The BayesianTools package
[BayesianTools](https://cran.r-project.org/web/packages/BayesianTools/BayesianTools.pdf) is a general purpose R package, which offers various MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. BT is a modular package, where the user defines the likelihood function and the priors separately. It does not require the use of domain-specific language (such as Stan).
Samplers:
1) standard Metropolis-Hastings (MH)
2) adaptive Metropolis-Hastings
3) standard or adaptive Metropolis-Hastings with delayed rejection
4) standard Metropolis-Hastings with Gibbs updating
5) standard Metropolis-Hastings with Gibbs updating and tempering
6) Differential Evolution (DE), optional with snooker update (DEzs)
7) Differential Evolution Adaptive Metropolis (DREAM)
8) t-walk MCMC
9) Sequential Monte Carlo (SMC)
Delayed rejection sampling means that a second (or third, etc.) proposal is made before rejection. This second proposal is drawn from a different distribution, allowing for a greater flexibility of the sampler.
Gibbs sampling means that in each iteration, only a subset of the parameters is updated.
Tempering is similar to simulated annealing, and means that the acceptance rate is increased during the burnin phase. This should help to speed up convergence.
Differential Evolution algorithms are genetic algorithms with multiple chains, which inform each other during the burnin phase. If in doubt which one to choose, prefer "DEzs". The DE sampler uses the current position of two other chains to generate the proposal for each chain. DEzs also uses past states of the two other chains in the creation of the proposal. This allows for fewer chains (i.e. 2x3 chains are usually enough for up to 200 parameters).
## WORKFLOW
First we load the required libraries and set a seed:
```{r}
library(deSolve)
library(tidyverse)
library(BayesianTools)
library(coda)
set.seed(42)
```
### Model
We start by defining the model. This is a simple SEIR model without demographic changes and permanent immunity:
```{r model_SEIRS}
model_SEIR <- function(times, inits, theta) {
SEIR <- function(times, inits, theta) {
S = inits[["S"]]
E = inits[["E"]]
I = inits[["I"]]
R = inits[["R"]]
N = S + E + I + R
beta <- theta[["beta"]]
sigma <- theta[["sigma"]]
gamma <- theta[["gamma"]]
dS <- -beta*S*I/N
dE <- beta*S*I/N - sigma*E
dI <- sigma*E - gamma*I
dR <- gamma*I
dC <- beta*S*I/N
list(c(dS, dE, dI, dR, dC))
}
traj <- data.frame(lsoda(inits, times, SEIR, theta))
# Calculate the incidence per time step from the cumulative state:
traj$inc <- c(inits["I"], diff(traj$C))
return(traj)
}
```
Next, we define the parameters and the initial conditions for the model. The initial conditions assume that everyone is susceptible, except one person, which is the first infectious individual:
```{r params}
beta <- 0.55 # transmission rate
sigma <- 1/5 # =1/duration of latency
gamma <- 1/5 # = 1/duration of infectiousness
theta <- c(beta=beta, sigma=sigma, gamma=gamma) # combine in one vector "theta"
inits <- c("S"=10000-1, "E"=0, "I"=1, "R"=0, "C"=1)
```
We the define the time points at which the ODE solution is saved. Note, the unit of the time steps MUST BE the same as the unit of your parameters, i.e. if $\gamma$ = 1/5 days, then one time step (e.g. 1 to 2) is 1 day.
Also remember that integration is computationally costly, so save only the time steps you need.
```{r}
times <- seq(1:100)
```
Run model once and plot the trajectory:
```{r}
traj <- model_SEIR(times, inits, theta)
```
This is a closed population model. It is always a good idea to do a "sanity check": the sum of all states must be constant and equivalent to the total number of individuals. If it increases or decreases, there is probably a mistake in your model code.
```{r}
summary(rowSums(traj[,c(2:5)]))
```
```{r, echo=FALSE}
ggplot(traj) +
geom_line(aes(x=time, y=inc))
```
### Data
We now generate some data from the model ("synthetic data") for the fitting process. The advantage of simulated data is that you know the "true" parameter values that generated the data, which makes learning about inference easier.
Since case data are commonly count data, we here assume that the observation process is a Poisson process:
```{r data}
data <- data.frame(obs=sapply(traj$inc, function(x) rpois(1, x)),
time = times)
```
and plot the generated data and the modelled trajectory:
```{r}
ggplot() +
geom_line(data=traj, aes(x=time, y=inc)) +
geom_point(data=data, aes(x=time, y=obs))
```
### Priors
We now define the prior distributions. For convencience, we define the lower and upper limits for ALL parameter, even if we don't fit them all initially. You should carefully think about the bounds of each parameter, and whether they make sense from a mechanistic/biological point of view.
```{r}
# Estimated params
estpars <- c("beta", "sigma") # parameters to estimate, can be modified
index <- which(names(theta) %in% estpars) # index of estimated params
# Priors
lower = c("beta"=0, "sigma"=0)
upper = c("beta"=0.5, "sigma"=0.1)
```
We then create a prior object. BT has a function to `createUniformPrior()` to generate uniform priors for all parameters. It is a good starting point, but you often need to define other distributions for some priors.
```{r}
# Create uniform prior
prior_unif <- createUniformPrior(lower=lower,
upper=upper)
```
### Prior predictive check
Before we do any fitting, it is often useful to check whether the prior distributions you have chosen actually generate a model trajectory that overlaps with the data. If your prior is poorly specified, getting a posterior is not possible. We can do this by sampling from the prior and forward simulating the model trajectory with the sampled values. This is called the prior predictive check. The following function does this (cave: it's slow and not optimised) and summarises the min and max trajectory values for each time step in the model. (You can also choose to sample `prob = c(0.025, 0.975)`, which will result in a 95% interval). The observation process (here a Poisson) is also included to account for the observation uncertainty:
```{r}
sample_prior_Pois <- function(prior, model, times, inits,
theta, index, estpars, nsims) {
traj <- vector("list", nsims)
for (i in 1:nsims) {
theta_est <- prior$sampler() # sample from the prior
names(theta_est) <- estpars
theta_updated <- c(theta[-index], theta_est)
inc <- match.fun(model)(times, inits, theta_updated)[,c("inc")]
inc_obs <- sapply(inc, function(x) rpois(1, x))
traj[[i]] <- inc_obs
}
trajdf <- t(plyr::ldply(traj, rbind))
quantiles <- data.frame(t(apply(trajdf, 1, quantile, prob = c(0.0, 1.0))))
quantiles$time <- times
colnames(quantiles)[1:2] <- c("min", "max")
return(quantiles)
}
```
Let's sample and plot:
```{r}
priorpred <- sample_prior_Pois(prior_unif, model_SEIR, times, inits,
theta, index, estpars, nsims=500)
ggplot() +
geom_point(data=data, aes(x=time, y=obs)) +
geom_ribbon(data=priorpred, aes(x=time, ymin=min, ymax=max), alpha=0.5, fill="red")
```
The prior predictive does not include all the data points, which is no surprise given that the "true" parameter values are outside the prior range.
Let's update the prior ranges and redo the prior predictive:
```{r}
# Priors
lower = c("beta"=0, "sigma"=0)
upper = c("beta"=5.0, "sigma"=1.0)
prior_unif <- createUniformPrior(lower=lower,
upper=upper)
priorpred <- sample_prior_Pois(prior_unif, model_SEIR, times, inits,
theta, index, estpars, nsims=500)
ggplot() +
geom_point(data=data, aes(x=time, y=obs)) +
geom_ribbon(data=priorpred, aes(x=time, ymin=min, ymax=max), alpha=0.5, fill="red")
```
This time the data range is covered in the prior predictive, meaning that the prior boundaries are well chosen. The prior predictive also reveals that a lot of parameter samples will lead to trajectories far away from the data. This can potentially make the inference process more challenging if you have a large number of parameters to estimate. You may have to adjust the prior in the process of the fitting, to facilitate the sampling.
### Likelihood function
The log-likelihood function is the core of Bayesian inference. Again, we are assuming the observations arise from a Poisson process (count data). In BT, you need to define a function that takes the parameters and returns a single log-likelihood value:
```{r}
ll_Pois <- function(model, theta, inits, times, data) {
traj <- match.fun(model)(times, inits, theta)
datapoint <- data$obs
modelpoint <- traj$inc
if (any(is.na(modelpoint))) {
ll <- -Inf
}
else { # Minimize the log likelihood
ll <- sum(dpois(x=datapoint,
lambda=modelpoint,
log=TRUE), na.rm=TRUE)
}
return(ll)
}
```
You should always test the LL function:
```{r}
# test
ll_Pois(model_SEIR, theta, inits, times, data)
```
We also have to define a wrapper function for the log-likelihood function, because BT requires that the LL-function takes only one argument, which are the estimated parameters. Remember to adjust this function when you change the names of the arguments of the likelihood function:
```{r}
ll_Pois_wrapper <- function(par) {
parX = theta
parX[index] = par
return(ll_Pois(model=model_SEIR,
theta=parX,
inits=inits,
times=times,
data=data))
}
# Test (should return the same as ll_pois())
ll_Pois_wrapper(theta[index])
```
### Set up and run the MCMC
Finally, we define the setup for the sampler and run the MCMC. We here choose the simplest sampler (standard Metropolis Hastings), and run 3 chains with 20_000 iterations each. It is not always easy to determine the number of iterations to run, the best thing to do is to start with something like 20_000 iterations (for random walk samplers only. For gradient-based samplers like HMC/NUTS you can start with 1000). We also want to run more than one chain to determine convergence.
Note that I have suppressed the messages to the console for this tutorial with `message = FALSE`, but you should have message enabled (the default) to monitor the progress of the fitting.
```{r}
mcmc_settings1 <- list(iterations = 20000,
nrChains = 3,
message = FALSE,
optimize=FALSE,
adapt=FALSE)
bayesianSetup_unif <- createBayesianSetup(prior = prior_unif,
likelihood = ll_Pois_wrapper,
names = names(theta[index]),
parallel = FALSE)
system.time({chain_unif1 <- runMCMC(bayesianSetup = bayesianSetup_unif,
sampler = "Metropolis",
settings = mcmc_settings1)})
```
### Diagnostics
The first thing you do is to visually inspect the trace plots:
```{r}
plot(chain_unif1)
```
The chains seem to have converged, but the mixing is very poor (no caterpillar), the chains stick too long in one place suggesting that the proposal is poorly chosen.
The summary also shows that the rejection rate is very high:
```{r}
summary(chain_unif1)
```
We will thus try a adaptive MCMC, which alters the proposal distribution during the fit, and also extend the iterations:
```{r}
mcmc_settings2 <- list(iterations = 50000,
nrChains = 3,
message = FALSE,
optimize=FALSE,
adapt=TRUE)
system.time({chain_unif2 <- runMCMC(bayesianSetup = bayesianSetup_unif,
sampler = "Metropolis",
settings = mcmc_settings2)})
```
```{r}
plot(chain_unif2)
```
Including the adaptation of the proposal has improved the mixing substantially. We remove the first 10_000 iterations (burn-in phase):
```{r}
nburn <- 15000
plot(chain_unif2, parametersOnly = TRUE, start =nburn)
```
```{r}
summary(chain_unif2)
```
The rejection rate is much better and the effective sample size (ESS) has also increased. The ESS is an estimate for the number of independent samples (taking into account autocorrelation of the individual samples) generated by the MCMC run. There is no universal threshold for how large the ESS should be, but you should aim for at least 100 independent samples.
We then check convergence mathematically with the PSRF. The PSRF gives the scale reduction factor for each parameter. A PSRF of 1 means that variance between and within the chains are equal, which is the goal. As a rule of thumb, a if PSRF < 1.05, convergence is very likely. But you should always also check the traceplots
```{r}
gelmanDiagnostics(chain_unif2, plot=FALSE, start=nburn)
```
It is also advised to check parameter correlations. Sometimes these correlations can make inference more challenging, i.e. the posterior is too complex with several local modes. In that case, you may need to re-formulate (reparametrize) the model or run the chains for longer. However, ODE models are difficult to reparametrize.
Depending on what the purpose of the model inference is, parameter correlation is a larger or smaller problem. If we are not too concerned about the mechanistic meaning of the parameters, we may accept some degree of parameter correlation. If we are intersted in learning about the mechanisms underlying the disease data, we may need to add additional data for the fitting and/or change the priors to narrow down the parameter space and reduce correlation.
```{r}
correlationPlot(chain_unif2, start=nburn)
```
In this example, the parameters are strongly correlated, but
1) We still have convergence of the chains
2) We know the posteriors are correct
3) We cannot reparametrize this model.
Thus, we will leave the model as it is and accept the parameter correlation. However, in a real life study, we would want to reduce this correlation.
We can also check the summary statistics of the marginal posteriors. For this, we need to convert the chain to a coda object and summarize. For a description of the marginal posteriors, we commonly use the median and the equal tailed 95% CrI (credible interval), which is the 2.5% and 97.5% percentile:
```{r}
coda2 <- getSample(chain_unif2, parametersOnly = TRUE, coda=TRUE, start=nburn)
summary(coda2)
```
Alternatively, you can choose the highest posterior density interval. We have to combine the chains to get an overall estimate:
```{r}
HPDinterval(mcmc(do.call(rbind, coda2)))
```
## Evaluating the model fit
If the chains are healthy and converged, and the marginal posterior distributions are reasonable, we can check the fit to the data by evaluating the posterior predictive. The posterior predictive distribution is generated by taking random draws from the posterior and forward simulating the model trajectory. The simulated trajectories are then summarised over each time step in the model by calculating the percentiles, e.g. 2.5th and 97.5th percentile. The resulting uncertainty is called the (95%) posterior predictive interval.
The following function calculates the quantiles from a given model and posterior:
```{r}
sample_posterior_Pois <- function(chain, theta, inits, times, model, ndraw, nburn, progress) {
#Draw n fitted parameter vectors theta from the MCMC object
sample <- getSample(chain, parametersOnly = TRUE, thin=1, numSamples=ndraw, start=nburn)
fit <- plyr::adply(.data=sample, .margins=1, .progress=progress, .parallel=F, .fun=function(x) {
#Define the theta as the estimated parameters from the current draw and the fixed parameters
theta_sample <- c(x, theta[!is.na(theta)])
#Simulate trajectory for selected theta
foo <- match.fun(model)(times, inits, theta_sample)
foo$simobs <- sapply(foo$inc, function(y) rpois(n=1, lambda=y))
return(foo)
})
colnames(fit)[1] <- "replicate"
quantiles <- plyr::ddply(.data=fit,
.variables="time",
function(x) quantile(x[, which(colnames(fit)=="simobs")], prob = c(0.025, 0.5, 0.975), na.rm=T))
colnames(quantiles) <- c("time", "low95PPI", "median", "up95PPI")
return(quantiles)
}
```
Check the fit. Note that I use `progress="none"` to suppress the messages in the markdown file, but you should always monitor the progress with `progress="text"`
```{r}
fit_quantiles <- sample_posterior_Pois(chain_unif2, theta, inits, times, model_SEIR, ndraw=500, nburn=nburn, progress="none")
# Plot fit
ggplot() +
geom_point(data=data, aes(x=time, y=obs)) +
geom_line(data=fit_quantiles, aes(x=time, y=median), color="red") +
geom_ribbon(data=fit_quantiles, aes(x=time, ymin=low95PPI, ymax=up95PPI), alpha=0.2, fill="red")
```
### Final assessment
1. Does the trajectory fit look good?
2. Is the chain mixing healthy?
3. Have the chains converged?
4. Are the marginal posteriors are biologically meaningful, unimodal and informative (not too wide)?
In this example, we would conclude that the requirements are met and that we would keep the final model and chains. If the fit were poor, we would need to reformulate the model and/or the prior distributions, and/or add more data for fitting, or choose a different MCMC algorithm. More complex models are often difficult to fit with a Metropolis algorithm. Genetic algorithms, parallel tempering or gradient-based methods such as HMC/NUTS are more appropriate in these cases.
If the chains do not converge, it is often a problem of a complex posterior geometry due to a complex log likelihood surface. Tightening the priors can remove some of these issues, but it is useful to first investigate the log likelihood surface of the parameters of interest to understand how the priors should be adjusted. Strictly speaking, the prior should be independent of the data (i.e. the log likelihood), but in reality you sometimes need to adjust the prior to get convergence at all for ODE models.
## Advanced usage
### Custom priors
Let's assume we would have to modify the priors to make them more informative. In BT, if you want a custom prior, you have to write out a designated density and sampler function. We will choose a truncated normal NT(0.5, 0.1) distribution for $\beta$, and a Beta(2,5) distribution for $\sigma$. Check [here](https://homepage.divms.uiowa.edu/~mbognar/applets/normal.html) and [here](https://homepage.divms.uiowa.edu/~mbognar/applets/beta.html and) to understand what these priors look like.
First we define the density function, which returns the joint (summed) values of the density functions of the sampled parameters:
```{r}
density <- function(par) {
return(
msm::dtnorm(par[1], # Beta
mean = 0.5,
sd = 0.1,
lower = lower[["beta"]],
upper = upper[["beta"]],
log = TRUE) +
dbeta(par[2], # Sigma
shape1 = 2,
shape2 = 5,
log = TRUE)
)
}
```
The sampler function draws one sample from the defined prior distributions. Cave: ensure that the order of the parameters in the density function corresponds to the order in `theta[index]`:
```{r}
sampler <- function(n=1){
return(cbind(
msm::rtnorm(n, # Beta
mean = 0.5,
sd = 0.1,
lower = lower[["beta"]],
upper = upper[["beta"]]),
rbeta(n, # Sigma
shape1 = 2,
shape2 = 5)
))
}
```
We then define a new prior object:
```{r}
prior_custom <- createPrior(density=density,
sampler=sampler,
lower=lower,
upper=upper)
```
Now update the Bayesian setup and fit again:
```{r}
bayesianSetup_custom <- createBayesianSetup(prior = prior_custom,
likelihood = ll_Pois_wrapper,
names = names(theta[index]),
parallel = FALSE)
system.time({chain_custom <- runMCMC(bayesianSetup = bayesianSetup_custom,
sampler = "Metropolis",
settings = mcmc_settings2)})
# Diagnostics
plot(chain_custom)
```
### Model comparison
We can calculate the Bayes Factor of two models and compare them to understand, which model is a better explanation of the data. Let's compare here the model with the uniform prior to the model with the custom prior.
The simplest approach is to compare the DIC of both models. The best model is the one which minimises the DIC (the smaller the better). Note that in real life we probably wouldn't compare two identical models which differ in their prior, but rather two different models.
```{r}
DIC(chain_unif2, start=nburn)$DIC
DIC(chain_custom, start=nburn)$DIC
```
We can also calculate the Bayes Factor, which is the ratio of the posteriors:
A BF of < 1 supports the alternative model M1.
A BF of > 1 supports original model M0
```{r}
M0 <- marginalLikelihood(chain_unif2, start=nburn)
M1 <- marginalLikelihood(chain_custom, start=nburn)
exp(M0$ln.ML - M1$ln.ML)
```
### Parallel sampling
BT has an option to run the chains in parallel (if multiple chains are fitted). However, it is not always much faster than the non-parallel version. This option can be setup as follows:
```{r, eval=FALSE}
# You first have to check the maximum number of cpus you can use on your computer.
library(parallel)
detectCores()
# Define how many cores you want. Use one core per chain. For DEzs, use one core per chain-bundle (=3 chains)
cpus <- 2
# Options for parallel computing
opts <- list(packages=list("BayesianTools", "deSolve"),
variables=list("theta", "data", "index", "times", "model", "inits","loglik"),
dlls=NULL)
bayesianSetup <- createBayesianSetup(prior = prior,
likelihood = ll_wrapper,
names = names(theta[index]),
parallel = cpus,
parallelOptions = opts)
system.time({chain <- runMCMC(bayesianSetup = bayesianSetup,
sampler = sampler,
settings = mcmc_settings)})
stopParallel(bayesianSetup)
```
### Chain extension
You also have the option to extend a chain that has not converged yet, without starting from scratch. To do this, you can re-feed the chain object into the `runMCMC()` function. Note that this does not work if the original chain was run in parallel.
```{r, eval=FALSE}
chain_ext <- runMCMC(bayesianSetup = chain, sampler = sampler_algo, settings = mcmc.settings)
```