Applied-Stats-Sims/Multilevel_Factorial.Rmd at master · atamalu/Applied-Stats-Sims · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
---
title: "Multilevel Model (factorial)"
output:
  github_document:
    html_preview: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      fig.path = "Figures/")
options(digits = 3,
        signif = 3)
```

## Study

A behavioral neuroscience lab that studies learned helplessness is testing a new technology that measures calcium activity in the brain. One previous study tested the effect of exercise on activity in the medial prefrontal cortex (mPFC) in response to an uncontrollable stressor. However, this study only included male subjects. The present lab has previously discovered that neural activity in the mPFC resulting from stressful situations is different between male and female rats. One researcher decides to test if exercise will alter aggregate calcium levels in this region and affect how the mPFC reacts to uncontrollable stress.

For 6 weeks beforehand, an experimental group (exercise) is given a wheel to run on in their cage. A control group (sedentary) is also given a wheel to run on for this time period, but it is locked and therefore non-functional. The researcher also includes subjects of both sexes. To connect behavior and neural activity during the experiment, calcium activity is continuously recorded before and through 30 trials of inescapable shock (IS); in which the subject does not have behavioral control over the stressor.

The data is first centered at a pre-trial baseline period, then aggregated to remove the dimension of time within trials. This is accomplished by using the area under the curve (AUC) to compact time and aggregate calcium levels into one measurement. This use of the AUC metric is representing aggregate levels of calcium relative to the pre-trial baseline. To be specific, this makes the interpretation of AUCs: "mPFC activity over the course of shock relative to pre-trial baseline," which essentially makes it a measure of how uncontrollable stress changes neural activity.

## Why this model?

The reasoning behind using a multilevel model is pretty straightforward in this case:

* Due to individual differences in stress resilience and overall neurobiology, we at least assume that we should give each subject their own intercept
* Also because of individual differences in stress resilience, the rate at which the DV changes across trials in response to stress may differ from subject to subject (assuming that it does change across trials). So we see if allowing a random slope for Trial will significantly improve the model

<details><summary>(details)</summary>
<p>
Multilevel models (aka hierarchical or mixed-effect models) are becoming increasingly popular for both hypothesis testing and predictive purposes. This is in-part because it allows us make inferences on the individual level. Furhter, unlike a standard repeated-measures ANOVA, multilevel models permit a large number of repetitions without blocking trials and losing statistical power, in addition to a large amount of potentially useful information.

These models can capitalize on experimental designs with many repeated observations for each subject. By telling the model that each observation pertains to a certain individual, we can make stronger inferences than standard ANOVAs by also examining how predictor variables affect <i>individuals</i> in a group. This is extremely helpful in fields where you do not have the luxury of using many subjects or participants, but the experiment gives many data points for each subject.

For example: instead of simply saying that a grouping variable explains 32% of variance in the dependent variable (DV), we can also compute the amount of variance that the grouping variable accounts for on the individual level. We can then use this information to strengthen our inferences about the experimental variable(s). So if overall between-subject differences account for 45% of the variance here, we can also assume that around 3/4 of individual differences in the DV can be explained by that grouping variable.
</p></details>

## Setup

### Packages/functions

```{r, message=FALSE}
library(kableExtra) # table options
library(dplyr) # data manipulation
library(psych) # some analytics
library(ggplot2) # visualization
library(ggpubr) # good theme
library(lme4) # mixed model
library(lmerTest) # p-values for lme4
library(optimx) # optimizer
library(performance) # effect sizes

set.seed(150) # for data simulation replicability
theme_set(theme_pubr() %+replace%
            theme(axis.ticks = element_blank(),
                  axis.title = element_text(face = 'bold'))) # theme option overrides

lmer.opts <- lmerControl(optimizer = 'optimx',
                         optCtrl=list(
                           method = 'L-BFGS-B')) # for convergence

##### Functions ---------------
source('Multilevel_Factorial/Functions multilevel 1.R')
```

### Data simulation

We generate our data using approximate means and standard deviations from the actual experiment.

```{r}
##### Summary stats from small n study ---------------
df <- data.frame(
  Group = c('Sedentary', 'Sedentary', 'Exercise', 'Exercise'),
  Sex = c('Male', 'Female', 'Male', 'Female'),
  Mean = c(0.24, 4.27, -1.6, 4.25),
  Sd = c(1.12, 1.61, 2.9, 1.13)
)

##### Simulate data ---------------
multi.sims <- mapply(function(m, s){
  sim.multi(n.obs = 12,
            nvar = 1, nfact = 1,
            ntrials = 30, days = 1,
            mu = m, sigma = s,
            plot = FALSE)
}, m = df$Mean, s = df$Sd)

names(multi.sims) <- rep(c('IV', 'AUC', 'Trial', 'Subject.num'), 4)

##### Format data ---------------
group.names <- paste(df$Group, df$Sex) # add group names

### Move into separate data frames and label

sed.male <- do.call(cbind, multi.sims[2:4]) %>%
  as.data.frame() %>%
  mutate(Group = 'Sedentary', Sex = 'Male')

sed.fem <- do.call(cbind, multi.sims[6:8]) %>%
  as.data.frame() %>%
  mutate(Group = 'Sedentary', Sex = 'Female')

ex.male <- do.call(cbind, multi.sims[10:12]) %>%
  as.data.frame() %>%
  mutate(Group = 'Exercise', Sex = 'Male')

ex.fem <- do.call(cbind, multi.sims[14:16]) %>%
  as.data.frame() %>%
  mutate(Group = 'Exercise', Sex = 'Female')

### Combine into final data frame
df <- rbind(sed.male, sed.fem, ex.male, ex.fem)

### Make trial into 1-30
df <- df %>%
  group_by(Group, Sex, Subject.num) %>%
  arrange(Trial) %>%
  mutate(Trial = 1:n())

glimpse(df)
```

Now we have our data set. Each combination of group and sex (IV's) has 10 subjects; with an AUC (DV) value for each of the 30 trials.

## Data visualization

We start by making a summary frame grouped by Group and Sex from our simulated data.

<details><summary>(details)</summary>
<p>
Bar graphs are a very basic and ubiquitous way of displaying categorical predictors with a continuous response. Here, the bars represent the means of the factorial combination of grouping variables. Error bars featuring the standard error (SE) are often added to these bars to give an idea of how well the sample means from our study represent the population; with longer error bars indicating less certainty.
</p></details>
<BR>

```{r}
##### Summarize ---------------
df.summ1 <- df %>%
  group_by(Group, Sex) %>%
  summarize(Mean = mean(AUC), Se = se(AUC)) %>%
  mutate(Upper = Mean + Se,
         Lower = Mean - Se)

### re-level
df.summ1$Group <- factor(df.summ1$Group, levels = c('Sedentary', 'Exercise'))
df.summ1$Sex <- factor(df.summ1$Sex, levels = c('Female', 'Male'))

##### Graph ---------------
ggplot(df.summ1) +
  geom_bar(aes(x = Sex, y = Mean,
               fill = Group, group = Group),
           stat = 'identity', position = position_dodge(width = 1)) +
  geom_errorbar(aes(x = Sex,
                    ymin = Lower, ymax = Upper,
                    group = Group),
                position = position_dodge(width = 1), width = 0.2) +
  labs(x = 'Sex',
       y = 'AUC',
       title = 'Average Calcium Levels during Shock',
       subtitle = 'with standard errors')
```

There is clearly a difference in averages between males and females. The graph suggests that there may be an effect of exercise, but it is relatively small if it is.

Next, we move to the between-subject individual level by making a summary data frame from our original data. This data frame will be grouped by Group, Sex, and Subject number.

<details><summary>(details)</summary>
<p>
The graph itself - a box plot - shows the minimum, 25th percentile, median, 75th percentile, and maximum. In a broader sense, the box represents individuals within the middle 50% of values by group. The lines extending from the box represent the extremes. Adding data points allows us to look at how average DV values for each individual are distributed.
</p></details>
<BR>

```{r}
grps.ordered <- c('Sedentary Female', 'Exercise Female', 'Sedentary Male', 'Exercise Male')

##### Summarize ---------------
df.summ2 <- df %>%
  group_by(Group, Sex, Subject.num) %>%
  summarize(Mean = mean(AUC)) %>%
  mutate(Group2 = paste(Group, Sex)) # new grouping

### fix new grouping order
df.summ2$Group2 <- factor(df.summ2$Group2, levels=grps.ordered)

##### Graph ---------------
ggplot(df.summ2) +
  geom_boxplot(aes(x = Group2, y = Mean)) +
  geom_point(aes(x = Group2, y = Mean)) +
  labs(x = 'Sex', y = 'AUC',
       title = '')
```

We then work our way down to the within-subject individual level by using the original data and plotting subjects across all trials. In practice, we want to look at all of the subjects. But for this example, we will look at the first 4 subjects in each group to avoid an overwhelming amount of visuals.

<details><summary>(details)</summary>
<p>
We can use facets to conveniently view subjects along all trials. Viewing them in this manner helps find any trends that may occur across trials within subjects, or any abrupt increases or decreases in calcium levels. In the event of a trend, we can then easily see if that trend differs between groups. While the change across trials is not one of the main hypotheses, individual differences in this trend is something that could affect the results.
</p></details>
<BR>

```{r}
##### New labels ---------------
df$Group2 <- paste(df$Group, df$Sex)
df.sub <- df[df$Subject.num %in% c(1:4),]

df.sub$Group2 <- factor(df.sub$Group2, levels=grps.ordered)

##### Graph ---------------
ggplot(df.sub) +
  geom_line(aes(x = Trial, y = AUC)) +
  labs(x = 'Trial', y = 'AUC',
       title = '') +
  facet_grid(Subject.num ~ Group2) +
  scale_x_continuous(breaks = c(10, 20, 30))
```

There is no specific trend across trials that is noticeable in any group. This is not necessarily surprising; as the manipulation does not involve any learning effect or mid-trial alteration in any experimental manner.

## Pre-model

The last step of preparation is constructing the contrast codes. Specifically, we want to know:

* Is there an effect of sex on AUC? <BR>
* Is there an effect of exercise on AUC? <BR>
* Does the effect of exercise depend on the subject's sex? <BR>

```{r}
df$group.con <- ifelse(df$Group == 'Sedentary', -0.5, 0.5)
df$sex.con <- ifelse(df$Sex == 'Male', -0.5, 0.5)
```

## Model

### OMNIBUS

To construct our model, we begin at a random intercept model by including the following: <BR>

* 1 DV... `AUC` <BR>
* 2 IV's... `Group + Sex` <BR>
* A Group-Sex interaction... `Group:Sex` <BR>
* A random intercept for each subject `(1 | Subject)` <BR> <BR>

We also use ML instead of REML for estimation to measure the effects of our individual parameters by using model comparison.

```{r}
### Unique subject ids for random effects specification
df$Subject <- paste0('S', rep(1:48, 30))

### Model
mod1 <- lmer(AUC ~ (1 | Subject) + group.con + sex.con + group.con:sex.con, data = df, REML = FALSE, control = lmer.opts) # ML for model comparisons
print(summary(mod1))
```

Next, we check to see if our model is significantly improved by allowing the change in AUC across trials from uncontrollable stress to be different for each subject. We expect that it does not because of our visual inspection above.

```{r}
mod2 <- lmer(AUC ~ (1 + Trial|Subject) + group.con + sex.con + group.con:sex.con, data = df, REML = FALSE, control = lmer.opts)
```

We receive a message that tells us our fit is singular. This is the result of either the variance captured by a random effect being close to 0, or a correlation of +/- 1. A look at the random effects confirms that the added slope of Trial is our culprit:

```{r}
summary(mod2)$varcor
```

So we leave the OMNIBUS model without a slope that can vary within subjects across trials. Keep in mind that this still considers all 30 observations from the same subject, it just does not consider the within-subject changes <i>across</i> trials.

We are now ready to move to the main analysis portion of the modeling process. To perform comparisons, we create 4 models; each one (minus the most complex model) excluding a variable of interest.

## Model comparisons

Now that we have our models, we can compare them by using F-tests. With these, we can measure the effect of each parameter separate of the others. We start with the random effects.

```{r}
##### No interaction model ---------------
mod2 <- lmer(AUC ~ (1 | Subject) + group.con + sex.con, REML = FALSE, data = df, control = lmer.opts)

##### Sex-only model ---------------
mod3 <- lmer(AUC ~ (1 | Subject) + sex.con, REML = FALSE, data = df, control = lmer.opts)

##### Group-only model ---------------
mod4 <- lmer(AUC ~ (1 | Subject) + group.con, REML = FALSE, data = df, control = lmer.opts)
```

We can get measures of effect size for both the random effects (ICC's) and fixed effects (R-squared). As-is, these effect sizes are in relation to all fixed and/or random parameters in the model. For example, the effect sizes of model 2 would be interpreted as such:


<b>ICC's</b>

* `Adjusted` - % variance in AUC explained by individual differences
* `Conditional` - % variance in AUC explained by individual differences, after taking a subject's Sex and Group into consideration
* `Adjusted - Conditional` - % variance in individual differences in AUC explained by Sex and Group


<b>R-squared</b>

* `Marginal` - % variance in AUC explained by fixed effects alone
* `Conditional` - % explained variance by both fixed and random effects

To find the effect sizes for each set of parameters, we can use the custom function `lmer_effects`.

```{r}
### Find effect sizes
mod.list <- list(mod1, mod2, mod3, mod4)

mods.effects <- lapply(mod.list, lmer_effects)
mods.effects <- do.call(rbind, mods.effects)
mods.effects <- f(mods.effects)

### For printing
rownames(mods.effects) <- c('Group + Sex + Group:Sex', 'Group + Sex',
                            'Sex only', 'Group only')

kable(mods.effects)
```

The above effect sizes tell us the variance explained for <i>all</i> fixed/random effects included in the model. To isolate the variance explained by each effect, we use a custom summary comparison function: `comp_lmer_mods`. This also provides us with a coefficient estimate, standard error, p-value, and confidence interval.

```{r}
c1 <- comp_lmer_mods(mod2, mod1) # interaction
c2 <- comp_lmer_mods(mod3, mod2) # group
c3 <- comp_lmer_mods(mod4, mod2) # sex

mod.comps <- rbind(c1, c2, c3)
mod.comps <- f(mod.comps)
rownames(mod.comps) <- c('Group x Sex', 'Group', 'Sex')

kable(mod.comps)
```

This may look like an overwhelming amount of information for each model comparison. Fortunately, it all makes sense in context.

## Interpretation

We will only work with the model comparisons. Although full model effect sizes are useful in situations where you can only add 2+ parameters, is somewhat tangential to most cases of hypothesis testing and unnecessarily complicates interpretations.

### Random effects

#### Random intercept

So how much variance is accounted for by individual differences?

```{r}
### Model
mod1 <- lmer(AUC ~ (1 | Subject) + group.con + sex.con + group.con:sex.con, data = df, REML = FALSE, control = lmer.opts)

### Random-only ICC
icc.rand <- icc(mod1)$ICC_adjusted
f(icc.rand)
```

The results show that - as the lone predictor - individual differences account for about `r icc.rand*100`% of variance in AUC.

### Fixed Effects

Now we can measure the statistical significance of mean differences between groups after accounting for between-subject variation in AUC.

#### Interaction

Our main hypothesis was that mPFC activity in female subjects that exercise would be lower than those who did not, unlike in males. So we start by comparing the most complex model with the model missing only the interaction.

```{r}
int.comp <- comp_lmer_mods(mod2, mod1)
int.comp <- fcomp(int.comp) # format

kable(int.comp)
```

* The effect of exercise on AUC does not depend on the Sex of the subject, with the coefficient estimate being `r int.comp['Est']` (p = `r int.comp['p']`)
  * Based on the data, if we were to repeat the experiment there is a 95% chance that the coefficient estimate would be between `r int.comp['CI_lower']` and `r int.comp['CI_upper']`
* The non-significance of a Group x Sex interaction is further supported by the fact that all effect sizes explain less than .001% of the variance in AUC's

#### Main Effects

We now simplify the model to test main effects by removing the interaction. The model comparison now reliably represents the effect of Group after controlling for Sex and vice versa, in addition to individual differences.

##### Group

```{r}
grp.comp <- comp_lmer_mods(mod3, mod2)
grp.comp <- fcomp(grp.comp)

kable(grp.comp)
```

*  After controlling for overall individual differences and Sex, the estimated AUC for a subject who received exercise treatment is `r grp.comp['Est']` higher than those who did not. However, this only approaches significance (p = `r grp.comp['p']`)
  * If we were to repeat the experiment, there is a 95% chance that the Group coefficient estimate would be between `r grp.comp['CI_lower']` and `r grp.comp['CI_upper']`
* The effect of exercise was technically not statistically significant, but it approaches significance; which warrants further investigation.
  * The percent of variance in AUC's explained by individual differences alone decreases by `r abs(grp.comp['ICC.adj'])*100`%
  * After controlling for Group and Sex, individual differences explain about `r abs(grp.comp['ICC.cond']*100)`% less variance than when Group is added to the model with a Sex covariate
  * Therefore, the amount of overall variance explained by individual differences accounted for by Group is `r abs(grp.comp['ICC.AminusC']*100)`%
  * The percent of variance explained by Group without considering overall individual differences is `r abs(grp.comp['R2.marg']*100)`%

These bits imply that if there is an effect of exercise, it is a very small one.

##### Sex

```{r}
sex.comp <- comp_lmer_mods(mod4, mod2)
sex.comp <- fcomp(sex.comp)

kable(sex.comp)
```

* After controlling for overall individual differences and Group, the mean AUC for male subjects is `r sex.comp['Est']` lower than female subjects (p < `r 0.001`)
  * If we were to repeat the experiment, there is a 95% chance that the mean difference between sexes would be between `r sex.comp['CI_lower']` and `r sex.comp['CI_upper']`
* The significance of exercise treatment is further supported by the interesting effect sizes:
  * The percent of variance in AUC's explained by individual differences alone decreases by `r abs(sex.comp['ICC.adj']*100)`%
  * After controlling for Group and Sex, individual differences explain about `r abs(sex.comp['ICC.cond']*100)`% less variance than when Sex is added to the model with a Group covariate
  * Therefore, the percent of variance in individual differences accounted for by Sex is `r abs(sex.comp['ICC.AminusC']*100)`%
  * Sex by itself explains `r abs(sex.comp['R2.marg']*100)`% of variance in AUC's
    * which is about `r (abs(sex.comp['R2.marg']) / abs(sex.comp['R2.cond']))*100`% of the variance explained by individual differences

A subject's sex is a significant predictor of change in mPFC activity from pre-trial baseline, as a result of uncontrollable stress.

### Tables and visuals

```{r}
mod.comps <- as.data.frame(mod.comps)
##### Create labels --------------
comp.labs <- c('Group x Sex',
               'Exercise - Sedentary',
               'Female - Male')
mod.comps$Comp.labs <- factor(comp.labs, levels = comp.labs)
```

First, we graph the difference between group means for each hypothesis we tested.

<details><summary>(details)</summary>
<p>
The horizontal orientation of the mean difference graph emphasizes that we are looking at the <i>difference</i> between contrasts. So "Female - Male" literally means subtracting the mean of males from the mean of females. The extending bars represent the 95% confidence interval of these differences. That is, if we were to successfully replicate this study under identical conditions, there is a 95% chance that the difference between those 2 group means would be within that range.
</p></details>

```{r include=FALSE}
##### Add labels and theme --------------
labs.theme <- theme(
    axis.title.x = element_blank(),
    axis.ticks = element_blank(),
    panel.grid.minor.x = element_line(color = 'gray96'),
    panel.grid.major.x = element_line(color = 'gray96'),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    axis.text.x = element_text(face = 'bold'),
    axis.text.y = element_text(face = 'bold')
  )
```

```{r}
##### Plot --------------
results.plot <- ggplot(mod.comps) +
  geom_point(aes(x = Comp.labs, y = Est)) +
  geom_errorbar(aes(x = Comp.labs,
                    ymin = CI_lower,
                    ymax = CI_upper)) +
  scale_x_discrete(labels = mod.comps$Comp.labs,
                   breaks = mod.comps$Comp.labs,
                   limits = rev(mod.comps$Comp.labs)) +
  coord_flip() +
  labs(x = '', y = 'Mean Difference',
    title = 'Contrasts of Mean Differences',
    subtitle = 'with 95% confidence intervals') +
  labs.theme

results.plot
```

Last, we visualize how close each subject's predicted value from our model is to their actual values across trials. This can be easily done by using a similar graph across trials from earlier. Each red line represents the predicted AUC for the respective subject, which does not change across trials.

```{r}
##### New labels ---------------
df$Group2 <- paste(df$Group, df$Sex)
df.sub <- df[df$Subject.num %in% c(1:4),]

df$Group2 <- factor(df$Group2, levels=grps.ordered)
df$pred <- predict(mod2)

df.sub <- df[df$Subject.num %in% c(1:4),] ### subset

##### Graph ---------------
ggplot(df.sub) +
  geom_line(aes(x = Trial, y = pred),
            color = 'red', size = 1) +
  geom_point(aes(x = Trial, y = AUC),
             alpha = 0.3) +
  labs(x = 'Trial', y = 'AUC',
       title = 'Subject AUCs by Trial',
       subtitle = 'with predicted subject means') +
  facet_grid(Subject.num ~ Group2) +
  scale_x_continuous(breaks = c(10, 20, 30))
```

It seems that our `Sex + Group` model has captured the variance between subjects incredibly well. To make sure it isn't an over-fit, one should simulate new data and see how the model translates, or use a training set for the model and a test set for checking accuracy. But for now, our hypotheses testing is finished!