sme/lme.Rmd at main · uvastatlab/sme · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
---
title: "Mixed-Effect Models"
output:
  html_document:
    toc: TRUE
    toc_float: TRUE
bibliography: refs.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, comment = '')
```

### Multilevel model: two levels

The `eggprod` data in the faraway package contains data on egg production. [@faraway] Six pullets (young hens) were placed into each of 12 pens. Four blocks were formed from groups of 3 pens based on location. Three treatments were applied. The number of eggs produced was recorded. Start by visualizing the distribution of eggs produced split by blocks and treatment.

```{r}
library(faraway)
data("eggprod")

library(ggplot2)
# Plotting eggs by treat
ggplot(eggprod, aes(x=treat, y=eggs, color = treat)) +
  geom_point() +
  coord_flip() + # Flipping x and y coordinates to mimic Base R plotting
  theme_classic() # Using classic theme

# Plotting eggs by block
ggplot(eggprod, aes(x=block, y=eggs, color = block)) +
  geom_point() +
  coord_flip() +
  theme_classic()
```


This shows visually that treatment O may have produced fewer eggs than treatment F or E whose distributions are more similar and closer to one another. The distributions of eggs produced by each block seem to overlap and are very similar to one another.

Now, we  model the number of eggs produced with `treat` as a fixed effect and `block` as a random effect using the `lmer()` function.

```{r}
library(lme4)
m <- lmer(eggs ~ treat + (1|block), data = eggprod)
```

We specified the formula within the `lmer()` function following the format `outcome ~ fixed effect + (random slope | random intercept)`. The random slope is set to 1, therefore it is a fixed slope. In other words, all cases will have the same slope estimated. The random intercept is set as `block` which means that the intercept will be allowed to vary by `block`. Before analyzing any statistics on this model, we first need to confirm we have met all the necessary assumptions. The mlmtools package [@mlmtools] offers a function called `mlm_assumptions()` which will test all the appropriate assumptions for a multilevel model.

```{r}
library(mlmtools)
m_assum <- mlm_assumptions(m)
# Homogeneity of variance
m_assum$homo.test
```

The output from the test of homogeneity of variance indicates that this assumption has been met. $p = 0.56$ which is greater than .05, indicating that the residual variation across clusters is not significantly different.

```{r}
# Outliers
m_assum$outliers

# Multicollinearity
m_assum$multicollinearity
```

No outliers were detected in the model and since we only have one predictor multicollinearity is not necessary to test. The remaining tests included in `m_assum` are plots we need to visually inspect.

```{r}
m_assum$fitted.residual.plot
```

The first plot to investigate is the above Fitted vs. Residuals plot. On the x-axis are the fitted (predicted) values and on the y-axis are the residuals (error). A dashed line appears at Residuals = 0 to represent where the model is perfectly predicting a value. What we look for here is whether the residuals lie randomly around the 0 line or if a pattern appears in relation to the predicted values. We would want this distribution to appear rather parallel to the 0 line, indicating that residuals are fairly comparable to one another. Lastly, no one point should stand out from the rest. If one does, it means there is an outlier and that one point is be drastically misestimated. All 3 of these criteria are met in the above plot.

Lastly, we can investigate the assumption of residual normality.

```{r}
m_assum$resid.normality.plot
```

The above QQ-plot compares the model residuals to a normal distribution. Ideally, all points should be as close as possible to the line and be evenly distributed around it (e.g., not all points are above the line; points don't form a curve below the line). We can see that the points are randomly and evenly distributed around the line and can assume we meet this assumption.

Now, we can inspect the model estimates and its overall fit. [@faraway_book] Does `treat` appear to affect the number of eggs, and if so, which treatment seems superior? We can look into this assessing both model fit and significance of predictors. First, a summary of the model is useful to look at.

```{r}
summary(m)
```

Looking at the predictor, the fixed intercept, 349.00, is the mean eggs when the treat factor is at its lowest level (treatE). The intercept fixed effect is the grand mean of eggs across the entire sample when treat = "E" (349.00). The treatF and treatO fixed effects can be interpreted as the difference in means (e.g., treatE - treatF). Therefore, treatF produces 6.25 less mean egg production and treatO produces 42.50 less mean egg production.

This is further substantiated by looking at the means by treatment. Observed means of eggs by treat:

```{r}
tapply(eggprod$eggs, eggprod$treat, mean)
```

We see that the means of treatment `E` and `F` are closer to each other than either of them are to treatment `O`. To test whether this effect is significant, there are two options. We can use the `car::Anova` function, or load the `lmerTest` package [@lmerTest].

```{r}
car::Anova(m)
```

The `Anova()` function shows that the model including treat is a better fit ($\chi^2(2,N = 12) = 10.887, p < .01$) than the null model (intercept only). This tells us that this is a better fitting model, but does not provide deeper information as to potential significant differences within levels of our predictor. To investigate this, we can load `lmerTest` package and rerun the model. This will add a significance marker to the independent variables if they are significant.

```{r}
library(lmerTest)
m <- lmer(eggs ~ treat + (1|block), data = eggprod)
summary(m)
```

This shows that treatE and treatO are significantly different from one another, but neither is significantly different from treatF. We can further confirm this using pairwise comparisons, adjusted p-values and confidence intervals:

```{r}
library(emmeans)
emmeans(m, pairwise ~ treat)
emmeans(m, pairwise ~ treat) |> confint()
```

We see that none of the mean differences are significant by treatment group. However, the `t.ratio` is higher for `E-O` and `F-O` than `E-F`. Visually we can also see this using an effect plot:

```{r}
library(ggeffects)
ggpredict(m, terms = ~ treat) |> plot(show_data = TRUE)
```

This plot shows us the spread of predicted egg values based on their treatment group. The black dot in the middle of each line represents the prediction and the length of the line represents its 95% confidence interval. The dots around each line represent the true data points. We can see the same trend as indicated by their predicted value, `E` and `F` are closer to one another than either is to `O`, and we can also see that the spread of predicted values for `O` overlaps with `E` and `F`'s spreads.

Another statistic of interest for multilevel models is the Intraclass Correlation Coefficient (ICC). The ICC tells us how strongly observations within clusters are associated with one another. In other words, it represents the correlation of any two units within a cluster. To assess this, we can use the `ICCm` function from the `mlmtools` package.

```{r}
ICCm(m)
```

The likeness of egg production in the same block is .25, in other words, the correlation between any two egg production observations within the same block is .25. We can also look at what the variation is between clusters using the `VarCorr()` function.

```{r}
VarCorr(m)
```

The variation around the random intercept for block is 11.40. We can also look at the $R^2$ values for the model to get an idea of how much variation in egg production is explained. We can use the `rsqmlm()` function from the `mlmtools` package to look at this.

```{r}
rsqmlm(m)
```

We see that by including just the fixed effects, 42.56% of the variance in egg production is explained. By including both the fixed and random effects, the variance explained increases to 57%.

### Repeated-Measures model

The book _Linear Mixed Models_ presents a study on rat brains [@west2015]. In this study, five rats had three regions of their brains measured for "activation" after two different treatments. Since all rats received both treatments and had the same three regions measured both times, this is a _repeated-measures_ analysis. Of interest is how the numeric dependent variable, activate, changes based on treatment and brain region.

The data is available in the file "rat_brain.dat". We can import this file using the `read.table()` function. We set `header = TRUE` because the first row of the data contains column headers.

```{r}
URL <- "https://github.com/uvastatlab/sme/raw/main/data/rat_brain.dat"
rats <- read.table(URL, header = TRUE)
```

The animal column contains the id for each rat. Below we use the `head()` function to view the first six rows of data. Notice the first rat, R111097, has one measure for each combination of treatment and region. This is why we refer to this as repeated-measures data. Looking at the structure of the data, treatment and region are being read in as integers. This means that if we were to use the data as is, the model will interpret 0 as a meaningful value for each of these variables. To avoid this, we will convert them both to factors.

```{r}
head(rats)
str(rats)
rats$treatment <- as.factor(rats$treatment)
rats$region <- as.factor(rats$region)
```

We can start off plotting the data. The `stripplot()` from the `lattice` package [@lattice] produces a one-dimensional scatterplot of the outcome variable (activate) by rat.

```{r}
library(lattice)
stripplot(activate ~ animal, data=rats, scales=list(tck=0.5))
```

We can consider and fit several models with these data. The first model will be the null model containing no predictors only a random slope by animal. Then treatment will be introduced, treatment and region, and then the interaction between treatment and region.

```{r echo=FALSE}
detach("package:lmerTest", unload = TRUE)
```

```{r}
m <- lmer(activate ~ 1 + (1 | animal), data = rats)
m2 <- lmer(activate ~ treatment + (1 | animal), data = rats)
m3 <- lmer(activate ~ treatment + region + (1 | animal), data = rats)
m4 <- lmer(activate ~ treatment*region + (1 | animal), data = rats)
```

Now we can compare model fit. We'll start by looking at the AIC values for each model. Lower AIC value suggests a model will perform better on out-of-sample data. These AIC values indicate that the model containing the interaction between treatment and region is the best fit for these data.

```{r}
AIC(m, m2, m3, m4)
```

We can also look to see how much more variance in activation is explained by the model including the interaction than the null model.

```{r}
varCompare(m, m4)
```

We see that the model including the interaction explains 70.93% more variance than the null model. This is quite a large increase in variance explained and supports this model being a good model for these data. We can also look at the $R^2$ values for the model containing the interaction. We see that 72.09% of the variance is explained by the fixed effects (the interaction) and 90.63% of the variance is explained by both the fixed and random effects.

```{r}
rsqmlm(m4)
```

Now let's interpret the model parameter estimates.

```{r}
summary(m4)
```

Looking first at the fixed effects, the intercept (428.51) can be interpreted as the grand mean of activate when treatment and region are both 1 (the lowest level of each factor). The treatment2 estimate (98.20) is the mean difference between treatment 2 and treatment 1 when region is 1. Similarly, the region2 estimate (-190.76) is the mean difference between region 2 and region 1 when treatment is 1, and region3 (-216.21) is the mean difference between region 3 and region 2 when treatment is 1. The interaction terms require some calculation to full interpret. Since we have two categorical predictors in this interaction, we can interpret their coefficients in this way:

$\text{intercept} + \text{factor level main effect 1} + \text{factor level main effect 2} + \text{interaction effect}$

For example, when looking at treatment2:region2 we will take the values from the Estimates column that align with the following fixed effects:

$\text{(Intercept)} + \text{treatment2} + \text{region2} + \text{treatment2:region2}$

Which gives us:

$428.51 + 98.20 - 190.76 + 99.32 = 435.27$

This is also the mean of activate when treatment is 2 and region is 2:

```{r}
mean(rats[rats$treatment=="2" & rats$region=="2","activate"])
```

We can do the same thing for treatment2:region3:

$428.51 + 98.20 - 216.21 + 261.82 = 572.32$

Which is the mean of activate when treatment is 2 and region is 3:

```{r}
mean(rats[rats$treatment=="2" & rats$region=="3","activate"])
```

To get a better understanding how this interaction is functioning, let's plot this model using the `ggeffects` package [@ggeffects].

```{r}
library(ggeffects)
ggeffect(m4, terms = c("region", "treatment")) |>
         plot(show_data = TRUE, jitter = 0)
```

This shows the relationship between treatment groups by region. Within region one, the estimates for activate by treatment are closer together, then in region 2 the estimate for treatment 1 and treatment 2 decrease while their magnitude of difference also increases. Then in region 3 their magnitude of differences increases further as treatment 2 increases and treatment 1 decreases. This provides visual evidence that the relationship between treatment groups changes based on the region. We can use the `Anova()` function from the `car` package [@car] to assess the statistical significance of the visual evidence.

```{r}
car::Anova(m4)
```

This indicates that the overall interaction is indeed significant. However, we only know that an interaction exists, we must conduct further comparison tests. We want to test the difference between treatment levels within each region. The `contrast()` function from the `emmeans` package [@emmeans] allows us to do this. First, we must create an emmeans object to pass to the `contrast()` function. We will tell the `emmeans()` function which model to use (m4) and which comparisons we want to test. To do this, we use the `|` symbol to say test each treatment comparison by region. Finally, since we are testing all pairwise comparisons we set the adjust argument to "tukey" to use the Tukey post-hoc multiple comparison adjustment.

```{r}
library(emmeans)
m4.emm <- emmeans(m4, ~ treatment | region, adjust = "tukey")
contrast(m4.emm)
```

The output is split by each region. It tests each comparison and within each region, therefore each region has two lines comparing treatment 1 to treatment 2 and vice versa. Consequently, these lines will conduct equivalent tests just with opposite signs in mean differences. Looking at the "p.value" column, all pairwise comparisons were significant. As we saw in the plot, the magnitude of difference between treatment groups increases from region 1 (49.1) to region 2 (98.8) and then region 3 (180.0).


### Longitudinal model

The text _Statistical Methods for the Analysis of Repeated Measurements_ [@davis] presents data on plasma inorganic phosphate measurements for 33 subjects (13 controls, 20 obese) after an [oral glucose challenge](https://www.mayoclinic.org/tests-procedures/glucose-challenge-test/about/pac-20394277). Measurements were taken at baseline, 0.5, 1, 1.5, 2, and 3 hours. Of interest is how the plasma inorganic phosphate levels differ over time and between the two groups. We present two models to investigate these questions:

- a linear mixed-effect model
- a generalized least squares model

Below we read in the data, provide columns names, and format the group column as a factor. Notice the data is in "wide" format, with one record per subject.

```{r}
URL <- "https://raw.githubusercontent.com/uvastatlab/sme/main/data/phosphate.dat"
phosphate <- read.table(URL, header = FALSE)
names(phosphate) <- c("group", "id", "t0", "t0.5", "t1", "t1.5", "t2", "t3")
phosphate$group <- factor(phosphate$group, labels = c("control", "obese"))
head(phosphate)
```

Reshaping the data facilitates data visualization. The `pivot_longer()` function from the tidyr package makes quick work of this. [@tidyr]

- The `names_to = "time"` argument moves the column names, t0 - t3, into a single column called "time"
- The `names_prefix = "t"` argument strips "t" from the column names that were placed in the "time" column.
- The `names_transform = list(time = as.numeric)` converts values in the new "time" column to numeric.
- The `values_to = "level"` moves the values under the t0 - t3 columns into a single column called "level"


```{r}
library(tidyr)
phosphateL <- pivot_longer(phosphate, cols = t0:t3,
                           names_to = "time",
                           names_prefix = "t",
                           names_transform = list(time = as.numeric),
                           values_to = "level")
head(phosphateL)
```

Now we can use the ggplot2 package [@ggplot2] to visualize the data. Below we plot the phosphate trajectories over time for each subject, with the plots broken out by group. In addition we add a smooth trend line to each plot to visualize the "average" trend for each group. There appears to be a great deal of variability between the subjects within each group. It also looks like the phosphate levels drop quickly within the first hour for the control group compared to the obese group.

```{r}
library(ggplot2)
ggplot(phosphateL) +
  aes(x = time, y = level, group = id) +
  geom_line() +
  geom_smooth(aes(group = group), se = FALSE, linewidth = 2) +
  facet_wrap(~group)
```

We can also compute simple means by time and group and compare visually. Below we use `aggregate()` to compute the means and then pipe into the ggplot code. Again it appears the phosphate levels for the control group drop faster and lower than the obese group, though by hour 3 they seem about the same.

```{r}
aggregate(level ~ time + group, data = phosphateL, mean) |>
  ggplot() +
  aes(x = time, y = level, linetype = group) +
  geom_point() +
  geom_line()
```

In both plots it appears the effect of time differs between groups. Therefore it seems reasonable to allow these terms to interact in our mixed-effect model.

Before we begin modeling, we make the baseline phosphate measure a predictor since a baseline measure is technically not a response to any treatment or condition. [@senn] To do this we use `pivot_longer()` again but this time without selecting column t0. When finished we rename the column as "baseline". We now have a column called "baseline" in our long data frame that we will use as a predictor in our models.

```{r}
phosphateL <- pivot_longer(phosphate, cols = t0.5:t3,
                           names_to = "time",
                           names_prefix = "t",
                           names_transform = list(time = as.numeric),
                           values_to = "level")
names(phosphateL)[3] <- "baseline"
head(phosphateL)
```

Finally we renumber group ids in the obese group to pick up where the control group id numbering leaves off at 13.

```{r}
phosphateL$id <- ifelse(phosphateL$group == "obese",
                        phosphateL$id + 13,
                        phosphateL$id)
```


#### Linear Mixed-Effect Model

To fit our mixed-effect models we will use the `lmer()` function in the lme4 package [@lme4]. A mixed-effect model essentially allows us to fit separate models to each subject. We assume each subject exerts some _random effect_ on certain model parameters. The most basic mixed-effect model is the random intercept model, where each subject receives their own intercept. This is the model we will fit.

Below we model level as a function of baseline (t0), time, group, and the time:group interaction. We also fit a random intercept for each subject using the syntax `(1|id)`. The summary output shows a fairly large t value for the interaction, which provides good evidence that the interaction is reliably negative. (The summary output for lmer models does not display p values since the null distributions for the t values are unknown for unbalanced data in the mixed-effect model framework.)

```{r}
library(lme4)
m <- lmer(level ~ baseline + time * group + (1|id), data = phosphateL)
summary(m, corr = FALSE)
```

If we view the coefficients we see each subject has their own model, but only the intercept varies between the subjects. Hence the reason we call this a random-intercept model.

```{r}
head(coef(m)$id)
```

Before we use or interpret this model, we should assess the residuals vs fitted value plot. We can create this plot by simply calling `plot()` on the model object. We would like to see a uniform distribution of residuals hovering closely around 0. This plot looks OK for the most part.

```{r}
plot(m)
```

The model we fit assumes the effect of time is _linear_, and that the linear effect of time changes for each group. We can visualize the model using the ggeffects package [@ggeffects]. With the data added it seems like the linear effects may be too simplistic.

```{r}
library(ggeffects)
ggeffect(m, terms = c("time", "group")) |>
  plot(show_data = TRUE, jitter = 0)
```

Given our exploratory plots we may want to entertain a _non-linear_ effect of time. One way to do this is with natural cubic splines. Using the splines package [@R], we can specify that we want to allow the trajectory of time to change directions up to 3 times using the syntax `ns(time, df = 3)`. We skip summarizing the model and look at the residuals vs fitted value plot. This looks about the same as the previous plot but a careful look at the y axis reveals we have slightly smaller residuals.

```{r}
library(splines)
m2 <- lmer(level ~ baseline + ns(time, df = 3) * group + (1|id),
           data = phosphateL)
plot(m2)
```

We can also examine the distribution of residuals by time within each group using the syntax below. The plot shows residuals evenly distributed and not too far from 0, which is good.

```{r}
plot(m2, resid(.) ~ time | group)
```

Just as with linear models, we assume errors for a mixed-effect model are drawn from a Normal distribution. A QQ plot helps us assess this assumption. For models fit with lme4 we need to use the `qqmath()` function from the lattice package [@lattice] to create a QQ plot. The plot below provides no reason for us to question this assumption.

```{r}
lattice::qqmath(m2)
```

We can compare the models' AIC values to assess whether the more complicated non-linear model is better. Recall that lower AIC values suggest a model will perform better on out-of-sample data. It appears the model with non-linear effects is a good deal better than the model with linear effects. The df column refers to the number of parameters in the model.

```{r}
AIC(m, m2)
```

The non-linear model seems better than the original linear model. But is the non-linear model a _good_ model? One way to assess this is to simulate data using the model and see how it compares to the original data. Below we use the `simulate()` function to simulate 50 sets of phosphate values from our non-linear model. Next we plot a smooth histogram of the observed data. Then we loop through the simulated data and plot a smooth histogram for each simulation. What we see is that our model is generating data that looks pretty similar to our original data, though it seems to be a little off in the 3 - 4 range. It's not perfect, and we wouldn't want it to be, but it looks good enough.

```{r}
sim1 <- simulate(m2, nsim = 50)
plot(density(phosphateL$level), ylim = c(0, 0.6))
for(i in 1:50)lines(density(sim1[[i]]), col = "grey90")
```

As before, let's use ggeffects to visualize our model. We see the bend in our fitted lines due to the non-linear effects.

```{r}
ggeffect(m2, terms = c("time", "group")) |>
  plot(show_data = TRUE, jitter = 0)
```

Looking at the summary for model m2 reveals coefficients that are impossible to interpret. This is one of the drawbacks of models with non-linear effects. However we see under the Random effects section that there is more variation within subjects (0.4246) than between subjects (0.2615).

```{r}
summary(m2, corr = FALSE)
```

Judging from the effect display the interaction in the model is warranted, but we can formally test and verify this using the `Anova()` function in the car package [@car]. The large chi-square statistic (37.6901) and small p-value confirm this.

```{r}
library(car)
Anova(m2)
```

Since we have both non-linear effects and interactions, there is no convenient interpretation of our model parameters. However we can use our model to make predictions at levels of interest and then compare the expected values. For example, what is the expected difference in phosphate values between control and obese subjects at 1 hour post glucose challenge assuming a baseline value of 4? We can answer this using the emmeans package [@emmeans]. Below we specify we want to use model m2 and compare groups at time = 1 and baseline = 4, and then pipe into the `confint()` function for a 95% confidence interval on the difference. (We can disregard the note about interactions since we specified the time.) The output indicates an estimated difference of about -0.825 with a 95% CI of [-1.15, -0.502].

```{r}
library(emmeans)
emmeans(m2, specs = pairwise ~ group,
        at = list(time = 1, baseline = 4)) |>
  confint()
```


#### Generalized Least Squares Model

Another approach to analyzing longitudinal data is via Generalized Least Squares (GLS) [@nlmebook]. In this framework we do away with random effects and model _correlated_ errors. The nlme package that comes with R [@nlme] provides the `gls()` function for this task along with a family of Correlation Structure Classes that allow us to model various correlation structures.

To motivate the Generalized Least Squares approach, let's ignore the subject-level grouping structure and fit a linear model using the `gls()` function.

```{r}
library(nlme)
m0 <- gls(level ~ baseline + ns(time, df = 3) * group,
          data = phosphateL)
```

Next we'll investigate correlation of errors using a _semivariogram_. Below we use the `Variogram()` function from the nlme package and pipe the result into the `plot()` function. The arguments `form = ~ time | id` and `resType = "n"` says to look at the "normalized" residuals over time by id. In addition we specify `robust = TRUE` to reduce the influence of outliers. The positive trend in the semivariogram values over "distance" indicate the errors exhibit some correlation.

```{r}
Variogram(m0, form = ~ time | id, resType = "n", robust = TRUE) |>
  plot(span = 0.8)
```

To help understand this, let's extract the residuals from our linear model, and then apply the `acf()` function to the residuals by subject id. The `acf()` calculates autocorrelation at all possible lags. We then extract the autocorrelation at lag 1 and sort by absolute value. Notice that more than half the subjects have an autocorrelation at lag 1 greater than 0.2 in absolute value.

```{r}
r <- residuals(m0)
acf_out <- tapply(r, phosphateL$id, acf, plot = FALSE)
lag1 <- sapply(acf_out, function(x)x[1,1][["acf"]])
mean(abs(lag1) > 0.2)
```

Now we model this correlation using the `corCAR1()` correlation structure. This will estimate a single correlation parameter for the errors that will decrease exponentially with each lag. Notice the model specification itself has not changed. We've simply added the argument `correlation = corCAR1(form = ~ time | id)`.

```{r}
m3 <- gls(level ~ baseline + ns(time, df = 3) * group,
          data = phosphateL,
          correlation = corCAR1(form = ~ time | id))
```

Now the semivariogram shows no major patterns, suggesting the model with the correlation structure is adequate.

```{r}
Variogram(m3, form = ~ time | id, resType = "n", robust = TRUE) |>
  plot(span = 0.8)
```

In addition we can formally compare the models using the `anova()` function. The large Likelihood Ratio test statistic and difference in AIC/BIC values provides strong evidence against the assumption of independent errors.

```{r}
anova(m0, m3)
```

Does our GLS model simulate data that is similar to our observed data? To assess this we use the `simulate_lme()` function from the nlraa package [@nlraa]. (There is no simulate method included with the nlme package.) The argument `psim = 2` specifies that the simulation use the residual standard error and correlation pattern to generate uncertainty. The result is encouraging.

```{r}
library(nlraa)
sim2 <- simulate_lme(m3, psim = 2, nsim = 50)
plot(density(phosphateL$level), ylim = c(0, 0.6))
for(i in 1:50)lines(density(sim2[,i]), col = "grey90")
```

Let's look at a summary of the model. We save the summary so we can easily extract the estimated correlation parameter, called "Phi". The summary object is a list. The correlation parameter is contained within the modelStruct element.

```{r}
m3_summary <- summary(m3)
m3_summary$modelStruct$corStruct
```

The estimate of the correlation between two phosphate measurements taken one hour apart on the same subject is about 0.302. The estimated correlation for measurements two hours apart is $0.30265^2 = 0.0915$.

Calling `coef()` on the summary object returns the coefficient table. Unlike the `lmer()` model, the `gls()` model returns p-values for the marginal hypothesis tests of the coefficients.

```{r}
coef(m3_summary)
```

Notice the GLS coefficients are not terribly different from the mixed-effect model coefficients.

```{r}
# mixed-effect model coefficients
coef(summary(m2))
```

Major differences between our mixed-effect model and GLS model are as follows:

- The GLS model does not allow for subjects to have distinct trajectories while the mixed-effect model does.
- The GLS model does not estimate between-subject variability while the mixed-effect model does.
- The mixed-effect model assumes within-subject errors are independent while the GLS model assumes they're correlated.

The within-subject errors for the mixed-effect model are assumed to be drawn from a Normal distribution with mean 0 and standard deviation of 0.42. The subject-specific random effects for the intercept are assumed to be drawn from a Normal distribution with mean 0 and a standard deviation of 0.26. We can extract these values using the `VarCorr()` function.

```{r}
VarCorr(m2)
```

The within-subject errors for the GLS model are assumed to be drawn from a multivariate Normal distribution with mean 0 and a variance-covariance matrix with the specified correlation pattern. We can extract this matrix using the `getVarCov()` function. The standard deviations listed is the estimated residual standard error.

```{r}
getVarCov(m3)
```

Using the `cov2cor()` function we can see the estimated correlation pattern for the errors. Notice our Phi estimate, 0.30265, appears two spots over from the diagonal. That's because some of our time measures are in half-hour increments.

```{r}
cov2cor(getVarCov(m3))
```

As with the mixed-effect model we should assess model fit, the constant variance assumption, and normality of residuals. No pattern in the residuals is evident and just about all the standardized residuals are within 2 units of 0.

```{r}
plot(m3)
```

The constant variance appears to hold at the 5 time points as well.

```{r}
plot(m3, resid(.) ~ time | group)
```

And the QQ plot of residuals gives no reason to doubt the assumption of normality.

```{r}
qqnorm(m3)
```

Once again we can use the ggeffects package to visualize our model.

```{r}
ggeffect(m3, terms = c("time", "group")) |>
  plot(show_data = TRUE, jitter = 0)
```

And using the emmeans package we can make comparisons at levels of interest. For example, what is the expected difference in phosphate values between control and obese subjects at 1 hour post glucose challenge assuming a baseline value of 4? (We can disregard the note about interactions since we specified the time.) The output indicates an estimated difference of about -0.814 with a 95% CI of [-1.17, -0.457]. This is similar to what we obtained with the mixed-effect model.

```{r}
emmeans(m3, specs = pairwise ~ group,
        at = list(time = 1, baseline = 4)) |>
  confint()
```


### References