forked from MathiasHarrer/Doing-Meta-Analysis-in-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path07-metaregression.Rmd
197 lines (133 loc) · 12.9 KB
/
07-metaregression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
# Meta-Regression
![](regressionbild.jpg)
Conceptually, **Meta-Regression** does not differ much from a **subgroup analysis**. In fact, subgroup analyses with more than two groups are nothing more than a meta-regression with categorial covariates. However, meta-regression does also allow us to use **continuous data** as covariates and check weather values of this variable are associated with effect size.
<<<<<<< HEAD
**The idea behind meta-regression**
You may have already performed regressions in regular data where participants or patients are the **unit of analysis**. In typical Meta-Analyses, we do not have the individual data for each participant available, but only the **aggregated effects**, which is why we have to perform meta-regressions with covariates at the **study level**. This also means that while we conduct analyses on participant samples much larger than usual in single studies, it is still very likely that **we don't have enough studies for a meta-regression to be sensible**. In [Chapter 7](#subgroup), we told you that subgroup analyses make no sense when *k*<10. For **meta-regression**, Borenstein and colleages [@borenstein2011] recommend that **each covariate should at least contain ten studies**, although this should not be seen as clear rule. In a conventional regression, we want to estimate a parameter $y$ using a covariate $x_i$ with $n$ regression coefficients $\beta$. A standard regression equation therefore looks like this:
=======
```{block,type='rmdinfo'}
**The idea behind meta-regression**
You may have already performed regressions in regular data where participants or patients are the **unit of analysis**. In typical Meta-Analyses, we do not have the individual data for each participant available, but only the **aggregated effects**, which is why we have to perform meta-regressions with covariates at the **study level**. This also means that while we conduct analyses on participant samples much larger than usual in single studies, it is still very likely that **we don't have enough studies for a meta-regression to be sensible**. In [Chapter 7](#subgroup), we told you that subgroup analyses make no sense when *k*<10. For **meta-regression**, Borenstein and colleages [@borenstein2011] recommend that **each covariate should at least contain ten studies**, although this should not be seen as clear rule.
In a conventional regression, we want to estimate a parameter $y$ using a covariate $x_i$ with $n$ regression coefficients $\beta$. A standard regression equation therefore looks like this:
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
$$y=\beta_0 + \beta_1x_1 + ...+\beta_nx_n$$
In a meta-regression, we want to estimate the **effect size** $\theta$ for different values of the covariate(s), so our regression looks like this:
$$\hat \theta_k = \theta + \beta_1x_{1k} + ... + \beta_nx_{nk} + \epsilon_k + \zeta_k$$
<<<<<<< HEAD
You might have seen that when estimating the effect size $\theta_k$ of a study $k$ in our regression model, there are two **extra terms in the equation**, $\epsilon_k$ and $\zeta_k$. The same terms can also be found in the equation for the random-effects-model in [Chapter 4.2](#random). The two terms signify two types of **independent errors** which cause our regression prediction to be **imperfect**. The first one, $\epsilon_k$, is the sampling error through which the effect size of the study deviates from its "true" effect. The second one, $\zeta_k$, denotes that even the true effect size of the study is only sampled from **an overarching distribution of effect sizes** (see the Chapter on the [Random-Effects-Model](#random) for more details). In a **fixed-effect-model**, we assume that all studies actually share the **same true effect size** and that the **between-study heterogeneity** $\tau^2 = 0$. In this case, we do not consider $\zeta_k$ in our equation, but only $\epsilon_k$. As the equation above has includes **fixed effects** (the $\beta$ coefficients) as well as **random effects** ($\zeta_k$), the model used in meta-regression is often called **a mixed-effects-model**. Mathematically, this model is identical to the **mixed-effects-model** we described in [Chapter 7](#subgroup) where we explained how **subgroup analyses** work.Indeed **subgroup analyses with more than two subgroups** are nothing else than a **meta-regression** with a **categorical predictor**. For meta-regression, these subgroups are then **dummy-coded**, e.g.
=======
You might have seen that when estimating the effect size $\theta_k$ of a study $k$ in our regression model, there are two **extra terms in the equation**, $\epsilon_k$ and $\zeta_k$. The same terms can also be found in the equation for the random-effects-model in [Chapter 4.2](#random). The two terms signify two types of **independent errors** which cause our regression prediction to be **imperfect**. The first one, $\epsilon_k$, is the sampling error through which the effect size of the study deviates from its "true" effect. The second one, $\zeta_k$, denotes that even the true effect size of the study is only sampled from **an overarching distribution of effect sizes** (see the Chapter on the [Random-Effects-Model](#random) for more details). In a **fixed-effect-model**, we assume that all studies actually share the **same true effect size** and that the **between-study heterogeneity** $\tau^2 = 0$. In this case, we do not consider $\zeta_k$ in our equation, but only $\epsilon_k$.
As the equation above has includes **fixed effects** (the $\beta$ coefficients) as well as **random effects** ($\zeta_k$), the model used in meta-regression is often called **a mixed-effects-model**. Mathematically, this model is identical to the **mixed-effects-model** we described in [Chapter 7](#subgroup) where we explained how **subgroup analyses** work.
Indeed **subgroup analyses with more than two subgroups** are nothing else than a **meta-regression** with a **categorical predictor**. For meta-regression, these subgroups are then **dummy-coded**, e.g.
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
$$ D_k = \{\begin{array}{c}0:ACT \\1:CBT \end{array}$$
$$\hat \theta_k = \theta + \beta x_{k} + D_k \gamma + \epsilon_k + \zeta_k$$
In this case, we assume the same **regression line**, which is simply "shifted" **up or down for the different subgroups** $D_k$.
<<<<<<< HEAD
=======
```
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
```{r, echo=FALSE, fig.width=6,fig.cap="Visualisation of a Meta-Regression with dummy-coded categorial predictors",fig.align='center'}
library(png)
library(grid)
img <- readPNG("dummy.PNG")
grid.raster(img)
```
<<<<<<< HEAD
$~$
=======
<br><br>
```{block,type='rmdinfo'}
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
**Assessing the fit of a regression model**
To evaluate the **statistical significance of a predictor**, we a **t-test** of its $\beta$-weight is performed.
$$ t=\frac{\beta}{SE_{\beta}}$$
<<<<<<< HEAD
Which provides a $p$-value telling us if a variable significantly predicts effect size differences in our regression model. If we fit a regression model, our aim is to find a model **which explains as much as possible of the current variability in effect sizes** we find in our data. In conventional regression, $R^2$ is commonly used to quantify the **goodness of fit** of our model in percent (0-100%). As this measure is commonly used, and many researchers know how to to interpret it, we can also calculate a $R^2$ analog for meta-regression using this formula:
=======
Which provides a $p$-value telling us if a variable significantly predicts effect size differences in our regression model.
If we fit a regression model, our aim is to find a model **which explains as much as possible of the current variability in effect sizes** we find in our data.
In conventional regression, $R^2$ is commonly used to quantify the **goodness of fit** of our model in percent (0-100%). As this measure is commonly used, and many researchers know how to to interpret it, we can also calculate a $R^2$ analog for meta-regression using this formula:
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
$$R_2=\frac{\hat\tau^2_{REM}-\hat\tau^2_{MEM}}{\hat\tau^2_{REM}}$$
Where $\hat\tau^2_{REM}$ is the estimated total heterogenetiy based on the random-effects-model and $\hat\tau^2_{REM}$ the total heterogeneity of our mixed-effects regression model.
<<<<<<< HEAD
=======
```
<br><br>
---
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
## Calculating meta-regressions in R
Meta-regressions can be conducted in R using the `metareg` function in `meta`. To show the similarity between `subgroup` analysis and `meta-regression` with categorical predictors, i'll first conduct a meta-regression with my variable "Control" as predictor again.
<<<<<<< HEAD
$~$
=======
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
```{r}
metareg(m.hksj,Control)
```
<<<<<<< HEAD
$~$
We see in the output that the `metareg` function uses the values of "Control" (i.e, the three different types of control groups) as a **moderator**. It takes **"information only"** as a dummy-coded *reference group*, and **"no intervention"** and **"WLC"** as dummy-coded **predictors**. Under `Test of Moderators`, we can see that control groups are not significantly associated with effect size differences $F_{2,15}=0.947$, $p=0.41$. Our regression model does not explain any of the variability in our effect size data ($R^2=0\%$). Below `Model Results`, we can also see the $\beta$-values (`estimate`) of both predictors, and their significance level `pval`. As we can see, both predictors were not significant.
$~$
**Continuous variables**
Let's assume i want to check if the **publication year** is associated with effect size. I have stored the variable `pub_year`, containing the publication year of every study in my dataset, and conducted the meta-analysis with it. I stored my meta-analysis output in the `m.pubyear` output. Now, i can use this predictor in a meta-regression.
$~$
=======
We see in the output that the `metareg` function uses the values of "Control" (i.e, the three different types of control groups) as a **moderator**. It takes **"information only"** as a dummy-coded *reference group*, and **"no intervention"** and **"WLC"** as dummy-coded **predictors**. Under `Test of Moderators`, we can see that control groups are not significantly associated with effect size differences $F_{2,15}=0.947$, $p=0.41$. Our regression model does not explain any of the variability in our effect size data ($R^2=0\%$).
Below `Model Results`, we can also see the $\beta$-values (`estimate`) of both predictors, and their significance level `pval`. As we can see, both predictors were not significant.
<br><br>
**Continuous variables**
Let's assume i want to check if the **publication year** is associated with effect size. I have stored the variable `pub_year`, containing the publication year of every study in my dataset, and conducted the meta-analysis with it. I stored my meta-analysis output in the `m.pubyear` output.
**Now, i can use this predictor in a meta-regression.**
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
```{r,echo=FALSE}
load("Meta_Analysis_Data.RData")
madata<-Meta_Analysis_Data
madata$pub_year<-c(2001,2002,2011,2013,2013,2014,1999,2018,2001,2002,2011,2013,2013,2014,1999,2018,2003,2005)
madata$pub_year<-as.numeric(madata$pub_year)
m.pubyear<-metagen(TE,seTE,studlab = paste(Author),comb.fixed = FALSE,data=madata)
```
```{r}
output.metareg<-metareg(m.pubyear,pub_year)
output.metareg
```
<<<<<<< HEAD
$~$
As you can see from the output, `pub_year` was now included as a **predictor**, but it is not significantly associated with the effect size ($p=0.9412$).
=======
As you can see from the output, `pub_year` was now included as a **predictor**, but it is not significantly associated with the effect size ($p=0.9412$).
<br><br>
---
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
## Plotting regressions
```{r, echo=FALSE, fig.width=4,fig.cap="A finished bubble plot",fig.align='center'}
library(png)
library(grid)
img <- readPNG("metareg.PNG")
grid.raster(img)
```
To plot our meta-regression output, we can use the `bubble` function in `meta`. Here a few parameters we can specify for this function.
```{r,echo=FALSE}
i<-c("xlim","ylim","xlab","ylab","col","lwd","col.line","studlab")
ii<-c("The x-axis limit of the plot. Must be specified as, e.g., 'xlim=c(0,1)'","The y-axis limit of the plot. Must be specified as, e.g., 'ylim=c(0,1)'","The label for the x axis","The label for the y axis","The color of the individual studies","The line width of the regression line","The color of the regression line","If the labels for each study should be printed within the plot (TRUE/FALSE)")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
<<<<<<< HEAD
$~$
=======
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876
```{r,eval=FALSE}
bubble(output.metareg,
xlab = "Publication Year",
col.line = "blue",
studlab = TRUE)
```
<<<<<<< HEAD
=======
<br><br>
---
>>>>>>> f3259eafbebf95ffc6044d4af3f61e06d59c7876