bugs-examples-in-stan/resistant.Rmd at master · clarewtz/bugs-examples-in-stan · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
# Resistant: Outlier-resistant regression via the Student's $t$ distribution {#resistant}

```{r resistant_setup,message=FALSE,cache=FALSE}
library("tidyverse")
library("rstan")
```

Outlying data points can distort estimates of location, such as means or regression coefficients.[^resistant-src]
Location estimates obtained via maximizing a iid normal likelihood over heavy tailed data will be sensitive to data in the tails (outliers).
A popular alternative to normal errors in regression analyses is the Student's $t$ density, with an unknown degrees of freedom parameter.
For low degrees of freedom, the Student's $t$ distribution has heavier tails than the normal, but tends to the normal as the degrees of freedom parameter increases.
Treating the degrees of freedom parameter as an unknown parameter to be estimated thus provides a check on the appropriateness of the normal.
By embedding a model with location parameters in the Student's $t$ density, we obtain outlier-resistant estimates of location parameters.

## Data

This example uses data collected by Douglas Grob on incumbency advantage in American congressional elections, 1956-1994 [@Jackman2000a].
```{r resistant}
data("resistant", package = "bayesjackman")
glimpse(resistant)
```

The response variable is the proportion of the two-party vote won by the Democratic candidate in district $i$ at election $t$.
Indicators for Democratic and Republican incumbency are the critical explanatory variables in the analysis.
Coefficients on these indicators are regarded as estimates of incumbency advantage.
A series of year-specific indicators (*fixed effects*) are also included in the specification.

$$
\begin{aligned}[t]
y_i &\sim \mathsf{StudentT}(\nu, \mu_i, \sigma) \\
\mu_i &= \alpha + x_i \beta
\end{aligned}
$$
The $\alpha$, $\beta$, and $\sigma$ parameters are given weakly informative priors (assuming that all $x$ are scaled to have mean 0, standard deviation 1):
$$
\begin{aligned}[t]
\alpha &\sim \mathsf{Normal}(\bar{y}, 10 s_y), \\
\beta &\sim \mathsf{Normal}(0, 2.5 s_y), \\
\sigma &\sim \mathsf{HalfCauchy}(0, 5 s_y) ,
\end{aligned}
$$
where $\bar{y}$ is the mean of $y$, and $s_y$ is the standard deviation of $y$.
The degrees of freedom parameter in the Student's $t$ distribution is a parameter and given the weakly informative prior suggested by @JuarezSteel2010a,
$$
\nu \sim \mathsf{Gamma}(2, 0.1) .
$$
The following code operationalizes this regression model.
The conditional density of the vote proportions is $t$, with unknown degrees of freedom, $\nu.$

```{r resistant_mod,results='hide',cache.extra=tools::md5sum("stan/resistant.stan")}
resistant_mod <- stan_model("stan/resistant.stan")
```

```{r echo=FALSE,results='asis',cache=FALSE}
resistant_mod
```

```{r resistant_data,cache.extra=tools::md5sum("data/resistant.rda")}
resistant_data <- within(list(), {
  y <- resistant$y
  N <- length(y)
  X <- model.matrix(~ 0 + lagy + prvwinpty + deminc + repinc, data = resistant) %>%
    scale()
  K <- ncol(X)
  year <- resistant$year
  Y <- max(year)
  # priors
  alpha_loc <- mean(y)
  alpha_scale <- 10 * sd(y)
  beta_loc <- rep(0, K)
  beta_scale <- rep(2.5 * sd(y), K)
  sigma_scale <- 5 * sd(y)
})
```

```{r resistant_fit,results='hide',cache=FALSE}
resistant_fit <- sampling(resistant_mod, data = resistant_data)
```
```{r}
summary(resistant_fit, par = c("nu", "sigma", "beta", "tau"))$summary
```

## Reparameterization: standard deviation instead of scale

In the Student's $t$ distribution, the standard deviation is a function of the degrees of freedom.
For degrees of freedom $\nu > 2$, the variance is defined, and
$$
\sigma^{*} = sd(y) = \sigma \sqrt{ \frac{\nu}{\nu - 2}}
$$
This makes the sampling of $\nu$ and $\sigma$ a priori dependent.
Instead, we can place priors on the degrees of freedom $\nu$ and the standard deviation $\sigma^*$,
and treat $\sigma$ as a transformed parameter,
$$
\begin{aligned}
\sigma^{*} &\sim \mathsf{HalfCauchy}{(0, 5)} \\
\sigma &= \sigma^{*} \sqrt{\frac{\nu - 2}{\nu}} \\
\end{aligned}
$$

```{r resistant_mod2,results='hide',cache.extra=tools::md5sum("stan/resistant2.stan")}
resistant_mod2 <- stan_model("stan/resistant2.stan")
```
```{r echo=FALSE,results='asis',cache=FALSE}
resistant_mod2
```

```{r resistant_fit2,results='hide'}
resistant_fit2 <- sampling(resistant_mod2, data = resistant_data)
```
```{r}
summary(resistant_fit2, par = c("beta", "sigma", "nu", "tau"))$summary
```

## Questions {-}

1.  How does using the Student-t distribution compare to using a normal distribution for the errors?
1.  Compare the effective sample size, $\hat{R}$, and speed for $\sigma$ and $\nu$ when using the scale/degrees of freedom and standard deviation/degrees of freedom parameterizations.

[^resistant-src]: Example derived from Simon Jackman, "Resistant: Outlier-resistant regression via the t distribution," 2007-07-24, [URL](https://web-beta.archive.org/web/20070724034107/http://jackman.stanford.edu:80/mcmc/resistant.odc).