bugs-examples-in-stan/uk92.Rmd at master · clarewtz/bugs-examples-in-stan · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
# House of Commons elections: modeling with the multivariate Student-$t$ density {#uk92}

```{r uk92_setup,message=FALSE,cache=FALSE}
library("tidyverse")
library("rstan")
```

The data for this example consist of constituency vote proportions from the 1992 United Kingdom House of Commons election.
These data come from @KatzKing1999a, were re-analyzed @TomzTuckerWittenberg2002a.[^uk92-source]
This data is included in the **pscl** package as `UKHouseOfCommons`:
```{r UKHoseOfCommons}
(data("UKHouseOfCommons", package = "pscl"))
glimpse(UKHouseOfCommons)
```

The data consist of the vote proportions for 522 constituencies, for the three major UK parties: the Labor party, the Conservative Party, and the Liberal-Alliance.
Instead of working with the vote proportions directly, we will work with log-odds ratios.
This is common in the analysis of multinomial or "compositional" data [@Aitchison1982a].
The column `y1` is the log-odds of Conservative to the Liberal-Democratic vote share, while `y2` is the log-odds of Labor to the Liberal-Democratic vote share.

Let $y_{i,k}$, $k \in \{1, 2\}$, $i \in 1, \dots, N$ be the log-odds ratio vote share in constituency $i$.
@KatzKing1999a noted that the distribution of the log-odds ratios appear to be heavy-tailed relative to the normal.
Thus, like them, we will model the data with a multivariate Student's $t$ distribution with unknown degrees of freedom ($\nu$),
$$
\begin{aligned}[t]
y_i &\sim \mathsf{StudentT}(\nu, \alpha + x' \beta, \Sigma) & i \in 1, \dots, N,
\end{aligned}
$$

For identification, as in a logit regression, either the intercept or scale must be fixed. In this case, $\Sigma$ is a correlation matrix.

Weakly informative priors are used for the regression parameters.
The degrees of freedom of the multivariate Student t distribution is a parameter, and given a weakly informative Gamma distribution that puts most of the prior density between 3 and 40 [@JuarezSteel2010a],
$$
\begin{aligned}[t]
\alpha &\sim  \mathsf{Normal}(0, 10) , \\
\beta_p &\sim \mathsf{Normal}(0, 2.5), & p \in 1, \dots, P , \\
\Sigma &\sim \mathsf{LkjCorr}(\eta) , \\
\nu &\sim \mathsf{Gamma}(2, 0.1) .
\end{aligned}
$$

```{r UKHouseOfCommons}
(data("UKHouseOfCommons", package = "pscl"))
glimpse(UKHouseOfCommons)
```

```{r uk92_data}
uk92_data <- within(list(), {
  y <- as.matrix(dplyr::select(UKHouseOfCommons, y1, y2))
  X <- model.matrix(~ 0 + y1lag + y2lag + coninc + labinc + libinc, data = UKHouseOfCommons) %>% scale()
  N <- nrow(y)
  K <- ncol(y)
  P <- ncol(X)
  alpha_loc <- rep(0, K)
  alpha_scale <- rep(10, K)
  beta_loc <- matrix(0, K, P)
  beta_scale <- matrix(2.5, K, P)
  Sigma_corr_shape <- 2
  Sigma_scale_scale <- 5
})
```

```{r results='hide',cache.extra=tools::md5sum("stan/uk92.stan")}
uk92_mod <- stan_model("stan/uk92.stan")
```
```{r results='asis',echo=FALSE}
uk92_mod
```

Fit the model in Stan.
```{r uk92_fit,results='hide'}
uk92_fit <- sampling(uk92_mod, data = uk92_data, chains = 1)
```
```{r}
summary(uk92_fit, par = c("nu", "alpha", "beta", "Sigma"))$summary
```

## Questions

-   Given this model, replicate some of the results in @KatzKing1999a.
-   Model the data using a multivariate normal model instead. How do the results differ? Which fits the data better? What does the value of $\nu$ from the multivariate Student t model tell you about the plausibility of the multivariate normal distribution?
-   @TomzTuckerWittenberg2002a suggest using seemingly unrelated regressions (SUR). Model the data with SUR. How does it compare in results and speed?
-   Could you model this using a multinomial model with the data provided? What data would you need?

[^uk92-source]: Example derived from Simon Jackman, "House of Commons elections: modeling with the multivariate t density." *BUGS Examples* [URL](https://web-beta.archive.org/web/20070724034125/http://jackman.stanford.edu/mcmc/92.odc). Some language copied from the original.