05_analyzing_ba_increment.rmd

---
title: "BAI modeling"
---

```{r setup, include=FALSE}
knitr::knit_hooks$set(time_it = time_it)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = TRUE, time_it = TRUE, cache.extra = "wombat")
```

```{r 05-analyzing-ba-increment-1}
library(lme4)
palette("Tableau 10")
library(patchwork)
```

# Introduction

Basal area increment is a key feature for analysis. Dbh data is available for almost all trees in both periods, it is more precise and diameter growth (as opposed to height growth) is expected to have a strong response to thinning treatments.

# Define trees of interest

For basal area increment, I will include all live trees without broken or dead tops. I am including trees identified as leaning, these should mostly have good growth and accurate measurements. I am excluding bear damaged trees. Increment for a period will be assigned to the beginning of period observation. This means I will want to exclude observations in which a tree is damaged by bear in this or a subsequent observation.

I am including trees with dead or broken tops when they didn't lose more than 25% of their initial height. This is only about 25 trees. There are many more with broken tops that are missing heights, but histograms of the diameter increment of these suggest that they are not that different from the set of trees that have heights.

I'm also filtering out observations that don't have crown ratio, this loses 1 redwood in the L40 treatment, dropping observations from 10 to 9, as well as losing 10 redwood in H80, but this is less of a problem. If CR proves not useful, I will remove this constraint.

I'm also removing trees with less than -0.2 (negative) diameter increment


```{r 05-analyzing-ba-increment-2}

test_d <- d_l %>%
  group_by(tree_id) %>%
  # only use unbroken live sese or psme from 2018
  filter(
    d_inc1 >= -0.2,
    # remove trees that lost more than 25% of height
    is.na(ht_inc1) | (ht_inc1 * 5 / ht) > -1/4,
    spp %in% c("SESE3", "PSMEM"),
    year %in% c("08", "13"),
    # Live and doesn't die by next period
    status == 1 & lead(status == 1),
    !is.na(dbh),
    # bear free this period and next
    !bear & lead(!bear),
    !is.na(cr)
  ) %>%
  mutate(
    treat_method = str_extract(treatment, "C|H|L"),
    treat_density = str_extract(treatment, "C|40|80"),
    treat_status = if_else(str_detect(treatment, "C"), "unthinned", "thinned"),
    year = factor(year, ordered = FALSE, levels = c("08", "13")),
    cr = cr / 100
  ) %>%
  select(-c(h_dist, azi, rot, cc, x, y, live)) %>%
  # add a scaled ba_inc1 variable for convenience
  ungroup() %>%
  mutate(ba_inc_scaled = (ba_inc1 - min(ba_inc1) + 1)) 

# Alter one outlier record in year 2013 by using data from previous year
big_tree <- "4H80.4019"
replacement_data <- filter(test_d, tree_id == big_tree, year == "13") %>%
  transmute(
    dbh = dbh,
    d_inc1 = d_inc2,
    ba = ba,
    ba_inc1 = ba_inc2,
  )
bad_rec <- with(test_d, which(tree_id == big_tree & year == "13"))
test_d[bad_rec, names(replacement_data)] <- replacement_data

display_vars <- function(x) {
  select(x, c(tree_id, year, dbh, ht_p, bear, cr, notes, cond, d_inc1, ba_inc1))
}
```

```{r 05-analyzing-ba-increment-3}

ggplot() +
  geom_density(
    data = filter(test_d, !is.na(ht_inc1)), 
    aes(x = d_inc1, fill = "heights available"), alpha = 0.5) +
  geom_density(
    data = filter(test_d, is.na(ht_inc1)),
    aes(x = d_inc1, fill = "heights NA"), alpha = 0.5) +
  # geom_histogram(
  #   data = filter(test_d, !get_cond(2, 3), !lead(get_cond(2, 3))),
  #   aes(fill = "unbroken"), alpha = 0.5)
  labs(x = "diameter increment", y = "Density", fill = NULL)

```

I also need to have a scaling value for making response non-negative for log transformation which will be defined as

`-(min(ba_inc1)) + 1`

for convenience, I computed a new response variable based on this value called `ba_inc_scaled`

```{r 05-analyzing-ba-increment-4}

scale_val <- -min(test_d$ba_inc1) + 1

```

Number of redwoods and Douglas fir trees in each plot and each treatment. I have a little problem with sample size when it comes to redwood. Some plots have zero or one observation. It seems like bears took a tole on the 

```{r 05-analyzing-ba-increment-5}

test_d %>%
  group_by(plot) %>%
  mutate(n_sese = sum(spp == "SESE3")) %>%
  group_by(treatment) %>%
  mutate(n_sese_treat = sum(spp == "SESE3")) %>%
  arrange(desc(n_sese_treat), desc(n_sese)) %>%
  ggplot(aes(fct_inorder(plot))) +
    geom_bar(aes(fill = spp)) +
    geom_text(
      aes(label = after_stat(count), group = spp),
      stat = "count",
      position = position_stack(vjust = .5)
    ) +
    facet_wrap(~year) +
    coord_flip() +
    scale_fill_manual(values = palette()) +
    labs(x = "Plots, nested within treatments")

test_d %>%
  group_by(plot) %>%
  mutate(n_sese = sum(spp == "SESE3")) %>%
  group_by(treatment) %>%
  mutate(n_sese_treat = sum(spp == "SESE3")) %>%
  arrange(desc(n_sese_treat), desc(n_sese)) %>%
  ggplot(aes(fct_reorder(treatment, n_sese, .desc = TRUE))) +
    geom_bar(aes(fill = spp)) +
    geom_text(
      aes(label = after_stat(count),
      group = spp),
      stat = "count",
      position = position_stack(vjust = .5)
    ) +
    facet_wrap(~year) +
    coord_flip() +
    scale_fill_manual(values = palette())

```

# Data exploration

## Outliers

Here I'm looking for any outliers in basal area increment, or extreme changes. There are definitely basal area outliers some of the most extreme can be explained by having bear damage, but on average, bear-damaged trees have diameter increments almost double those of non-bear-damaged. It is unclear whether this is due to loose bark, or the fact that these trees tend to be faster growing. This question might need further investigation. Following Dagley et. al 2018, I will remove bear damaged trees from analysis. Another option could be including bear damage as a covariate.

*Note, I have already removed increments less than -0.2 and adjusted one large tree with extrememe growth using its first increment*

```{r bainc6, cache=TRUE}

# outliers defined as 1.5 times above or below IQR
test_d %>%
  filter(
    out(ba_inc1)
  ) %>%
  group_by(tree_id) %>%
  arrange(ba_inc1) %>%
  display_vars %>%
  color_groups()

# dot plot of outliers
test_d %>%
  ggplot(aes(x = ba_inc1, y = fct_reorder(tree_id, d_inc1))) +
    geom_line(aes(group = tree_id), alpha = .3) +
    geom_point(aes(color = year), alpha = .5) +
    scale_y_discrete(breaks = NULL, expand = expansion(add = 10)) +
    scale_color_manual(values = palette()) +
    guides(color = guide_legend(override.aes = list(shape = c(19, 19)))) +
    labs(
      x = "BA increment (cm)",
      y = "tree id (sorted by diameter increment)",
      title = "data in dataset"
    )

```

To put outliers in terms of ba_inc1 in context, if we look at the full dataset, including bear damaged trees, and plot *diameter* increment instead of basal area, then we see a more consistent gradient of diameter measurements.

```{r bainc7, cache=TRUE}

d_l %>%
  filter(year %in% c("08", "13"), live) %>%
  ggplot(aes(x = d_inc1, y = fct_reorder(tree_id, d_inc1))) +
    geom_line(aes(group = tree_id), alpha = .3) +
    geom_point(aes(color = year), alpha = .5) +
    scale_y_discrete(breaks = NULL, expand = expansion(add = 10)) +
    scale_color_manual(values = palette()) +
    guides(color = guide_legend(override.aes = list(shape = c(19, 19)))) +
    labs(
      x = "diameter increment (cm)",
      y = "tree id ALL TREES (sorted by diameter increment)",
      title = "all data"
    )

```

## Negative values

Here is a summary of the remaining negative diameter increment values.

```{r 05-analyzing-ba-increment-6}

local({
  neg_ba_inc <- test_d %>% 
    filter(ba_inc1 < 0) %>%
    arrange(d_inc1) %>%
    pull(d_inc1) 
  print(paste("N: ", length(neg_ba_inc)))
  summary(neg_ba_inc)
})

```

## Crown ratio

I'm going to test cr in the model, it doesn't lose too many more observations by including it, if it doesn't seem to be worth it, I might include those few observations back in the data.

```{r 05-analyzing-ba-increment-7}

test_d %>%
  ggplot(aes(x = cr, fill = spp)) +
    geom_bar() +
    facet_wrap(~ year)

```

## Plotting data

First I'll look at distributions of the response:

```{r 05-analyzing-ba-increment-8}

hist(test_d$ba_inc1, main = "Basal area increment")
hist(log(test_d$ba_inc_scaled), main = "Log of basal area increment + constant")
with(test_d, plot(dbh, ba_inc1))

hist(test_d$dbh, main = "DBHt")
hist(log(test_d$dbh + 18), main = "Log of DBH")

```

relationship of ba_inc1 with dbh by species and year is not exactly linear, I will use a transformation or a GLM with a different distribution.

```{r 05-analyzing-ba-increment-9}

test_d %>%
  ggplot(aes(x = dbh, y = ba_inc1, color = spp)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ year) +
  scale_color_manual(values = palette())

test_d %>%
  ggplot(aes(x = (dbh), y = log(ba_inc_scaled), color = spp)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ year) +
  scale_color_manual(values = palette())

test_d %>%
  ggplot(aes(x = log(dbh), y = (ba_inc1), color = spp)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ year) +
  scale_color_manual(values = palette())

test_d %>%
  ggplot(aes(x = log(dbh), y = log(ba_inc_scaled), color = spp)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ year) +
  scale_color_manual(values = palette())

```

does crown ratio have a relationship with diameter increment?

```{r 05-analyzing-ba-increment-10}

test_d %>%
  ggplot(aes(cr, d_inc1), color = spp) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggpubr::stat_cor()

```

## Evidence that my response is not normally distributed?

looking at 2-inch diameter classes, it would seem that ba_inc is not normally distributed by dbh.

```{r 05-analyzing-ba-increment-11}
d_l %>%
  filter(live, !bear, year %in% c("08", "13")) %>%
  mutate(dbh_cut = cut_interval(dbh, length = 2)) %>%
  group_by(dbh_cut) %>%
  filter(n() > 100) %>%
  ggplot(aes((ba_inc1))) +
    geom_histogram() +
    facet_wrap(~ dbh_cut, scales = "free")
```

## Change in ba-inc over years at tree level

We can see that in the control, relatively few redwood decreased in size. Few trees over all decreased in size in the H40 and L40 treatments, In the L80 treatment, few redwoods decreased in size, and many fir did. In the intense treatments (40's) majority of trees increased in growth rate. Overall, there seems to be a species * treatment * year interaction

```{r 05-analyzing-ba-increment-12, fig.height=8, fig.width=9}
test_d %>%
  filter(ba_inc1 < 150) %>%
  group_by(tree_id) %>%
  mutate(
    increase = if_else(
      any(ba_inc1 > lag(ba_inc1), na.rm = TRUE),
      "BA inc. incresed",
      "BA inc. decreased"
      )
   ) %>%
  ggplot(aes(year, ba_inc1, color = increase, group = tree_id)) +
    geom_line(size = .8, alpha = .4) +
    facet_wrap(~ treatment + spp) +
    scale_x_discrete(expand = expansion(mult = 0.1)) +
    scale_y_continuous(expand = expansion(mult = 0.01)) +
    theme(legend.position = "bottom") +
    labs(color = NULL)

```

And what about at the plot level, can I discern any interactions between species, year and plot?

```{r 05-analyzing-ba-increment-13, fig.height=6, fig.width=10}
plot_bainc_change <- function(treatment) {
  test_d %>%
    filter(ba_inc1 < 150, treatment == {{treatment}}) %>%
    group_by(tree_id) %>%
    mutate(increase = any(ba_inc1 > lag(ba_inc1), na.rm = TRUE)) %>%
    ggplot(aes(year, ba_inc1, color = increase, group = tree_id)) +
      theme(legend.position = "bottom") +
      geom_line(size = .8, alpha = .7) +
      facet_grid(rows = vars(spp), cols = vars(plot)) +
      scale_x_discrete(expand = expansion(mult = 0.1)) +
      scale_y_continuous(expand = expansion(mult = 0.01))
}


```

# Modeling

Is year significant for basal area increment? T-Test? It looks like year is close to significant for the higher intensity treatments.

```{r 05-analyzing-ba-increment-14}

test_d %>%
  group_by(treatment) %>%
  select(year, ba_inc1) %>%
  summarize(as.data.frame(t.test(ba_inc1 ~ year)[c("estimate", "parameter", "p.value")]))

```

## Basic model forms

Basal area increment is expected to vary by dbh and crown ratio. We hope to see a treatment effect. Because of the interaction with year, I will try to model for the most recent increment period only.

```
model a: scaled_ba_inc1 = exp((dbh) * treatment * spp), gamma distribution
model b: scaled_ba_inc1 = exp((dbh) * treatment * spp), gaussian distribution
model c: log(ba_inc1) = dbh * treatment * spp 
model d: ba_inc1 = (log(dbh) + dbh) * treatment * spp
```
The scaling value was computed as `-(min(ba_inc1)) + 1`


```{r 05-analyzing-ba-increment-15, fig.height=7, fig.width=10}

m6a <- glm(
  ba_inc_scaled ~ dbh * treatment * spp,
  family = Gamma(link = "log"),
  data = test_d
)

m6b <- glm(
  ba_inc_scaled ~ dbh * treatment * spp,
  family = gaussian(link = "log"),
  data = test_d
)

m6c <- lm(log(ba_inc_scaled) ~ dbh * treatment * spp, data = test_d)

m6d <- lm(log(ba_inc_scaled) ~ log(dbh) * treatment * spp, data = test_d)

```

A and B are comparable, and C and D are comparable. E is not comparable with anything.

```{r 05-analyzing-ba-increment-16}
AIC(m6a, m6b, m6c, m6d)
```


```{r 05-analyzing-ba-increment-17}

test_d %>%
  ggplot(aes(x = dbh, y = ba_inc1 + scale_val, color = spp)) +
    geom_point(alpha = .2) +
    facet_wrap(vars(treatment)) +
    geom_smooth(
      method = "glm",
      formula = y ~ x,
      method.args = list(family = Gamma(link = "log"))
    ) +
    scale_color_manual(values = palette()) +
    labs(title = "Gamma glm wih log link")

test_d %>%
  ggplot(aes(x = dbh, y = ba_inc1 + scale_val, color = spp)) +
    geom_point(alpha = .2) +
    facet_wrap(vars(treatment)) +
    geom_smooth(
      method = "glm",
      formula = y ~ x,
      method.args = list(family = gaussian(link = "log"))
    ) +
    scale_color_manual(values = palette()) +
    labs(title = "Gaussian glm wih log link")

test_d %>%
  ggplot(aes(x = dbh, y = log(ba_inc_scaled), color = spp)) +
    geom_point(alpha = .2) +
    facet_wrap(vars(treatment)) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_color_manual(values = palette()) +
    labs(title = "Linear model, log-transformed")

test_d %>%
  ggplot(aes(x = log(dbh), y = log(ba_inc_scaled), color = spp)) +
    geom_point(alpha = .2) +
    facet_wrap(vars(treatment)) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_color_manual(values = palette()) +
    labs(title = "Linear model, log-log-transformed")

```

## Moving forward with log - log model: Mixed effects

So, if my gamma GLM is not going to work, I will focus instead on using a mixed effect linear model. First, I'll determine random effects using full model.

```{r 05-analyzing-ba-increment-18}

f0 <- formula(
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + spp + year * cr
)

m1 <- lmer(update(f0, ~ . + (1 | tree_id) + (1 | plot)), REML = FALSE, data = test_d)
m2 <- update(m1, ~ . - (1 | plot))
m3 <- update(m1, ~ . - (1 | tree_id))
m0 <- lm(f0, data = test_d)

AICc(m0, m1, m2, m3)


```

Including plot in the random effects results in a singular fit and no random intercepts are calculated for plot, rendering it useless. This could mean that after accounting for fixed effects, there is no discernable variation in plot, or it could mean that there is some other problem. Either way, I don't think it makes any sense to include it right now.

`log_ba_inc_scaled ~ fixed_effects + (1 | tree_id)`

## log-log model fixed effects

```{r 05-analyzing-ba-increment-19}

fl1 <- list(
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + year + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + cr + year + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + cr + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + year + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + year + spp * treatment + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + cr * year + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + cr + year + spp * treatment + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) * treatment + cr + year + spp + year:spp +(1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + year + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + year + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment * year * spp + cr + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + cr * year * spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + cr * year + spp + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + cr * year + spp * treatment + (1 | tree_id),
  log(ba_inc1 + scale_val) ~ log(dbh) + treatment + cr * year * spp + treatment:spp + (1 | tree_id)
)

aic_weights(fl1, data = test_d, method = "lmm", delta_max = Inf) %>%
  transmute(
    Row = row,
    Formula = formula,
    AICc = round(aicc, 1),
    RMSE = round(rmse, 3),
    "Delta AIC" = round(delta, 1),
    "AIC weight" = round(wi, 3),
    "Marginal/conditional R^2^" = r2,
  ) %>%
  kbl2()

```

Here I'll look at the top model, plot labels refer to diameter increment (cm/year). Unfortunately the residuals do not appear to be evenly distributed. I'm not sure that I can use this model.

```{r 05-analyzing-ba-increment-20}
ba_inc_lmm <- lmer(fl1[[18]], data = test_d, REML = TRUE)

summary(ba_inc_lmm)

residual_plot <- function(mod, val = 1) {
  resid <- resid(mod, type = "deviance")
  plot(
    predict(mod), resid,
    main = "residual vs fitted",
    sub = "labels are diameter increment (cm/yr)",
    xlab = "log(ba_inc) prediction",
    ylab = "deviance residaul"
  )
  outliers <- which(abs(resid) > val)
  label <- round(test_d$d_inc1[outliers], 3)
  text(predict(mod)[outliers], resid[outliers] + 0.1, label, col = "firebrick")
}

residual_plot(ba_inc_lmm)
```

see:
[Cross validated post about zero variance](https://stats.stackexchange.com/questions/115090/why-do-i-get-zero-variance-of-a-random-effect-in-my-mixed-model-despite-some-va)

And the singular models section [here](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1)

Here I can include the random effect of plot if I have to with plot random effect.

```{r 05-analyzing-ba-increment-21, eval=FALSE}

fl2 <- lapply(fl1, update_no_simplify, ~ . + (1 | plot))
get_aic(fl2, data = test_d, method = "lmm")

```

## What does all model selection have to say?

I could use dredge from MuMIn to test all possible models with 3 to 10 predictors models

```{r 05-analyzing-ba-increment-22, eval=FALSE}

options(na.action = "na.fail")

fm1 <- lmer(
  log(ba_inc_scaled) ~ log(dbh) * treatment + cr * year * spp + (1 | tree_id),
  data = test_d
)

dd <- dredge(fm1, m.lim = c(3, 10))

dd
```

# Estimated marginal means and regression table of selected model

First I choose a model from above based on parsimony and AICc:

**Using model 18 above**

```{r 05-analyzing-ba-increment-23}

ba_inc_mod <- ba_inc_lmm

summary(ba_inc_mod)

```

```{r 05-analyzing-ba-increment-24}

sjPlot::tab_model(ba_inc_mod,
  dv.labels = "BA increment",
  show.ci = FALSE,
  show.se = TRUE,
  show.icc = FALSE
)

```

Here I get the marginal estimates and pairwise comparisons for the selected model. First I do a pairwise comparison on treatments.

```{r 05-analyzing-ba-increment-25}

ba_inc_ref <- ref_grid(ba_inc_mod)

emmeans(ba_inc_ref, pairwise ~ treatment, type = "response")

```

Then I compare our two species of interest

```{r 05-analyzing-ba-increment-26}

emmeans(ba_inc_ref, pairwise ~ spp, type = "response")

```

This visualization shows that most pairwise comparisons of species and treatment combinations are significantly different.

```{r 05-analyzing-ba-increment-27}

emmeans(ba_inc_ref, ~ treatment + spp) %>% pwpp(type = "response")

emmeans(ba_inc_ref, pairwise ~ treatment + spp, type = "response")

```

Here I plot the expected response by species and dbh, averaged over the effect of year and add an inset for the interaction effect of crown ratio and year. For this I define two more reference grids, one with a range of dbh to include and one with a range of cr to include

```{r bainc-fig, cache=TRUE, fig.width=7.16, fig.height=5.91}

ba_inc_ref_dbh <- ref_grid(
  ba_inc_mod,
  at = list(dbh = seq(10, 60, 2))
)

ba_inc_ref_cr <- ref_grid(
  ba_inc_mod,
  at = list(cr = seq(.05, .65, .1))
)

a <- emmip(
  ba_inc_ref_dbh,
  spp ~ dbh | treatment,
  CIs = TRUE,
  plotit = FALSE,
  type = "response"
) %>%
  relevel_treatment() %>%
  mutate(
    spp = recode(spp, SESE3 = "Redwood", PSMEM = "Douglas-fir")
  ) %>%
  ggplot(aes(dbh, yvar, color = spp)) +
    geom_line() +
    facet_wrap(~ treatment ) +
    geom_ribbon(aes(ymin = LCL, ymax = UCL, color = NULL, fill = spp), alpha = .2) +
    theme_bw() +
    theme(
      legend.position = c(0.08, 0.9),
      legend.background = element_blank(),
      legend.title = element_blank()
    ) +
    scale_color_manual(
      values = c("#969696", "black"),
      aesthetics = c("color", "fill")
    ) +
    labs(
      x = "DBH (cm)",
      y = expression(BAI ~ (cm^2 ~ year^-1)) 
    # title = "back transformed from log-log model, averaged over year"
    )

b <- emmip(
  ba_inc_ref_cr,
  year ~ cr,
  CIs = TRUE,
  plotit = FALSE,
  type = "response"
) %>%
  mutate(year = fct_recode(year, "2008 - 2013" = "08", "2013 - 2018" = "13")) %>%
  ggplot(aes(cr, yvar, color = year)) +
    geom_line() +
    # facet_wrap(~ treatment ) +
    geom_ribbon(aes(ymin = LCL, ymax = UCL, color = NULL, fill = year), alpha = .2) +
    theme_bw() +
    theme(
      legend.position = c(0.35, 0.75),
      legend.background = element_blank(),
      legend.title = element_blank(),
      axis.title.y = element_text(margin = margin(b = 0), vjust = 0.01),
      plot.margin = unit(c(2, 2, 2, 1), "mm")
    ) +
    scale_color_manual(
      values = c("#969696", "black"),
      aesthetics = c("color", "fill")
    ) +
    scale_y_continuous(limits = c(0, 150)) +
    labs(
      x = "Crown ratio",
      y = expression(BAI ~ (cm^2 ~ year^-1))
      # title = "Effect of interaction of year and crown-ratio on BA increment"
    )

    
layout <- c(
  patchwork::area(1, 1, 200, 300),
  patchwork::area(108, 219, 200, 300)
)

a + b + plot_layout(design = layout)

ggsave(
  filename = "figs/bai_prediction.pdf",
  device = cairo_pdf,
  width = 18.2,
  height = 13,
  units = "cm"
)

ggsave(
  filename = "figs/bai_prediction.jpg",
  width = 18.2,
  height = 13,
  units = "cm"
)

```

Now we show the estimated mean BAI response of a tree of average size, with average CR and for an average year, and identify which species/treatment combinations vary significantly from each other.

```{r 05-analyzing-ba-increment-28}

dodge <- position_dodge(width = 0.9)

emmeans(ba_inc_ref, ~ treatment + spp, type = "response") %>% 
  multcomp::cld(Letters = letters, decreasing = TRUE) %>%
  mutate(.group = gsub(" +", "", .group)) %>%
  relevel_treatment() %>%
  ggplot(aes(treatment, response, fill = spp)) +
    geom_errorbar(
      aes(ymin = lower.CL, ymax = upper.CL),
      position = dodge,
      width = 0.2
    ) +
    geom_col(position = dodge) +
    theme_bw() +
    theme(
      legend.position = c(0.8, 0.8),
      legend.background = element_blank()
    ) +
    scale_fill_manual(
      values = c("#969696", "black"),
      labels = c("Douglas-fir", "Redwood")
    ) + 
    geom_text(
      aes(y = upper.CL, label = .group),
      position = dodge,
      vjust = -0.5
    ) +
    scale_y_continuous(expand = c(0.0, 0, 0.08, 0)) +
    labs(y = expression(BAI ~ (cm^2 ~ year^-1)), x = "Treatment", fill = NULL)

ggsave(
  filename = "figs/bai_mean_compare.pdf",
  device = cairo_pdf,
  width = 8.84,
  height = 9,
  units = "cm"
)

ggsave(
  filename = "figs/bai_mean_compare.jpg",
  width = 8.84,
  height = 9,
  units = "cm"
)

```

# Model validation


```{r 05-analyzing-ba-increment-29}
# augment data with fitted, residual cooks distance and leverage

augment1 <- function(dat, mod) {
  dat %>% mutate(
    fitted = fitted(mod),
    resid = resid(mod, type = "pearson"),
    cooks = cooks.distance(mod),
    lev = hatvalues(mod)
  )
}

ba_inc_d <- augment1(test_d, ba_inc_mod)

```

## Residual vs fitted

I also plot residual vs the predictor and explanatory variables

```{r 05-analyzing-ba-increment-30}
plot(
  resid ~ fitted,
  data = ba_inc_d,
  pch = 16,
  xlab = "fitted values",
  ylab = "Scaled residuals",
  main = "Residual vs fitted, colored by treatment",
  col = as.factor(treatment)
)
abline(0,0)

plot(
  resid ~ log(dbh),
  data = ba_inc_d,
  xlab = "log(dbh)",
  ylab = "Scaled residuals",
  main = "Residual vs log(dbh)",
  col = 2,
  pch = 16
)
abline(0,0)

plot(
  resid ~ cr,
  data = ba_inc_d,
  xlab = "crown ratio",
  ylab = "Scaled residuals",
  main = "Residual vs crown ratio, colored by species",
  col = as.factor(spp),
  pch = 16
)
abline(0,0)

```

## Homogeneity of random group residuals

I can check that random group residuals are homogenous.

I can't really check if residuals are homogenous for each tree, because there are over a thousand of them.

```{r 05-analyzing-ba-increment-31}

plot(
  ba_inc_mod,
  resid(., scaled=TRUE) ~ fitted(.)| plot,
  abline = 0,
  pch = 16,
  xlab = "Fitted values",
  ylab = "Standardised residuals"
)

```


## Normality of residuals

It seems we are violating the assumption of normality of residuals here.

```{r 05-analyzing-ba-increment-32, fig.height=7, fig.width=10}
qqnorm(resid(ba_inc_mod, type = "pearson"), pch=16, col = 1,  main = "QQplot for pearosn residuals")
qqline(resid(ba_inc_mod, type = "pearson"))
```

## Leverage and Cooks distance

Cooks outliers, defined as > 3 x mean(cooks distance) are colored in red. None of these "outliers" seem like they would disproportionately affect regression. 

```{r 05-analyzing-ba-increment-33, fig.width=10, fig.height=8}

# Plot leverage against standardised residuals

plot(
  resid ~ lev,
  data = ba_inc_d,
  las = 1,
  ylab = "Standardised residuals",
  xlab = "Leverage",
  col = palette.colors(palette = "tableau10", alpha = .5)[factor(year)],
  main = "Leverage vs residuals",
  pch = 16
)

points(resid ~ lev, data = filter(ba_inc_d, cooks > 6 * mean(cooks)), pch = 16, col =3)

plot(ba_inc_d$lev,pch=16,col="red",ylim=c(0,2.75),las=1,ylab="Leverage/Cook's distance value")
points(ba_inc_d$cooks,pch=17,col="blue")
legend(100, 2.5, c("Leverage", "Cook's distance"), col = c("red", "blue"), pch = c(16, 17))
```

## Random effects distribution

The distribution of random effects (plots) should be roughly normal. Here our distribution is skewed, [from what I've read](https://stats.stackexchange.com/a/366500), this should not be a problem for estimates or their std's.

```{r 05-analyzing-ba-increment-34}
hist(as.vector(unlist(ranef(ba_inc_mod)$tree_id)), col = 1)
```

# Is a glm going to work?

While GLMs with gamma family are used in the literature for basal area increment models. I had a hard time getting them to converge. They are computationally expensive (slow) to fit and when I look at the residuals, they are not all the different from the log-log model fit above.

In looking at the results for the GLM model, it looks very similar to the linear log log model, the qq plot reveals similarly light tailed distribution.

```{r 05-analyzing-ba-increment-35, eval=FALSE}

glm_mod <- glmer(
  I(ba_inc1 + scale_val) ~ treatment + scale(dbh) + scale(cr) + spp + year + (1 | tree_id),
  data = test_d,
  family = Gamma(link = "log")
)

allFit(glm_mod)

glm_converge <- mixedup::converge_it(glm_mod)

summary(glm_mod)

emmeans(glm_converge, ~ dbh + spp + treatment, at = list(dbh = seq(10, 60, 5)), type = "response") |>
  summary() |>
  as.data.frame() |>
  ggplot(aes(dbh, response, color = spp)) +
    geom_line() +
    facet_wrap(~treatment) +
    geom_point(data = test_d, aes(dbh, ba_inc1))


plot(predict(glm_mod), resid(glm_mod, type = "deviance"))

```