---
title: Bear and other damage
bibliography: ["./reference/a972.bib"]
---

```{r setup, include=FALSE}
knitr::knit_hooks$set(time_it = time_it)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, time_it = TRUE, cache.extra = "wombat")
```

```{r 09-damage-1}
print(knitr::current_input())
```

```{r 09-damage-2}
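library(tidyverse)
library(lme4)       # glmer(), allFit()
library(emmeans)    # emmeans(), emmip(), pwpp()
library(patchwork)  # combining ggplots with `+`
library(performance)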
```

# Bear damage

## data cleaning

I previously looked only at live trees. This underestimates damage because observations are omitted from counts of cumulative damage. I'm now including all tree records in the analysis, so estimates represent the total number of post-treatment trees damaged by bears. Some of these trees may no longer be in the stand, but they were still damaged by bears at one point.

```{r 09-damage-3}
d_l$year <- factor(d_l$year, ordered = FALSE)
bd <- d_l
```

There are trees with "healed over" in the notes; most of these are in 2008. It makes sense if these trees are subsequently listed as not bear damaged.

I'm assuming that any tree that goes from bear damaged in 2008 to not bear damaged in 2013 is in fact healed and that any damage in 2018 is new damage.

```{r 09-damage-4}

# # Use this to look at any "healed" trees
# bd %>% 
#   group_by(tree_id) %>%
#   filter(any(str_detect(tolower(notes), "healed"))) %>%
#   color_groups()

```

In looking at the notes, trees that have "old bear damage" recorded are treated inconsistently: some are recorded with bear damage, some without. I'll assume that trees recorded as not bear damaged are either undamaged or completely healed, and that subsequent damage implies a new incidence of bear damage.

```{r 09-damage-5}

# # Use this to look at any "old bd" trees
# bd %>% 
#   group_by(tree_id) %>%
#   filter(any(str_detect(tolower(notes), "old"))) %>%
#   color_groups()

```

Are there trees that are recorded as bear damaged in one period and then not in the next period? Put another way, are bear damaged trees dropped from the list for one reason or another? 

There are 77 trees that are dropped (2013 and 2018); of these, 14 are subsequently listed as (re-)damaged (all in 2018). As stated above, I will consider these valid occurrences of new damage. Sixteen of the "dropped" trees were dropped because they died (bear damage is no longer recorded for them).

```{r 09-damage-6}

bear_dropped <- bd %>%
  group_by(tree_id) %>%
  mutate(bear_dropped = lag(bear) & !bear) %>%
  filter(any(bear_dropped)) %>%
  mutate(id = cur_group_id()) %>%
  relocate(id) %>%
  arrange(id)

# # all dropped bear damage trees occur in 2013
# filter(bear_dropped, bear_dropped) %>% pull(year) %>% unique()

# color_groups(bear_dropped)

# trees that are re-attacked
bear_dropped %>%
  filter(any(bear & !lag(bear))) %>%
  mutate(id = cur_group_id(), .before = 1) %>%
  color_groups()

```

I need to create another variable which indicates whether the damage is new for that period, i.e. when a tree goes from undamaged to damaged. I will also count trees as new bear damage when the damage increases from one period to the next, i.e. when the condition code increases from 17 or 18 to 19 or 20.

I'll also add a variable indicating whether a tree was damaged in 2013.

```{r 09-damage-7}

# this was used for ensuring all bear damage stuck with a tree throughout its life
# I've since decided to allow trees to "completely heal," as the data seems to suggest this
cum_logic <- function(x) {
  if (any(x)) {
    idx <- min(which(x))
    x[idx:length(x)] <- TRUE
  }
  return(x)
}

make_bd <- function(data) {
  data |>
    group_by(tree_id) %>%
    mutate(
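      bear_mag = as.numeric(get_cond(17, 18, 19, str = TRUE)),
      # new damage: any damage at "init"/"08", a switch from undamaged to
      # damaged, or an increase in damage magnitude since the previous period
      bear_new = (bear & year %in% c("init", "08")) |
        (bear & !lag(bear, order_by = year)) |
        (bear_mag > lag(bear_mag, order_by = year)),
      bear_new = if_else(is.na(bear_new), FALSE, bear_new),
      # cumulative post-treatment damage: 2018 also counts new damage from 2013
      bear_cum = if_else(year == "18" & lag(bear_new, order_by = year), TRUE, bear_new),
      spp2 = spp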
    ) %>%
    select(-c(bear_mag, h_dist, azi, x, y)) %>%
    ungroup()
}

```
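
To check my reading of the `bear_new` and `bear_cum` rules, here is a minimal sketch on toy data. The two trees and the `bear_mag` column are made up for illustration; in the real data `bear_mag` comes from `get_cond()`, which I do not reproduce here. Tree "a" is damaged in 2008, heals by 2013, and is damaged again in 2018; tree "b" is first damaged in 2013 and unchanged in 2018.

```r
library(dplyr)

# Toy records for two hypothetical trees; `bear_mag` is a stand-in for the
# damage magnitude that the real code derives from condition codes.
toy <- tibble(
  tree_id  = rep(c("a", "b"), each = 4),
  year     = factor(rep(c("init", "08", "13", "18"), 2),
                    levels = c("init", "08", "13", "18")),
  bear     = c(FALSE, TRUE, FALSE, TRUE,   # a: damaged, healed, re-damaged
               FALSE, FALSE, TRUE, TRUE),  # b: damaged in 2013, unchanged in 2018
  bear_mag = c(0, 1, 0, 1,
               0, 0, 1, 1)
)

toy_new <- toy |>
  group_by(tree_id) |>
  mutate(
    # same three conditions as in make_bd()
    bear_new = (bear & year %in% c("init", "08")) |
      (bear & !lag(bear, order_by = year)) |
      (bear_mag > lag(bear_mag, order_by = year)),
    bear_new = coalesce(bear_new, FALSE),
    # 2018 carries forward new damage recorded in 2013
    bear_cum = if_else(year == "18" & lag(bear_new, order_by = year), TRUE, bear_new)
  ) |>
  ungroup()

toy_new
```

In this sketch the 2018 damage on tree "a" counts as new, while tree "b" has no new damage in 2018 but is still counted in the cumulative 2018 total.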

I want to include all trees present after treatment. 38 trees are "lost" due to being dead and downed so I need to re-add them manually as dead trees. I show the difference between this and the incomplete treelist further on.

I'll make two datasets, one "complete" with all tag records for all periods, and another with the observed records only, so I can compare the analysis of each.

```{r 09-damage-8}

dam_all <- d_l |> 
  filter(year != "init") |> 
  droplevels() |>
  complete(nesting(tree_id, plot, treatment, spp), year) |>
  anti_join(filter(d_l, year != "init")) |>
  mutate(live = FALSE) |>
  bind_rows(d_l) |>
  mutate(year = fct_relevel(year, "init"))


# this is the complete dataset
bd_all <-  make_bd(dam_all)

# This is the naive dataset w/o dead downed trees (not recorded)
bd1 <- make_bd(d_l)

```
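
The `complete()` plus `anti_join()` step above can be hard to read, so here is a minimal sketch of the same idea on toy data. The table `obs` and its values are hypothetical, and I join explicitly on `tree_id` and `year` here rather than on all shared columns. Tree "b" has no 2018 record, so it is re-added as a dead tree.

```r
library(dplyr)
library(tidyr)

# Hypothetical observed records: tree "b" is missing in 2018
obs <- tibble(
  tree_id = c("a", "a", "b"),
  plot    = c("p1", "p1", "p1"),
  year    = factor(c("13", "18", "13"), levels = c("13", "18")),
  live    = c(TRUE, TRUE, TRUE)
)

# All tree-by-year combinations, minus the ones that were actually observed
missing_rows <- obs |>
  complete(nesting(tree_id, plot), year) |>
  anti_join(obs, by = c("tree_id", "year")) |>
  mutate(live = FALSE)

# Observed records plus the re-added "lost" records
full <- bind_rows(obs, missing_rows) |>
  arrange(tree_id, year)

full
```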

There were about 30 spruce trees before thinning and about half of these had bear damage. In 2013, there were 24 spruce and 4 of them received damage.

Trees killed just by bears: bear-damaged trees that were alive in 2008 and subsequently died without being broken or uprooted.

```{r 09-damage-9}

bd |>
  group_by(tree_id) |>
  filter( any((bear | lag(bear)) & !live & lag(live) & !get_cond(30, 31, 32)) ) |>
  pull(tree_id) |>
  n_distinct()

```

function for plotting data at the treatment level

```{r 09-damage-10}

bear_plot <- function(data, var, type) {
  # choose the plot-level summary: counts ("cnt") or proportions ("pct")
  .fun <- switch(type, cnt = sum, pct = mean, stop("`type` must be \"cnt\" or \"pct\""))
  my_dodge <- position_dodge(width = 0.5)
  data |>
    group_by(year, treatment, plot) %>% 
    summarize(plot_sum = .fun( {{var}} )) %>% 
    summarize(avg = mean(plot_sum), se = sd(plot_sum) / sqrt(n()) ) %>% 
    ggplot(aes(year, avg, color = treatment, group = treatment)) + 
      geom_line(position = my_dodge) +
      geom_point(position = my_dodge) +
      geom_errorbar(aes(ymin = avg - se, ymax = avg + se), width = 0.2, position = my_dodge)
  }

  
```

function for plotting data at the plot level

```{r 09-damage-11}

bd_plot_fig <- function(data, var = bear_new, prop = FALSE) {
  my_sum <- function(d) summarise(d, var = sum( {{var}} ))
  lab_word <- "Count"
  if (prop) {
    my_sum <- function(d) summarise(d, var = sum( {{var}} ) / n())
    lab_word <- "Proportion"
  }
  data %>%
    group_by(year, treatment, plot) %>% 
    my_sum %>% 
    ggplot(aes(year, var, color = treatment, group = plot)) + 
      geom_line(position = position_dodge(width = 0.4), size = 1, alpha = 0.6) +
      facet_wrap(~ treatment) +
      theme(legend.position = "none") +
      geom_point(position = position_dodge(width = 0.4)) +
      scale_x_discrete(expand = expansion(mult = 0.2)) +
      labs(
        title = paste(lab_word, "new bear damage for each treatment and plot"),
        y = lab_word
      )
}

```

Bar-graph functions for plotting species distribution at the treatment level. `dam_bar0` aggregates at the treatment level, whereas `dam_bar1` averages across plots (probably how it should be done).

```{r 09-damage-12}

dam_plot <- function(d, var, type) {
  if (type == "pct") {
    my_sum <- function(d) {
      d |>
      summarize(var_true = sum( {{var}} ), n = n()) |>
      # species component damage of all trees in plot
      mutate(var_true = var_true / sum(n))
    }
  } else if (type == "cnt") {
    my_sum <- function(d) {
      d |> summarise(var_true = sum( {{var}} ))
    }
  }
  d |>
  group_by(treatment, year, plot, spp2) |>
  my_sum() |>
  ungroup()
}

# This function uses "spp2" instead of "spp"; this was originally to allow grouping of species


# this function computes total bar height (average) and SE
treat_bar <- function(...) {
  dam_plot(...) |>
    group_by(treatment, year, plot) |>
    summarize(var1 = sum(var_true)) |>
    summarise(avg = mean(var1), SE = sd(var1) / sqrt(n())) |>
    relevel_treatment()
} 

# This function computes the species component of bar height, the
# sum of which is the same as above.
spp_bar <- function(...) {
  dam_plot(...) |>
  group_by(treatment, year, spp2) |>
  # hard-coding "4" was necessary because not every species is present
  # in every plot, so zeros for some species in some plots need to be implied.
  summarise(avg_spp = sum(var_true) / 4) |>
  filter(avg_spp > 0) |>
  relevel_treatment()
}

dam_bar1 <- function(d, var, type, title, error = FALSE, nospruce = FALSE) {
  
  spp_d <- spp_bar(d = d, var = {{var}}, type = type)
  if(nospruce) spp_d <- filter(spp_d, spp2 != "PISI")
  err_d <- treat_bar(d, var = {{var}}, type = type)

  ggplot(spp_d, aes(x = treatment)) +
    geom_col(aes(y = avg_spp, fill = spp2), position = "stack", color = "black") +
    {if (error) geom_errorbar(data = err_d, aes(y = avg, ymin = avg - SE, ymax = avg + SE), width = 0.2)} +
    facet_wrap(~ year, nrow = 1) +
    scale_fill_manual(values = palette()) +
    labs(title = title, y = type, x = "Treatment", fill = NULL)

}

# Naive way (aggregating at treatment level) ###################################
dam_bar0 <- function(data, var, type = "cnt", title = "") {
  if (type == "cnt") {
    my_sum <- function(d) {
    summarise(d, var = sum( {{var}} ))
    } 
  } else if (type == "pct") {
    my_sum <- function(d) {
      summarise(d, var = sum( {{var}} ), ntree = n()) |>
      mutate(var = var / sum(ntree))
    } 
  }

  data |>  
    group_by(treatment, year, spp2) |>
    my_sum() |>
    filter(var > 0) |>
    relevel_treatment() |>
    ggplot(aes(treatment, var, fill = spp2)) +
      geom_col(position = "stack", color = "black") +
      facet_wrap(~ year, nrow = 1) +
      scale_fill_manual(values = palette()) +
      labs(title = title, y = type, x = "Treatment", fill = NULL)
}
```
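
The hard-coded `4` in `spp_bar()` deserves a small illustration. This is a sketch with made-up plot-level values, not the real data: when a species is absent from some plots, averaging only over the rows that exist overstates its contribution, whereas dividing by the number of plots (or filling in explicit zeros) does not.

```r
library(dplyr)
library(tidyr)

# Hypothetical plot-level damage by species; PSMEM is absent from p3 and p4
plot_spp <- tibble(
  plot     = c("p1", "p2", "p3", "p4", "p1", "p2"),
  spp2     = c("SESE3", "SESE3", "SESE3", "SESE3", "PSMEM", "PSMEM"),
  var_true = c(1, 3, 2, 2, 2, 4)
)

n_plots <- n_distinct(plot_spp$plot)

# Averaging over existing rows only ignores the implied zeros
present_only <- plot_spp |>
  group_by(spp2) |>
  summarise(avg_spp = mean(var_true))

# Dividing by the number of plots, as spp_bar() does with its constant
fixed_denominator <- plot_spp |>
  group_by(spp2) |>
  summarise(avg_spp = sum(var_true) / n_plots)

# Equivalent alternative: make the zeros explicit, then take the mean
explicit_zeros <- plot_spp |>
  complete(plot, spp2, fill = list(var_true = 0)) |>
  group_by(spp2) |>
  summarise(avg_spp = mean(var_true))
```

Here `fixed_denominator` and `explicit_zeros` agree, while `present_only` doubles the PSMEM average.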

# The problem with the bear bars

Starting with the observed-trees-only data (no dead, downed tags): first, there is a subtle difference between aggregating at the treatment level (a simple sum of bear damage across plots) and averaging plot-level sums.

```{r 09-damage-13, animation.hook="gifski", interval=1.7}

dam_bar0(bd1, bear_new, type = "pct",
  title = "Aggregated percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .21))

dam_bar1(bd1, bear_new, type = "pct",
  title = "Average percent new bear damage pre and post treatment")+
  scale_y_continuous(limits = c(0, .21))
```
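
The difference between the two summaries is easiest to see with unequal plot sizes. This is a toy example with invented numbers for a single hypothetical treatment: one small plot with heavy damage and one large plot with light damage.

```r
# Hypothetical treatment with two plots of unequal size
toy <- data.frame(
  plot     = rep(c("p1", "p2"), times = c(10, 40)),
  bear_new = c(rep(c(TRUE, FALSE), times = c(5, 5)),    # p1: 5 of 10 damaged
               rep(c(TRUE, FALSE), times = c(4, 36)))   # p2: 4 of 40 damaged
)

# Aggregated at the treatment level: every tree has equal weight
aggregated <- mean(toy$bear_new)

# Averaged across plots: every plot has equal weight
plot_props <- tapply(toy$bear_new, toy$plot, mean)
plot_avg   <- mean(plot_props)

c(aggregated = aggregated, plot_avg = plot_avg)
```

The aggregated proportion is 9/50 = 0.18, while the plot average is (0.5 + 0.1) / 2 = 0.3. With equal plot sizes the two would coincide.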

```{r 09-damage-14, animation.hook="gifski", interval=1.7}

dam_bar0(bd1, bear_cum, type = "pct",
  title = "Aggregated cumulative percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .21))

dam_bar1(bd1, bear_cum, type = "pct",
  title = "Average cumulative percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .21))
```

We can put error bars on top of the bars to indicate the variation in total damage across plots, but in this case we would need to include spruce (or omit it from the dataset entirely); otherwise the error bar would be centered in empty space.

```{r 09-damage-15, animation.hook="gifski"}

dam_bar1(bd1, bear_cum, type = "pct", error = TRUE,
  title = "Average cumulative percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .24))

dam_bar1(bd1, bear_cum, type = "pct", error = TRUE, nospruce = TRUE,
  title = "Average cumulative percent new bear damage pre and post treatment") +
  scale_fill_manual(values = palette()[2:3]) +
  scale_y_continuous(limits = c(0, .24))

```

There is a difference between whether we consider only live trees at any given time, or all trees, live or dead that were hit by bears. I think it is best to consider all post treatment trees, whether they died at some point or not. This is different from my earlier thinking. Perhaps the last panel of the publication bear chart will show existing conditions with live trees only...

```{r 09-damage-16, animation.hook="gifski"}

bd_live <- filter(bd1, live)

dam_bar1(bd_live, bear_cum, type = "pct",
  title = "Live only avg cumulative percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .22))

dam_bar1(bd1, bear_cum, type = "pct",
  title = "all trees avg cumulative percent new bear damage pre and post treatment") +
  scale_y_continuous(limits = c(0, .22))

```

Finally, if we are trying to characterize the outcome considering all post-treatment trees, then I have to deal with tags that are dropped from the observation list because the trees are dead and down. There are 38 of these trees, some of them bear damaged. I created observations for these "lost" trees to make a complete picture of *all post-treatment trees*.

```{r 09-damage-17, animation.hook="gifski"}

dam_bar1(bd1, bear_cum, type = "pct",
  title = "Incomplete cumulative percent new bear damage all trees") +
  scale_y_continuous(limits = c(0, .21))

dam_bar1(bd_all, bear_cum, type = "pct",
  title = "Complete cumulative percent new bear damage all trees") +
  scale_y_continuous(limits = c(0, .21))

```

Another way of removing spruce would be to omit it from the dataset preemptively. Then, instead of the proportion of damage to all trees, it would be the proportion of damage to trees other than spruce. This doesn't really make sense, because we are also not really interested in alder and hemlock, so if we remove spruce, we should probably also remove these species. Our figure then becomes the proportion of post-treatment RW and DF damaged, which I feel is consistent with other sections of the analysis. Here is what that looks like.

```{r gif-rw-df-only, animation.hook="gifski"}

dam_bar1(bd_all, bear_cum, type = "pct",
  title = "All trees damage") +
  scale_y_continuous(limits = c(0, .21)) +
  scale_fill_manual(values = palette()[c(3, 1, 2)])

dam_bar1(filter(bd_all, spp2 %in% c("PSMEM", "SESE3")), bear_cum, type = "pct",
  title = "DF and RW only damage") +
  scale_y_continuous(limits = c(0, .21))

```

## Damage bar conclusion

I will include all RW and DF trees that existed post treatment. Cumulative counts and proportions will represent both standing and downed trees of these two species. Spruce, hemlock, and alder will be excluded entirely.

```{r}

bd <- filter(bd_all, spp2 %in% c("PSMEM", "SESE3"))

```

## Summary

Here is percent new bear damage over time. The H40 and L40 treatments seem to have the largest increases.

In the H40 treatment there is a decline in percent new bear damage in two plots, whereas in the L40 treatment only one plot declines, and it is anomalous in that it is the only plot that sees an increase from 2008 to 2013.

Error bars represent 1 SE of the mean.

```{r 09-damage-18}
bear_plot(bd, bear_new, "pct") +
  labs(title = "Average percent new bear damage for each treatment ±SE")

bear_plot(bd, bear_new, "cnt") +
  labs(title = "Average count new bear damage for each treatment ±SE")
```

Here is the same data for individual plots, showing their trajectories

```{r 09-damage-19}
bd_plot_fig(bd, bear_new)

bd_plot_fig(bd, bear_new, prop = TRUE)
```

## Bear bar figure for publication

```{r 09-damage-20, fig.width=7.16, fig.height=3.54}

year_labels <- c(
  init = "Pretreatment damage\nbefore thin",
  `08` = "Pretreatment damage\nafter thin",
  `13` = "Posttreatment damage\n2008 - 2013",
  `18` = "Posttreatment damage\n2008 - 2018",
  final = "Observed damage\n2018 (live only)"
)

type_labels <- as_labeller(c(
  bear_cnt = "Stems ~ ha^-1",
  bear_prop = "Proportion"
), label_parsed)


# bd_2 <- group_by(bd, treatment, year, spp2) |> filter(spp %in% c("SESE3", "PSMEM"))

bd_18 <- filter(bd, year == "18", live) |> mutate(year = "final")

bd_18 <- bind_rows(
  bear_cnt = spp_bar(bd_18, bear, "cnt"),
  bear_prop = spp_bar(bd_18, bear, "pct"),
  .id = "sum_type"
)

bd_p <- bind_rows(
  bear_cnt = spp_bar(bd, bear_cum, "cnt"),
  bear_prop = spp_bar(bd, bear_cum, "pct"),
  .id = "sum_type"
)

bd_pub <- bind_rows(bd_p, bd_18) |> 
  mutate(year = factor(year, levels = c("init", "08", "13", "18", "final")))

bd_pub %>%
  ggplot(aes(treatment, avg_spp, fill = spp2)) +
    geom_col(position = "stack") +
    facet_grid(
      sum_type ~ year,
      scales = "free_y",
      switch = "y",
      labeller = labeller(sum_type = type_labels, year = year_labels)
    ) +
    theme_bw() +
    theme(
      plot.margin = unit(c(1, 1.2, 1, 0), "mm"),
      panel.spacing = unit(1.5, "mm"),
      axis.title.y = element_blank(),
      strip.placement = "outside",
      strip.background.y = element_blank(),
      strip.switch.pad.grid = unit(0, "mm"),
      legend.position = c(.91, .9),
      legend.title = element_blank(),
      legend.background = element_blank(),
      axis.text.x = element_text(vjust = 0, angle = -45),
      axis.title = element_text(size = 11),
      strip.text.y = element_text(size = 11)
    ) +
    scale_fill_manual(
      values = c("#969696", "black"),
      labels = c("Douglas-fir", "Redwood")
    ) + 
    scale_y_continuous(expand = c(0.0, 0, 0.08, 0)) +
    # scale_x_discrete(expand = expansion(mult = 0.05)) +
    labs(x = "Treatment")

ggsave(
  filename = "figs/bd_summary.pdf",
  device = cairo_pdf,
  width = 18.2,
  height = 9,
  units = "cm"
)

ggsave(
  filename = "figs/bd_summary.jpg",
  width = 18.2,
  height = 9,
  units = "cm"
)

```

## Modeling

There are several potential approaches to modeling these data.

1. Probability of bear damage could be modeled as binary data with a generalized linear model: binomial regression with a logit link (logistic regression). This, I think, would be answering: for a random (average?) tree from a given treatment, what is the probability that it would be bear damaged? I'm not sure we have sufficient observations of damaged trees to characterize the distribution. In the case of prediction, we may need to adjust for the [imbalance of response](https://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression).

2. Another approach is modeling *percent bear damage* at the plot level. Here linear regression may work, but theoretically, our response is bound by (0, 1). One recommendation here is [Beta regression](https://hansjoerg.me/2019/05/10/regression-modeling-with-proportion-data-part-1/).

3. Also at the plot level, we could model counts using Poisson or negative binomial GLMM. This would answer the question: how many trees can we expect to be bear damaged given a treatment. Here it would probably be important to account for differences between treatments like diameter increment and tree size.
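
To make the three options concrete, here is a rough sketch of how the response would be set up in each case. It uses simulated data and base R only, with fixed effects and no random effects, so it is not the model used below. The data frame `trees`, the two-level treatment, and the damage rates are all invented. For option 2 I fit a quasi-binomial GLM on plot-level counts as a stand-in, since Beta regression needs an additional package.

```r
set.seed(1)
n_per_plot <- 50

# Simulated tree-level data: 8 hypothetical plots, 4 per treatment
trees <- data.frame(
  plot      = rep(paste0("p", 1:8), each = n_per_plot),
  treatment = rep(c("C", "T"), each = 4 * n_per_plot)
)
p_true <- ifelse(trees$treatment == "T", 0.15, 0.05)
trees$bear_new <- rbinom(nrow(trees), size = 1, prob = p_true)

# 1. Tree-level binary response: logistic regression
m_tree <- glm(bear_new ~ treatment, family = binomial, data = trees)

# Plot-level summaries for options 2 and 3
plots <- aggregate(bear_new ~ plot + treatment, data = trees, FUN = sum)
names(plots)[names(plots) == "bear_new"] <- "n_bear"
plots$n_tree <- n_per_plot
plots$prop   <- plots$n_bear / plots$n_tree

# 2. Plot-level proportion (quasi-binomial stand-in for Beta regression)
m_prop <- glm(cbind(n_bear, n_tree - n_bear) ~ treatment,
              family = quasibinomial, data = plots)

# 3. Plot-level count with the number of trees as an offset
m_count <- glm(n_bear ~ treatment + offset(log(n_tree)),
               family = poisson, data = plots)
```

Because `treatment` is constant within a plot, options 1 and 2 give the same point estimates in this sketch; they differ in how the uncertainty is estimated.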

## Logistic regression

We are modeling occurrence of new bear damage in 2013 and 2018. New bear damage in 2008 is not really comparable as it represents accumulated damage over an unspecified amount of time prior to treatment.

My first model is additive and includes `treatment`, `year`, `d_inc2`, `spp2`, and random intercepts for `plot` *and* `tree_id`.

The random effects for `tree_id` are very small, so these might end up getting dropped.

*I'm removing dead trees for modeling because we want to better estimate the probability of a live tree being damaged.*

```{r bd-mod1, cache=TRUE, warning=FALSE}

bdmd <- bd |>
  group_by(treatment, plot) |>
  mutate(pbear_init = sum(year == "init" & bear)) |> 
  ungroup() |>
  filter(year %in% c("13", "18"), live, spp %in% c("PSMEM", "SESE3")) |>
  droplevels()


bm1 <- glmer(
  bear_new ~ treatment + year + d_inc2 + spp2 + (1 | plot) + (1 | tree_id),
  family = binomial,
  data = bdmd
  )

summary(bm1)

```

Check the `allFit` output to see if this model can be considered reliable. The estimated coefficients are very congruent across optimizers, so I think we can trust this model.

```{r bd-mod2, cache=TRUE}

bm1.all <- allFit(bm1)

summary(bm1.all)$fixef

```

This model seems to be working. Now I'll do more model selection, testing for interactions. All of these models will include `plot` as the only random effect. I compare performance metrics using the R package `performance` [@ludeckePerformancePackageAssessment2021].


```{r 09-damage-21}

fl <- list(
    bear_new ~ treatment + year + spp2 + d_inc2
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2)
  , bear_new ~ treatment + year + spp2 + d_inc2 + year:spp2
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + year:spp2
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + year:spp2 + treatment:year
  , bear_new ~ treatment + year + spp2 + d_inc2 + treatment:spp2
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + treatment:spp2
  , bear_new ~ treatment + year + spp2 + d_inc2 + treatment:year
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + treatment:year
  , bear_new ~ treatment + year + spp2 + d_inc2 + spp2:d_inc2
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + spp2:scale(ba_inc2)
  , bear_new ~ treatment + year + spp2 + scale(ba_inc2) + year:spp2 + spp2:scale(ba_inc2)
  , bear_new ~ treatment + year + spp2 + d_inc2 + year:spp2 + d_inc2:spp2
)

names(fl) <- seq_along(fl)

make_glmm_mods <- function(dat, fl, w_tree = FALSE){
  ran <- "~ . + (1 | plot)"
  if (w_tree) ran <- paste(ran, "+ (1 | tree_id)")
  eval(bquote(
    lapply(fl, \(x) {
      form <- update(x, ran)
      print(paste("Evaluating: ", deparse1(x)))
      glmer(
        form,
        family = binomial(),
        data = .(substitute(dat)),
        control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun=2e7))
      )
    })
  ))
}

make_bglmm_mods <- function(dat, fl, w_tree = FALSE){
  ran <- "~ . + (1 | plot)"
  if (w_tree) ran <- paste(ran, "+ (1 | tree_id)")
  out <- eval(bquote(
    lapply(fl, function(x) {
      mod0 <- glm(x, data = dat)
      print(paste("Evaluating: ", deparse1(x)))
      n_coef <<- length(coef(mod0))
      form <- update(x, ran)
      blme::bglmer(
        form,
        family = binomial(),
        data = .(substitute(dat)),
        fixef.prior = normal(cov = diag(9, n_coef)),
        control = glmerControl(optimizer="bobyqa", optCtrl = list(maxfun=2e7))
      )
    })
  ))
  rm(n_coef, pos = .GlobalEnv)
  return(out)
}

```

Here is the list of models that I am going to test. They include different interaction terms. In addition, I can include `tree_id` as a random effect or not. Preliminary trials showed that models were unstable when including this term. Within-subject (`tree_id`) variance is perfectly correlated with the outcome because most trees that are damaged are only damaged once. This seems to show up in the model as `tree_id` absorbing most of the variance.

There are also problems with complete separation, which I attempt to fix by using a Bayesian framework. I tried using penalized regression with a fixed-effects-only glm (bias reduction). I also tried a Bayesian model and specified a [prior variance for the fixed effects as a Gaussian distribution with a sd of 3](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#penalizationhandling-complete-separation). Ben Bolker [says](https://bbolker.github.io/mixedmodels-misc/ecostats_chap.html#digression-complete-separation): "We can use `bglmer` from the `blme` package to impose zero-mean Normal priors on the fixed effects."

```{r 09-damage-22}
# model list
data.frame(form = sapply(fl, deparse1)) %>% 
  kbl2(row.names = TRUE,
  caption = "Formulas testing different tree-growth variables and interactions")
```
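
To illustrate the complete-separation problem mentioned above and why a zero-mean Normal prior helps, here is a minimal sketch in base R. It is a fixed-effects-only toy, not the `blme` model: the data are invented, and I find the posterior mode directly with `optim()` under an independent Normal(0, sd = 3) prior on each coefficient.

```r
# Toy data with complete separation: y is 1 exactly when x > 0
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(1, x)

# Maximum likelihood: the slope estimate runs off toward infinity
m_ml <- suppressWarnings(glm(y ~ x, family = binomial))

# Negative log-posterior: Bernoulli log-likelihood plus Normal(0, prior_sd) log-prior
neg_log_post <- function(beta, prior_sd = 3) {
  eta      <- drop(X %*% beta)
  loglik   <- sum(y * eta - log1p(exp(eta)))
  logprior <- sum(dnorm(beta, mean = 0, sd = prior_sd, log = TRUE))
  -(loglik + logprior)
}

# Posterior mode (MAP estimate) stays finite
m_map <- optim(c(0, 0), neg_log_post, method = "BFGS")

c(ml_slope = unname(coef(m_ml)["x"]), map_slope = m_map$par[2])
```

The prior acts as a penalty that keeps the slope at a moderate value, which is the same mechanism I am relying on with `fixef.prior` in `bglmer`.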

```{r bd-mod3, cache=TRUE, include=FALSE}

# run models
bdm1 <- make_glmm_mods(bdmd, fl)
# bdm1_tree <- make_glmm_mods(bdmd, fl, w_tree = TRUE)

bbdm <- make_bglmm_mods(bdmd, fl)
# bbdm_tree <- make_bglmm_mods(bdmd, fl, w_tree = TRUE)

```

When looking at the summaries of the models that include `tree_id`, they all fail to converge. I think the problem is that within-subject variance is perfectly correlated with our response: because trees can, in general, only be damaged in one period or the other, trees that receive damage have very high variance, while those that do not have very low variance.

My best guess is that it doesn't make sense to include `tree_id` because of its bimodal error.

Another option could be ensuring that a tree is only damaged in one period, but there are only 3 trees that receive "new" damage in both periods. I doubt that these are having a huge effect. For the most part, trees are only damaged in one period or the other.

**For our purposes, we are using Bayesian GLMM model 4**

```{r damage23, cache=TRUE}

show_regression_table <- function(mod, cap = NULL) {
  mod_list <- do.call(compare_performance, c(mod, metrics = "common")) %>%
    arrange(AIC)
  if (nrow(mod_list) > 4) {
    to_compare <- as.numeric(c(mod_list$Name[1:4], "2"))
  } else {
    to_compare <- 1:nrow(mod_list)
  }
  sjPlot::tab_model(
  mod[to_compare],
  show.ci = FALSE,
  show.aic = TRUE,
  dv.labels = paste("Formula", to_compare),
  title = cap
  )
}

sjPlot::tab_model(
  bbdm[4],
  show.se = TRUE,
  show.p = TRUE,
  show.ci = FALSE,
  dv.labels = "model 4",
  title = "Bayesian bear damage model #4"
  )

show_regression_table(bdm1, "GLMM models with random plot only")
# show_regression_table(bdm1_tree, "GLMM models with random plot and tree_id")
show_regression_table(bbdm, "Bayesian GLMM models with random plot only")
# show_regression_table(bbdm_tree, "Bayesian GLMM models with random plot and tree_id")

```


I need a function for comparing models visually, using their predictions. I think that emmeans does this, but because I can't be sure what I am predicting with emmeans, I will construct my own dataset for predictions in order to clarify what emmeans is predicting.

I'm computing the confidence interval using fixed-effects error only. This is how emmeans does it as well. I could try including random-effects variance. I followed the instructions in Ben Bolker's [FAQ](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#predictions-andor-confidence-or-prediction-intervals-on-predictions). A more accurate solution would be to produce [bootstrapped predictions](https://stats.stackexchange.com/a/147837).

While I'm on the topic, it is also possible to get bootstrapped CIs for [model coefficients](https://stackoverflow.com/questions/26417005/odds-ratio-and-confidence-intervals-from-glmer-output).

```{r 09-damage-23}

pred_bd <- function(mod, limit = FALSE) {
  # recover the data the model was fit to, by name rather than by position in the call
  data <- eval(getCall(mod)$data)
  if (limit) {
    prediction_data <- data %>% 
    group_by(spp, treatment, year) %>%
    summarize(ba_inc2 = seq(
        0,
        quantile(ba_inc2, 0.99),
        by = 20
    )) %>%
    mutate(plot = NA, spp2 = spp)
  } else {
    ba_incs <- seq(
        0,
        quantile(bdmd$ba_inc2, 0.99),
        by = 20
    )
    prediction_data <- expand.grid(
      plot = NA,
      spp2 = c("PSMEM", "SESE3"),
      year = c("13", "18"),
      treatment = c("C", "H40", "H80", "L40", "L80"),
      ba_inc2 = ba_incs
    )
  }
  # predict on the link (logit) scale from fixed effects only; `re.form = NA`
  # applies to merMod fits and is ignored by predict.glm()
  if (inherits(mod, c("merMod", "glm"))) {
    prediction_data <- cbind(
      prediction_data,
      bear_new = predict(mod, newdata = prediction_data, re.form = NA)
    )
  }

  # This is the part where I get the std error as per GLMM FAQ instructions
  dat <- prediction_data
  dmat <- model.matrix(terms(mod), dat)
  pvar1 <- diag(dmat %*% tcrossprod(vcov(mod), dmat))
  cmult <- 1.96
  
  tibble(dat
  , plo = dat$bear_new - cmult * sqrt(pvar1)
  , phi = dat$bear_new + cmult * sqrt(pvar1)
  ) %>% 
  mutate(across( c(bear_new, plo, phi), plogis ))  # back-transform from logit to probability
}

```
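
The standard-error step in `pred_bd()` can be checked on a plain GLM, where `predict()` reports standard errors directly. This is a sketch on simulated data (the variables `x`, `g`, and `y` are invented): the interval is built on the link scale from the design matrix and `vcov()`, then back-transformed, and the hand-computed values are compared with `predict(..., se.fit = TRUE)`.

```r
set.seed(42)

# Simulated binary data with one continuous and one categorical predictor
d <- data.frame(
  x = runif(200, 0, 10),
  g = factor(sample(c("a", "b"), 200, replace = TRUE))
)
d$y <- rbinom(200, 1, plogis(-2 + 0.3 * d$x + 0.5 * (d$g == "b")))

m <- glm(y ~ x + g, family = binomial, data = d)

# Prediction grid
newd <- expand.grid(
  x = seq(0, 10, by = 2.5),
  g = factor(c("a", "b"), levels = levels(d$g))
)

# Design matrix for the fixed effects, linear predictor, and its standard error
X   <- model.matrix(delete.response(terms(m)), newd)
eta <- drop(X %*% coef(m))
se  <- sqrt(diag(X %*% tcrossprod(vcov(m), X)))

# Wald interval on the link scale, back-transformed to probabilities
ci <- data.frame(
  newd,
  fit = plogis(eta),
  lo  = plogis(eta - 1.96 * se),
  hi  = plogis(eta + 1.96 * se)
)

# Reference values from predict()
ref <- predict(m, newdata = newd, type = "link", se.fit = TRUE)
all.equal(unname(se), unname(ref$se.fit))
```

Building the interval on the link scale before transforming keeps the limits inside (0, 1), which a symmetric interval on the probability scale would not guarantee.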

One column bear damage figure

```{r 09-damage-24}

doj <- position_dodge(width = 20)

pred_bd(bbdm[[4]], limit = TRUE) %>%
  relevel_treatment() %>%
   mutate(
    year = recode(year, `13` = "2013", `18` = "2018"),
    spp2 = recode(spp, SESE3 = "Redwood", PSMEM = "Douglas-fir")
  ) %>%
  ggplot(aes(ba_inc2, bear_new)) +
    geom_ribbon(aes(ymin = plo, ymax = phi, fill = treatment), alpha = 0.4, color = NA, position = doj) +
    geom_line(aes(linetype = treatment), size = .6, position = doj) +
    geom_point(aes(shape = treatment), size = 1.3, position = doj) +
    # geom_linerange(aes(ymin = plo, ymax = phi, linetype = treatment), alpha = 0.5, size = 1, position = doj) +
    facet_grid(spp2 ~ year, scales = "free", space = "free") +
    # expand_limits(y = c(-0.01, 0.19)) +
    geom_blank(data = data.frame(ba_inc2 = 0, year = "2013", spp2 = "Douglas-fir", bear_new = c(-.03, .17))) +
    theme_bw() +
    theme(
      legend.position = c(.17, .6),
      legend.title = element_blank(),
      legend.background = element_blank(),
      legend.key.size = unit(5, "mm")
    ) +
    scale_fill_grey(aesthetics = c("color", "fill")) +
    scale_linetype_manual(
      values = c(
        C40 = "solid",
        L40 = "dashed",
        C80 = "solid",
        L80 = "dashed",
        Control = "dotted"
      )
    ) +
    scale_shape_manual(
      values = c(
        C40 = "circle",
        L40 = "triangle",
        C80 = "circle open",
        L80 = "triangle open",
        Control = "asterisk"
      )
    ) +
    scale_y_continuous(breaks = seq(0, .8, .1)) +
    labs(
      y = "Probability",
      x = expression(BAI ~ (cm^2 ~ year^-1)),
      fill = "Treatment",
      shape = "Treatment",
      linetype = "Treatment"
    )

ggsave(
  filename = "figs/bd_pred.pdf",
  device = cairo_pdf,
  width = 8.84,
  height = 11,
  units = "cm"
)

ggsave(
  filename = "figs/bd_pred.jpg",
  width = 8.84,
  height = 11,
  units = "cm"
)

```

2-column bear damage figure

```{r 09-damage-25}

doj <- position_dodge(width = 20)

pred_bd(bbdm[[4]], limit = TRUE) %>%
  relevel_treatment() %>%
   mutate(
    year = recode(year, `13` = "2013", `18` = "2018"),
    spp2 = recode(spp, SESE3 = "Redwood", PSMEM = "Douglas-fir")
  ) %>%
  ggplot(aes(ba_inc2, bear_new)) +
    geom_ribbon(aes(ymin = plo, ymax = phi, fill = treatment), alpha = 0.4, color = NA, position = doj) +
    geom_line(aes(linetype = treatment), size = .6, position = doj) +
    geom_point(aes(shape = treatment), size = 1.3, position = doj) +
    # geom_linerange(aes(ymin = plo, ymax = phi, linetype = treatment), alpha = 0.5, size = 1, position = doj) +
    facet_grid(spp2 ~ year, scales = "free", space = "free") +
    # expand_limits(y = c(-0.01, 0.19)) +
    geom_blank(data = data.frame(ba_inc2 = 0, year = "2013", spp2 = "Douglas-fir", bear_new = c(-.03, .17))) +
    theme_bw() +
    theme(
      legend.position = c(.1, .57),
      legend.title = element_blank(),
      legend.background = element_blank(),
      # legend.key.size = unit(5, "mm")
    ) +
    scale_fill_grey(aesthetics = c("color", "fill")) +
    scale_linetype_manual(
      values = c(
        C40 = "solid",
        L40 = "dashed",
        C80 = "solid",
        L80 = "dashed",
        Control = "dotted"
      )
    ) +
    scale_shape_manual(
      values = c(
        C40 = "circle",
        L40 = "triangle",
        C80 = "circle open",
        L80 = "triangle open",
        Control = "asterisk"
      )
    ) +
    scale_y_continuous(breaks = seq(0, .8, .1)) +
    labs(
      y = "Probability",
      x = expression(BAI ~ (cm^2 ~ year^-1)),
      fill = "Treatment",
      shape = "Treatment",
      linetype = "Treatment"
    )

ggsave(
  filename = "figs/bd_pred2.pdf",
  device = cairo_pdf,
  width = 18.2,
  height = 12,
  units = "cm"
)

ggsave(
  filename = "figs/bd_pred2.jpg",
  width = 18.2,
  height = 12,
  units = "cm"
)

```

Here I am able to show that the predictions from emmeans are exactly the same as those I am getting from predicting using my own function. 

```{r 09-damage-26}

ba_incs <- list(
  ba_inc2 = seq(
    quantile(bdmd$ba_inc2, 0.01),
    quantile(bdmd$ba_inc2, 0.99),
    length.out = 25
  )
)

emmip(bbdm[[4]],
  treatment ~ ba_inc2 | spp2 + year,
  at = ba_incs,
  type = "response",
  CIs = TRUE
) 


```

## Emmeans: interpreting model

Redwood is clearly more targeted.

```{r 09-damage-27}

emmeans(bbdm[[4]], pairwise ~ treatment, type = "response")

```

Here we can see explicit comparisons between treatments

```{r 09-damage-28}

mean_treat <- emmeans(bbdm[[4]], ~ treatment, type = "response")

pairs(mean_treat)

pwpp(mean_treat)

mean_treat %>% multcomp::cld(reversed = TRUE)

```

Finally, here is the expected response over levels of basal area increment.

```{r 09-damage-29}

emmip(bbdm[[4]], treatment ~ ba_inc2 | spp2 + year, at = ba_incs, CIs = TRUE, type = "response") + labs(title = "bayesian new bear damage model")


```

## Compare model predictions to observed

First I show the amount of bear damage predicted (the sum of predicted probabilities) for each time period compared to the observed amount.

```{r 09-damage-30}
##### Predict new bear damage #######

make_pred_bar_chart <- function() {
  
  pred <- predict(bbdm[[4]], type = "response")
  
  data1 <- cbind(bdmd, pred = pred)

  a <- data1 %>%
    group_by(treatment, year, spp2) %>%
    summarize(pred = sum(pred)) %>%
    ungroup() %>%
    ggplot(aes(treatment, pred, fill = spp2)) + 
      geom_col(position = "stack", color = "black") +
      facet_wrap(~ year) +
      scale_fill_manual(values = palette()) +
      labs(
        title = "predicted sum of DF and RW trees with bear damage",
        y = "count",
        x = "Treatment",
        fill = NULL
      ) +
      theme(legend.position = "none") +
      scale_y_continuous(limits = c(0, 17))

  b <- bd %>%
    filter(year %in% c("13", "18")) %>%
    group_by(treatment, year, spp2) %>%
    summarize(bear_new = sum(bear_new)) %>%
    ungroup() %>%
    ggplot(aes(treatment, bear_new, fill = spp2)) + 
      geom_col(position = "stack", color = "black") +
      facet_wrap(~ year) +
      scale_fill_manual(values = palette()) +
      labs(
        title = "Total sum of new bear damage for each treatment/species",
        y = "Count of bear damage",
        x = "Treatment",
        fill = NULL
      ) +
      scale_y_continuous(limits = c(0, 17))

  a + b
}

make_pred_bar_chart()
```
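
The expected number of damaged trees in a group is the sum of the per-tree predicted probabilities. A minimal base R sketch of that comparison, using simulated data and a plain `glm()` as a stand-in for the fitted model (the names `toy` and `fit` are hypothetical):

```r
# Simulated stand-in data: three treatments, 100 trees each
set.seed(42)
toy <- data.frame(
  treatment = factor(rep(c("C40", "L40", "Control"), each = 100)),
  inc = runif(300, 0, 10)
)
toy$damaged <- rbinom(300, 1, plogis(-3 + 0.25 * toy$inc))

fit <- glm(damaged ~ treatment + inc, family = binomial, data = toy)
toy$pred <- predict(fit, type = "response")

# Expected count = sum of predicted probabilities within each treatment
expected <- tapply(toy$pred, toy$treatment, sum)
observed <- tapply(toy$damaged, toy$treatment, sum)

round(cbind(expected, observed), 2)
```

For an unpenalized maximum-likelihood logistic regression that includes `treatment` as a factor, these two columns agree within each treatment by construction. Agreement at that level therefore says little about fit. This need not hold exactly for a model fitted with priors or for groupings that are not terms in the model.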

This figure compares the cumulative (post-treatment) predicted amount of bear damage to the cumulative observed amount.

```{r 09-damage-31}

####### Cumulative prediction from new bear damage model #########

make_pred_bar_chart2 <- function() {

  data1 <- cbind(bdmd, pred1 = predict(bbdm[[4]], type = "response")) %>%
    group_by(tree_id) %>%
    mutate(pred = if_else(year == "18", sum(pred1), pred1))


  p1 <- data1 %>%
    group_by(treatment, year, spp2) %>%
    summarize(pred = sum(pred)) %>%
    ungroup() %>%
    ggplot(aes(treatment, pred, fill = spp2)) + 
      geom_col(position = "stack", color = "black") +
      facet_wrap(~ year, ncol = 6) +
      scale_fill_manual(values = palette()) +
      labs(
        title = "predicted sum of DF and RW trees with bear damage",
        y = "count",
        x = "Treatment",
        fill = NULL
      ) +
      theme(legend.position = "none") +
      scale_y_continuous(limits = c(0, 25))

  p2 <- bdmd %>%
    group_by(treatment, year, spp2) %>%
    summarize(bear_cum = sum(bear_cum)) %>%
    ungroup() %>%
    ggplot(aes(treatment, bear_cum, fill = spp2)) + 
      geom_col(position = "stack", color = "black") +
      facet_wrap(~ year) +
      scale_fill_manual(values = palette()) +
      labs(
        title = "cumulative (after treatment) new bear damage for each treatment/species",
        y = "Count of bear damage",
        x = "Treatment",
        fill = NULL
      ) +
      scale_y_continuous(limits = c(0, 25))

  p1 + p2
}

make_pred_bar_chart2()
```
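
The cumulative prediction above adds each tree's per-period probabilities. As a small worked example with made-up probabilities, that sum is the expected number of new-damage events. It is slightly larger than the probability of at least one event when a tree can be newly damaged in both periods. Whether the distinction matters depends on how `bear_cum` is defined, which is not shown here.

```r
# Hypothetical per-period probabilities of new damage for one tree
p_13 <- 0.10
p_18 <- 0.15

# Expected number of new-damage events over both periods
expected_events <- p_13 + p_18

# Probability of at least one event, assuming the periods are independent
p_any <- 1 - (1 - p_13) * (1 - p_18)

c(expected_events = expected_events, p_any = p_any)
```

With small probabilities like these the two quantities differ only in the second decimal place.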


## Model validation

I can check for overdispersion, and [I don't think I need to worry about underdispersion](https://stats.stackexchange.com/questions/568407/how-to-correct-underdispersion-in-logistic-regression).

```{r 09-damage-32}


```
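
One common check, sketched here under assumptions, is the ratio of the Pearson chi-square statistic to the residual degrees of freedom. Values well above 1 suggest overdispersion. The example uses simulated grouped binomial data and a plain `glm()` (all names are hypothetical), because the statistic is not informative for ungrouped 0/1 responses.

```r
# Simulated plot-level data: 40 plots with 25 trees each
set.seed(7)
n_plots <- 40
plots <- data.frame(
  inc = runif(n_plots, 0, 10),
  n_trees = rep(25, n_plots)
)
plots$n_damaged <- rbinom(
  n_plots, plots$n_trees,
  plogis(-2 + 0.2 * plots$inc)
)

# Grouped binomial model: successes and failures per plot
fit <- glm(
  cbind(n_damaged, n_trees - n_damaged) ~ inc,
  family = binomial,
  data = plots
)

# Pearson dispersion statistic: sum of squared Pearson residuals / residual df
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```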

# Wind damage and breakage

Wind damage and breakage are characterized by condition codes:

    2          = dead top
    3          = broken top
    12         = broken top - operational damage
    30, 31     = broken stem (below and above dbh)
    32         = windthrow (uprooted)

There is only one record with code 12, so I think we can ignore it.

```{r damage33, cache=TRUE}


wd <- mutate(dam_all, spp2 = spp) |> filter(spp2 %in% c("PSMEM", "SESE3"))

wd <- wd %>%
  group_by(tree_id) %>%
  mutate(
    bt_new = get_cond(3) & year %in% c("init", "08") | get_cond(3) & !lag(order_by = year, get_cond(3)),
    bt_new = if_else(is.na(bt_new), FALSE, bt_new),
    bt_cum = if_else(year == "18" & lag(order_by = year, bt_new), TRUE, bt_new),
    
    bs_new = get_cond(30, 31) & year %in% c("init", "08") | get_cond(30, 31) & !lag(order_by = year, get_cond(30, 31)),
    bs_new = if_else(is.na(bs_new), FALSE, bs_new),
    bs_cum = if_else(year == "18" & lag(order_by = year, bs_new), TRUE, bs_new),
    
    wt_new = get_cond(32) & year %in% c("init", "08") | get_cond(32) & !lag(order_by = year, get_cond(32)),
    wt_new = if_else(is.na(wt_new), FALSE, wt_new),
    wt_cum = if_else(year == "18" & lag(order_by = year, wt_new), TRUE, wt_new)
    ) %>%
  ungroup()

```

Some trees are listed as damaged in one period and then not listed as damaged in the next.

- For code 3, there are 5 dropped trees, most of them due to death.
- For codes 30 and 31, there are 6. Most of these die; others are recorded differently, e.g., 31 changed to 3.
- For code 32, there are 2; both are dead at the start.

```{r damage34, cache=TRUE, include=FALSE}

get_dropped <- function(dat, test_condition) {
  dat %>%
    group_by(tree_id) %>%
    mutate( dropped = lag({{test_condition}}) & !{{test_condition}} ) %>%
    filter(any(dropped)) %>%
    mutate(id = cur_group_id()) %>%
    relocate(id) %>%
    arrange(id)
}


get_dropped(wd, get_cond(2)) 
get_dropped(wd, get_cond(3)) 
get_dropped(wd, get_cond(30, 31)) 
get_dropped(wd, get_cond(32))


```
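
To illustrate what the dropped-tree check flags, here is a minimal sketch on made-up records. The data and the `broken_top` column are hypothetical stand-ins for `wd` and `get_cond()`. Unlike `get_dropped()`, the sketch passes `order_by = year` to `lag()` so the result does not depend on row order.

```r
library(dplyr)

# Hypothetical records: tree 1 loses its damage code in 2013, tree 2 keeps it
toy <- tibble(
  tree_id = c(1, 1, 1, 2, 2, 2),
  year = factor(rep(c("08", "13", "18"), 2), levels = c("08", "13", "18")),
  broken_top = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

dropped_trees <- toy %>%
  group_by(tree_id) %>%
  # damaged in the previous period but not in the current one
  mutate(dropped = lag(broken_top, order_by = year) & !broken_top) %>%
  # keep all records of any tree that was dropped at least once
  filter(any(dropped, na.rm = TRUE)) %>%
  ungroup()

unique(dropped_trees$tree_id)  # tree 1 only
```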

Basically I want to know how many trees ended up being damaged over the 10 years since thinning. It might also be useful to compare this to initial damage rates.

```{r 09-damage-33}

bear_plot(wd, bt_new, "pct") +
  labs(title = "Average percent new broken tops for each treatment ±SE")

bear_plot(wd, bs_new, "pct") +
  labs(title = "Average percent new broken stems for each treatment ±SE")

bear_plot(wd, wt_new, "pct") +
  labs(title = "Average percent new windthrow for each treatment ±SE")

```

```{r 09-damage-34}

dam_bar1(wd, bt_cum, type = "pct", "Cumulative percent of all trees with broken tops")
dam_bar1(wd, bs_cum, type = "pct", "Cumulative percent of all trees with broken stems", error = TRUE)
dam_bar1(wd, wt_cum, type = "pct", "Cumulative percent of all uprooted trees")

```

# Wind damage figure *unfinished*

```{r 09-damage-35}

myvars <- list("Broken~stem" = "bs_cum", "Broken~top" = "bt_cum", "Windthrow" = "wt_cum")
myvars <- lapply(myvars, as.name)


# First get the metric (count/proportion) and variable (broken stem, etc.) for
# each species/plot/year. Counts are expanded to a per-hectare basis:
# 4 * 0.08 ha plots = 0.32 ha, so the expansion factor is 1 / 0.32 = 3.125.
dam_data <- bind_rows(
  "Proportion" = map_dfr(myvars, ~ dam_plot(wd, eval(.x), "pct"), .id = "measure"),
  "Stems~ha^-1" = map_dfr(myvars, ~ dam_plot(wd, eval(.x), "cnt"), .id = "measure"),
  .id = "type"
) |>
  filter(year == "18") |>
  mutate(var_true = if_else(type == "Stems~ha^-1", var_true * 3.125, var_true)) |>
  relevel_treatment()

# find treatment average for each species for stacked bar
dam_data_spp <- dam_data |>
  group_by(type, measure, treatment, spp2) |>
  summarise(spp_avg = sum(var_true) / 4) |>
  filter(spp_avg > 0)

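# and treatment average and SE across plots for all species combined, for the
# total height of the bar and the error bars. The first summarize() sums over
# species within each plot and drops the `plot` grouping level, so the second
# averages over plots within each type/measure/treatment.
dam_data_treat <- dam_data |>
  group_by(type, measure, treatment, plot) |>
  summarize(var1 = sum(var_true)) |>
  summarise(avg = mean(var1), SE = sd(var1) / sqrt(n()))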
```
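
The expansion factor and standard error used above reduce to simple arithmetic. This is a minimal sketch, assuming the comment's 4 × 0.08 ha is the area each count is drawn from, and using made-up plot counts:

```r
# Sampled area behind each count: four 0.08 ha units (assumed from the comment)
plot_area_ha <- 4 * 0.08
expansion <- 1 / plot_area_ha
expansion  # 3.125

# Hypothetical counts of damaged stems on four replicate plots
stems <- c(2, 0, 1, 3)
stems_per_ha <- stems * expansion

# Treatment mean and standard error across plots
avg <- mean(stems_per_ha)
se <- sd(stems_per_ha) / sqrt(length(stems_per_ha))
c(avg = avg, se = se)
```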

```{r wind-fig}
error <- TRUE

ggplot(mapping = aes(x = treatment)) +
  geom_col(
    data = dam_data_spp,
    aes(y = spp_avg, fill = spp2),
    position = "stack",
    color = "black",
    width = 0.8
  ) +
  {if (error) {
    geom_errorbar(
      data = dam_data_treat,
      aes(y = avg, ymin = avg - SE, ymax = avg + SE),
      width = 0.2,
      color = "gray30"
    )
  }} +
  facet_grid(type ~ measure, scales = "free_y", labeller = label_parsed, switch = "y") +
  theme_bw() +
  theme(
    plot.margin = unit(c(1, 2.4, 1, 0), "mm"),
    panel.spacing = unit(1.5, "mm"),
    axis.title.y = element_blank(),
    strip.placement = "outside",
    strip.background.y = element_blank(),
    strip.switch.pad.grid = unit(0, "mm"),
    legend.position = c(.84, .37),
    legend.title = element_blank(),
    legend.key.size = unit(4, "mm"),
    legend.background = element_blank(),
    axis.text.x = element_text(vjust = 0, angle = -45),
    axis.title = element_text(size = 10),
    strip.text.y = element_text(size = 10),
  ) +
  scale_fill_manual(
    values = c("#969696", "black"),
    labels = c("Douglas-fir", "Redwood")
  ) + 
  scale_y_continuous(
    expand = c(0.0, 0, 0.08, 0), 
    breaks = scales::breaks_extended(w = c(4, 1, 20, 1), only.loose = TRUE)
  ) +
  labs(x = "Treatment")
  
ggsave(
  filename = "figs/wind_damage.pdf",
  device = cairo_pdf,
  width = fig_w(1, "cm"),
  height = 9.5,
  units = "cm"
)

ggsave(
  filename = "figs/wind_damage.jpg",
  width = fig_w(1, "cm"),
  height = 9.5,
  units = "cm"
)

```