2-10-Function_factories.Rmd

```{r, include=FALSE}
source("common.R")
```

# Function factories

## Prerequisites

   For most of this chapter base R is sufficient. Just a few exercises require **rlang** and **ggplot2** packages.

    ```{r}
    library(rlang)
    library(ggplot2)
    ```

## Factory fundamentals

1. __[Q]{.Q}__: The definition of `force()` is simple:

    ```{r}
    force
    ```
    
   Why is it better to `force(x)` instead of just `x`?
    
   __[A]{.solved}__: As you can see `force(x)` is just syntactic sugar for `x`. We prefer this explicit form, because
    
   > using this function clearly indicates that you’re forcing evaluation, not that you’ve accidentally typed `x`. (Quote from the textbook)

2. __[Q]{.Q}__: Base R contains two function factories, `approxfun()` and `ecdf()`. Read their documentation and experiment to figure out what the functions do and what they return.

   __[A]{.solved}__: Let's begin with `approxfun()` as it is used within `ecdf()` also:
   
   - `approxfun()` takes a 2-dimensional combination of data points (`x` and `y`) as input and returns a *stepwise interpolation function*, which transforms new `x` values. Additional arguments control how the created function should behave. (The interpolation `method` may be linear or constant. `yleft`, `yright` and `rule` specify how the newly created function should map new values which are outside of `range(x)`. `f` controls the degree of right-left-continuity via a numeric value from `0` to `1` and `ties` expects a function name like min, mean etc. which defines how non-unique x-y-combinations should be handled when interpolating the data points.)

   - `ecdf()` is an acronym for empirical cumulative distribution function. For a numeric vector, `ecdf()` returns the appropriate distribution function (of class “ecdf”, which is inheriting from class “stepfun”). Initially the (x, y) pairs for the nodes of the density function are calculated. Afterwards these pairs are passed to `approxfun()`, which then returns the desired function.
   
  <!-- HW: I think a couple of examples here would help -->

3. __[Q]{.Q}__: Create a function `pick()` that takes an index, `i`, as an argument and returns a function with an argument `x` that subsets `x` with `i`.

    ```{r, eval = FALSE}
    pick(1)(x)
    # should be equivalent to
    x[[1]]
    
    lapply(mtcars, pick(5))
    # should be equivalent to
    lapply(mtcars, function(x) x[[5]])
    ```

   __[A]{.solved}__: In this exercise `pick(i)` acts as a function factory, which returns the required subsetting function.
   
    ```{r}
    pick <- function(i) {
      force(i)
      
      function(x) x[[i]]
    }
    
    x <- 1:3
    identical(x[[1]], pick(1)(x))
    identical(lapply(mtcars, function(x) x[[5]]),
              lapply(mtcars, pick(5)))
    ```

4. __[Q]{.Q}__: Create a function that creates functions that compute the i^th^ [central moment](http://en.wikipedia.org/wiki/Central_moment) of a numeric vector. You can test it by running the following code:

    ```{r, eval = FALSE}
    m1 <- moment(1)
    m2 <- moment(2)

    x <- runif(100)
    stopifnot(all.equal(m1(x), 0))
    stopifnot(all.equal(m2(x), var(x) * 99 / 100))
    ```

   __[A]{.solved}__: The first moment is closely related to the mean and describes the average deviation from the mean, which is 0 (within numerical margin of error). The second moment describes the variance of the input data. If we want compare it to `var`, we need to undo [Bessel's correction}(https://en.wikipedia.org/wiki/Bessel%27s_correction) correction by multiplying with $\frac{N-1}{N}$.
    
    ```{r}
    moment <- function(i){
      force(i)
    
      function(x) sum((x - mean(x)) ^ i) / length(x)
    }
    
    m1 <- moment(1)
    m2 <- moment(2)

    x <- runif(100)
    all.equal(m1(x), 0)  # removed stopifnot() for clarity
    all.equal(m2(x), var(x) * 99 / 100)
    ```

5. __[Q]{.Q}__: What happens if you don't use a closure? Make predictions, then verify with the code below.

    ```{r}
    i <- 0
    new_counter2 <- function() {
      i <<- i + 1
      i
    }
    ```

   __[A]{.solved}__: Without the captured and encapsulated environment of a closure the counts will be stored in the global environment. Here they can be overwritten or deleted as well as interfere with other counters.
   
    ```{r, error = TRUE}
    new_counter2()
    i
    new_counter2()
    i
    
    i <- 0
    new_counter2()
    i
    ```

6. __[Q]{.Q}__: What happens if you use `<-` instead of `<<-`? Make predictions, then verify with the code below.

    ```{r}
    new_counter3 <- function() {
      i <- 0
      function() {
        i <- i + 1
        i
      }
    }
    ```

   __[A]{.solved}__: Without the super assignment `<<-`, the counter will always return 1. The counter always starts in a new execution environment within the same enclosing environment, which contains an unchanged value for `i` (in this case it remains 0).
   
    ```{r}
    new_counter_3 <- new_counter3()
    
    new_counter_3()
    new_counter_3()
    ```
   

## Graphical factories

1. __[Q]{.Q}__: Compare and contrast `ggplot2::label_bquote()` with `scales::number_format()`.

   __[A]{.solved}__:
   
   <!-- HW: not sure what I was thinking here, but I'd focus more on the results than the implementation. i.e. they're both function factories designed to create nice labels; `label_bquote()` is used with facets, and adds mathematical notation; `number_format()` is used with axes/legends, and adds commas etc -->

## Statistical factories

1. __[Q]{.Q}__: In `boot_model()`, why don't I need to force the evaluation of `df` or `model`?

   __[A]{.solved}__: `boot_model()` ultimately returns a function, and whenever you return a function you need to make sure all the inputs are explicitly evaluated. Here that happens automatically because we use `df` and `formula` in `lm()`.
   
    ```{r}
    boot_model <- function(df, formula) {
      mod <- lm(formula, data = df)
      fitted <- unname(fitted(mod))
      resid <- unname(resid(mod))
      rm(mod)
      
      function() {
        fitted + sample(resid)
      }
    } 
    ```

2. __[Q]{.Q}__: Why might you formulate the Box-Cox transformation like this?

    ```{r}
    boxcox3 <- function(x) {
      function(lambda) {
        if (lambda == 0) {
          log(x)
        } else {
          (x ^ lambda - 1) / lambda
        }
      }  
    }
    ```

   __[A]{.started}__: `boxcox3` returns a function where `x` is fixed (though it is not forced, so it may manipulated later). This allows us to apply and test different transformations for different inputs and give them a descriptive name.

    ```{r, out.width = "45%", fig.show = "hold", fig.align = "none"}
    # initial example (should be improved)
    boxcox_airpassengers <- boxcox3(AirPassengers)
    
    plot(boxcox_airpassengers(0))
    plot(boxcox_airpassengers(1))
    plot(boxcox_airpassengers(2))
    ```

3. __[Q]{.Q}__: Why don't you need to worry that `boot_permute()` stores a copy of the data inside the function that it generates?

   __[A]{.solved}__: Because it doesn't actually store a copy; it's just a name that points to the same underlying object in memory.
   
    ```{r}
    boot_permute <- function(df, var) {
      n <- nrow(df)
      force(var)
      
      function() {
        col <- df[[var]]
        col[sample(n, replace = TRUE)]
      }
    }
    boot_mtcars1 <- boot_permute(mtcars, "mpg")
    
    lobstr::obj_size(mtcars)
    lobstr::obj_size(boot_mtcars1)
    lobstr::obj_sizes(mtcars, boot_mtcars1)
    ```
   
4. __[Q]{.Q}__: How much time does `ll_poisson2()` save compared to `ll_poisson1()`? Use `bench::mark()` to see how much faster the optimisation occurs. How does changing the length of `x` change the results?

   __[A]{.solved}__: Let us recall the definitions of `ll_poisson1()` and `ll_poisson2()` and the test data `x1`:
   
    ```{r}
    ll_poisson1 <- function(x) {
      n <- length(x)
    
      function(lambda) {
        log(lambda) * sum(x) - n * lambda - sum(lfactorial(x))
      }
    }
    
    ll_poisson2 <- function(x) {
      n <- length(x)
      sum_x <- sum(x)
      c <- sum(lfactorial(x))
      
      function(lambda) {
        log(lambda) * sum_x - n * lambda - c
      }
    }
    
    # provided test data
    x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)
    ```
    
   A benchmark with this data reveals a performance improvement of factor 2 for `ll_poisson2()` over `ll_poisson1()`
   
    ```{r}    
    bench::mark(
      llp1 = optimise(ll_poisson1(x1), c(0, 100), maximum = TRUE),
      llp2 = optimise(ll_poisson2(x1), c(0, 100), maximum = TRUE)
    )
    ```
    
   Regarding differing lengths of `x1`, we expect even further performance improvements of `ll_poisson2()` compared to `ll_poisson1()`, as the redundant calculations within `ll_poisson1()`, become more expensive with growing length of `x1`. The following results imply that for a length of `x1` of 100000, `ll_poisson2()` is about 20+ times as fast as `ll_poisson1()`:
   
    ```{r, message = FALSE, warning = FALSE}
    library(purrr)
    library(dplyr)
    
    bench_poisson <- function(i){
      x_i_length <- 10L ^ i
      x_i <- rpois(x_i_length, 100L)
      
      rel_advantage <- bench::mark(llp1 = optimise(ll_poisson1(x_i), c(0, 100), maximum = TRUE),
                  llp2 = optimise(ll_poisson2(x_i), c(0, 100), maximum = TRUE),
                  relative = TRUE)$median %>% 
        max()
      
      rel_advantage
    }
    
    bench_df <- map_dbl(1:5, bench_poisson) %>% 
      tibble(i = 1:5,
             rel_advantage = .,
             x_length = 10 ^ i)
    
    bench_df %>% 
      ggplot(aes(x_length, rel_advantage)) +
      geom_point() +
      geom_line() +
      ggtitle("Rel. Speed of ll_poisson2() increases with vector length") +
      scale_x_log10()
    ```


## Function factories + functionals
1. __[Q]{.Q}__: Which of the following commands is equivalent to `with(x, f(z))`?

   (a) `x$f(x$z)`.
   (b) `f(x$z)`.
   (c) `x$f(z)`.
   (d) `f(z)`.
   (e) It depends.

   __[A]{.solved}__: (e) "It depends" is the correct answer. Usually `with()` is used with a data frame, so you'd usually expect (b), but if `x` is a list, it could be any of the options.

    ```{r}   
    f <- mean
    z <- 1
    x <- list(f = mean, z = 1)

    identical(with(x, f(z)), x$f(x$z))
    identical(with(x, f(z)), f(x$z))
    identical(with(x, f(z)), x$f(z))
    identical(with(x, f(z)), f(z))
    ```

2. __[Q]{.Q}__: Compare and contrast the effects of `env_bind()` vs. `attach()` for the following code.
   
    ```{r}
    funs <- list(
      mean = function(x) mean(x, na.rm = TRUE),
      sum = function(x) sum(x, na.rm = TRUE)
    )
    
    attach(funs)
    mean <- function(x) stop("Hi!")
    detach(funs)
    
    env_bind(globalenv(), !!!funs)
    mean <- function(x) stop("Hi!") 
    env_unbind(globalenv(), names(funs))
    ```

   __[A]{.solved}__: `attach()` adds `funs` to the search path. Therefore, the provided functions are found before their respective versions from the base package. Further, they can not get accidently overwritten by similar named functions in the global environment. One annoying downsinde of using `attach()` is the possibility to attach the same object multiple times, making it necessary to call `detach()` equally often.
   
   In contrast `rlang::env_bind()` just adds the functions in `fun` to the global environment. No further side effects are introduced and the functions are overwritten when similarly named functions are defined.

## Old exercises

### Closures

1.  __[Q]{.Q}__: Why are functions created by other functions called closures?

   __[A]{.solved}__: As stated in the book:

    > because they enclose the environment of the parent function and can access all its variables.

2.  __[Q]{.Q}__: What does the following statistical function do? What would be a better
    name for it? (The existing name is a bit of a hint.)

    ```{r}
    bc <- function(lambda) {
      if (lambda == 0) {
        function(x) log(x)
      } else {
        function(x) (x ^ lambda - 1) / lambda
      }
    }
    ```  
    
    __[A]{.solved}__: It is the logarithm, when lambda equals zero and `x ^ lambda - 1 / lambda` otherwise. A better name might be `box_cox_transformation` (one parametric), you can read about it (here)[https://en.wikipedia.org/wiki/Power_transform].
    
3.  __[Q]{.Q}__: What does `approxfun()` do? What does it return?
__[A]{.solved}__: `approxfun` basically takes a combination of 2-dimensional data points + some extra specifications as arguments and returns a stepwise linear or constant interpolation function (defined on the range of given x-values, by default).

4.  __[Q]{.Q}__: What does `ecdf()` do? What does it return?
__[A]{.solved}__: "ecdf" means empirical density function. For a numeric vector, `ecdf()` returns the appropriate density function (of class "ecdf", which is inheriting from class "stepfun"). You can describe it's behaviour in 2 steps. In the first part of it's body, the `(x,y)` pairs for the nodes of the density function are calculated. In the second part these pairs are given to `approxfun`.

5.  __[Q]{.Q}__: Create a function that creates functions that compute the ith
    [central moment](http://en.wikipedia.org/wiki/Central_moment) of a numeric 
    vector. You can test it by running the following code:

    ```{r, eval = FALSE}
    m1 <- moment(1)
    m2 <- moment(2)

    x <- runif(100)
    stopifnot(all.equal(m1(x), 0))
    stopifnot(all.equal(m2(x), var(x) * 99 / 100))
    ```  
    
    __[A]{.solved}__: For a discrete formulation look [here](http://www.r-tutor.com/elementary-statistics/numerical-measures/moment)
    
    ```{r, eval = FALSE}
    moment <- function(i){
      function(x) sum((x - mean(x)) ^ i) / length(x)
    }
    ```

6.  __[Q]{.Q}__: Create a function `pick()` that takes an index, `i`, as an argument and
    returns a function with an argument `x` that subsets `x` with `i`.

    ```{r, eval = FALSE}
    lapply(mtcars, pick(5))
    # should do the same as this
    lapply(mtcars, function(x) x[[5]])
    ```  
    
    __[A]{.solved}__:
    
    ```{r, eval = FALSE}
    pick <- function(i){
      function(x) x[[i]]
      }
    
    stopifnot(identical(lapply(mtcars, pick(5)),
                        lapply(mtcars, function(x) x[[5]]))
              )
    ```    

### Case study: numerical integration

1.  __[Q]{.Q}__: Instead of creating individual functions (e.g., `midpoint()`,
      `trapezoid()`, `simpson()`, etc.), we could store them in a list. If we 
    did that, how would that change the code? Can you create the list of 
    functions from a list of coefficients for the Newton-Cotes formulae?  
    __[A]{.solved}__:

2.  __[Q]{.Q}__: The trade-off between integration rules is that more complex rules are
    slower to compute, but need fewer pieces. For `sin()` in the range 
    [0, $\pi$], determine the number of pieces needed so that each rule will 
    be equally accurate. Illustrate your results with a graph. How do they
    change for different functions? `sin(1 / x^2)` is particularly challenging.  
    __[A]{.solved}__: