forked from Tazinho/Advanced-R-Solutions
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path2-11-Function_operators.Rmd
executable file
·194 lines (132 loc) · 7.7 KB
/
2-11-Function_operators.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
```{r, include=FALSE}
source("common.R")
```
# Function operators
```{r setup}
library(purrr)
```
## Existing function operators
1. __[Q]{.Q}__: Base R provides a function operator in the form of `Vectorize()`. What does it do? When might you use it?
__[A]{.solved}__: In R a lot of functions are "vectorised". Vectorised has two meanings. First, it means (broadly) that a function inputs a vector or vectors, and does something to each element. Secondly, it usually implies that these operations are implemented in a compiled language such as C or Fortran, so that the implementation is very fast.
However, despite what the function's name implies, `Vectorize()` is not able to speed up the provided function. It rather changes the input format of the supplied arguments (`vectorize.args`), so that they can be iterated over.
In essence, `Vectorize()` is mostly a wrapper for `mapply()`. Let's take a look at an example from the documentation:
```{r}
vrep <- Vectorize(rep.int)
vrep
# Application
vrep(1:2, 3:4)
# Naming arguments still works
vrep(times = 1:2, x = 3:4)
```
`Vectorize()` provides a convenient and concise notation to iterate over multiple arguments, but has some major drawbacks that mean you generally shouldn't use it. See <https://www.jimhester.com/2018/04/12/vectorize/> for more details.
2. __[Q]{.Q}__: Read the source code for `possibly()`. How does it work?
__[A]{.solved}__: `possibly()` modifies functions to return a specified default value in case of an error (`otherwise`) and to suppress any error messages (`quiet = TRUE`).
While reading the source code, we notice that `possibly()` internally uses `purrr::as_mapper()`. This enables users to supply not only functions, but also formulas or atomics via the same syntax as known from other functions in the purrr package. Besides this, the new default value (`otherwise`) gets evaluated once to make it (almost) immutable from now on.
The main functionality of `possibly()` is provided by `base::tryCatch()`. In this part the supplied function (`.f`) gets wrapped and the error and interrupt handling are specified.
```{r}
possibly
```
3. __[Q]{.Q}__: Read the source code for `safely()`. How does it work?
__[A]{.solved}__: `safely()` modifies functions to return a list, containing the elements "result" and "error". It works in a similar fashion as `possibly()` and besides using `as_mapper()`, `safely()` also provides the `otherwise` and `quiet` argument. However, in order to provide the result and the error in a consistent way, the `tryCatch()` part of the implementation returns a list with the same structure in both cases. In the case of successful evaluation "error" equals to `NULL` and in case of an error "result" equals to `otherwise`, which is `NULL` by default.
As the `tryCatch()` part is hidden in the internal `purrr:::capture_output()` function, we provide it here in addition to `safely()`:
```{r}
safely
purrr:::capture_error
```
Take a look at the textbook or the documentation of `safely()` to see how you can take advantage of this behaviour, for example when fitting many models.
## Case study: Creating your own function operators
1. __[Q]{.Q}__: Weigh the pros and cons of `download.file %>% dot_every(10) %>% delay_by(0.1)` vs `download.file %>% delay_by(0.1) %>% dot_every(10)`.
__[A]{.solved}__:
<!-- HW: do you get the dot before or after the pause. It probably doesn't make such difference here -->
2. __[Q]{.Q}__: Should you memoise `file.download()`? Why/why not?
__[A]{.solved}__: Memoising `file.download()` will only work if the files are immutable; i.e. if the file at a given url is always same. There's no point memoising unless this is true. Even if this is true, however, memoise has to store the results in memory, and large files will potentially take up a lot of memory.
This implies that it's probably not beneficial to memoise `file.download()` in most cases. The only exception is if you are downloading small files many times, and the file at a given url is guaranteed not to change.
3. __[Q]{.Q}__: Create a function operator that reports whenever a file is created or deleted in the working directory, using `dir()` and `setdiff()`. What other global function effects might you want to track?
__[A]{.solved}__: We first start with a function that simply reports the difference between two vectors of files:
```{r}
dir_compare <- function(old, new) {
if (setequal(old, new)) {
return()
}
added <- setdiff(new, old)
removed <- setdiff(old, new)
changes <- c(
if (length(added) > 0) paste0(" * '", added, "' was added"),
if (length(removed ) > 0) paste0(" * '", removed , "' was removed")
)
message(paste(changes, collapse = "\n"))
}
dir_compare(c("x", "y"), c("x", "y"))
dir_compare(c("x", "y"), c("x", "a"))
```
Then we wrap it up in a function operator
```{r}
track_dir <- function(f) {
force(f)
function(...) {
dir_old <- dir()
on.exit(dir_compare(dir_old, dir()), add = TRUE)
f(...)
}
}
```
And try it out by creating wrappers around `file.create()` and `file.remove()`:
```{r}
file_create <- track_dir(file.create)
file_remove <- track_dir(file.remove)
file_create("delete_me")
file_remove("delete_me")
```
To create a more serious version of `track_dir()` one might provide optionality to set the `full.names` and `recursive` arguments of `dir()` to `TRUE`. This would enable to also track the creation/deletion of hidden files and files in folders contained in the working directory.
Other global effects that might be worth tracking include changes regarding:
* the search path and possibly introduced `conflicts()`
* `options()` and `par()` which modify global settings
* the path of the working directory
* environment variables
4. __[Q]{.Q}__: Write a function operator that logs a timestamp and message to a file every time a function is run.
__[A]{.solved}__:
```{r}
append_line <- function(path, ...) {
cat(..., "\n", sep = "", file = path, append = TRUE)
}
logger <- function(f, log_path) {
force(f)
force(log_path)
append_line(log_path, "created at: ", as.character(Sys.time()))
function(...) {
append_line(log_path, "called at: ", as.character(Sys.time()))
f(...)
}
}
```
```{r}
log_path <- tempfile()
mean2 <- logger(mean, log_path)
Sys.sleep(5)
mean2(1:4)
Sys.sleep(1)
mean2(1:4)
readLines(log_path)
```
5. __[Q]{.Q}__: Modify `delay_by()` so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called `g <- delay_by(1, f); g(); Sys.sleep(2); g()` there shouldn't be an extra delay.
__[A]{.solved}__: We can do this with three little tricks (and the help of 42):
```{r, eval = FALSE}
delay_atleast <- function(f, amount) {
force(f)
force(amount)
# Store the last time the function was run
last_time <- NULL
function(...) {
if (!is.null(last_runtime)) {
wait <- (last_time - Sys.time()) + amount
if (wait > 0) {
Sys.sleep(wait)
}
}
# Update the time after the function has finished
on.exit(last_time <<- Sys.time())
f(...)
}
}
```
<!-- HW: Sys.sleep() is right the solution here. But can you check that I got the algebra correct? -->