From 1d81c1a6fffb396c96a55383ab0e305d8fcd5a0d Mon Sep 17 00:00:00 2001
From: Salim B
Date: Thu, 23 Jul 2020 10:58:19 +0000
Subject: [PATCH] Polish chapter

Fix typos, add external link etc.
---
 args-hidden.Rmd | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/args-hidden.Rmd b/args-hidden.Rmd
index cb4b45a..1c5f1aa 100644
--- a/args-hidden.Rmd
+++ b/args-hidden.Rmd
@@ -6,7 +6,7 @@ source("common.R")

 ## What's the problem?

-Functions are easier to understand if the results depend only on the values of the inputs. If a function returns surprisingly different results with the same inputs, then we say it has __hidden arguments__. Hidden arguments make code harder to reason about, because to correctly predict the output you also need to know some other state.
+Functions are easier to understand if the results depend only on the values of the inputs. If a function returns surprisingly different results with the same inputs, then we say it has __hidden arguments__. Hidden arguments make code harder to reason about, because to correctly predict the output you also need to know some other state(s).

 Related:

@@ -15,16 +15,17 @@ Related:

 ## What are some examples?

-One common source of hidden arguments is the use of global options. These can be useful to control display but, as discussed in Chapter \@ref(def-user)), should not affect computation:
+One common source of hidden arguments is the use of global options. These can be useful to control display but, as discussed in Chapter \@ref(def-user), should not affect computation:

 * The result of `data.frame(x = "a")$x` depends on the value of the global
-  `stringsAsFactors` option: if it's `TRUE` (the default) you get a factor;
-  if it's false, you get a character vector.
+  `stringsAsFactors` option: if it's `TRUE` (the default), you get a factor;
+  if it's `FALSE`, you get a character vector.

 * `lm()`'s handling of missing values depends on the global option of
   `na.action`. The default is `na.omit` which drops the missing values
   prior to fitting the model (which is inconvenient because then the results
-  of `predict()` don't line up with the input data. `modelr::na.warn()`
+  of `predict()` don't line up with the input data.
+  [`modelr::na.warn()`](https://modelr.tidyverse.org/reference/na.warn.html)
   provides an approach more in line with other base behaviours: it
   drops missing values with a warning.)

@@ -33,7 +34,7 @@ Another common source of hidden inputs is the system locale:
 * `strptime()` relies on the names of weekdays and months in the current
   locale. That means `strptime("1 Jan 2020", "%d %b %Y")` will work on
   computers with an English locale, and fail elsewhere. This is particularly
-  troublesome for Europeans who frequently have colleagues who speak a
+  troublesome for Europeans who frequently have colleagues speaking a
   different language.

 * `as.POSIXct()` depends on the current timezone. The following code returns
@@ -43,7 +44,7 @@ Another common source of hidden inputs is the system locale:
   as.POSIXct("2020-01-01 09:00")
   ```

-* `toupper()` and `tolower()` depend on the current locale. It is faily
+* `toupper()` and `tolower()` depend on the current locale. It is fairly
   uncommon for this to cause problems because most languages either
   use their own character set, or use the same rules for capitalisation
   as English. However, this behaviour did cause a bug in ggplot2 because
@@ -63,7 +64,7 @@ Another common source of hidden inputs is the system locale:
   order defined by the current locale. `factor()` uses `order()`, so the
   results from factor depend implicitly on the current locale. (This is not
   an imaginary problem as this
-  [SO question](https://stackoverflow.com/questions/39339489)) attests).
+  [SO question](https://stackoverflow.com/questions/39339489) attests).

 Some functions depend on external settings, but not in a surprising way:

@@ -77,17 +78,17 @@ Some functions depend on external settings, but not in a surprising way:

 * Random number generators like `runif()` peek at the value of the special
   global variable `.Random.seed`. This is a little surprising, but if they
-  didn't have some global state every call to `runif()` would return the
+  didn't have some global state, every call to `runif()` would return the
   same value.

 ## Why is it important?

-Hidden arguments are bad because they make it much harder to predict the output of a fuction. The worst offender by far is the `stringsAsFactors` option which changes how a number of functions (including `data.frame()`, `as.data.frame()`, and `read.csv()`) treat character vectors. This exists mostly for historical reasons, as described in [*stringsAsFactors: An unauthorized biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [*stringsAsFactors = \*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh)
-by Thomas Lumley. )
+Hidden arguments are bad because they make it much harder to predict the output of a function. The worst offender by far is the `stringsAsFactors` option which changes how a number of functions (including `data.frame()`, `as.data.frame()`, and `read.csv()`) treat character vectors. This exists mostly for historical reasons, as described in [*stringsAsFactors: An unauthorized biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [*stringsAsFactors = \*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh)
+by Thomas Lumley.

-Allowing the system locale to affect the result of a function is a subtle source of bugs when sharing code between people who work in different countries. To be clear, these defaults on rarely cause problems because most languages that share the same writing system share (most of) the same collation rules. The main exceptions tend to be European languages which have varying rules for modified letters, e.g. in Norwegian, å comes at the end of the alphabet. However, when they do cause problems they will take a long time to track down: you're unlikely to expect that the coefficients of a linear model are different[^alpha-contrast] because your code is run in a different country!
+Allowing the system locale to affect the result of a function is a subtle source of bugs when sharing code between people who work in different countries. To be clear, this rarely causes problems because most languages that share the same writing system also share (most of) the same collation rules. The main exceptions tend to be European languages which have varying rules for modified letters, e.g. in Norwegian, å comes at the end of the alphabet. However, when they do cause problems they will take a long time to track down: you're unlikely to expect that the coefficients of a linear model are different[^alpha-contrast] because your code is run in a different country!

-[^alpha-contrast]: You'll get different coefficients for a categorical predictor if the ordering means that a different levels comes first in the alphabet. The predictions and other diagnostics won't be affected, but you're likely to be surprised that your coefficients are different.
+[^alpha-contrast]: You'll get different coefficients for a categorical predictor if the ordering means that a different level comes first in the alphabet. The predictions and other diagnostics won't be affected, but you're likely to be surprised that your coefficients are different.

 ## How can I remediate the problem?

@@ -104,7 +105,7 @@ as.POSIXct <- function(x, tz = "") {
 as.POSIXct("2020-01-01 09:00")
 ```

-The `tz` argument is present, but it's not obvious that `""` means take from the system timezone. Let's first make that explicit:
+The `tz` argument is present, but it's not obvious that `""` means to take the system timezone. Let's first make that explicit:

 ```{r}
 as.POSIXct <- function(x, tz = Sys.timezone()) {