Natural cubic splines with interpretable intercepts: ‘centres’ a basis
generated using splines::ns()
on a specified x-value. When used in a
model formula, this allows the model intercept to be interpreted with
respect to that central x-value, rather than with respect to the x-value
of the first splines::ns()
knot.
You can install splintr from github with:
# install.packages("devtools")
devtools::install_github("simisc/splintr")
library(tidyverse)
library(splines)
library(splintr)
library(broom)
library(knitr)
library(ggthemes)
library(ggrepel)
women <- women %>%
mutate(height_centred = height - mean(height))
women %>%
head() %>%
kable()
height | weight | height_centred |
---|---|---|
58 | 115 | -7 |
59 | 117 | -6 |
60 | 120 | -5 |
61 | 123 | -4 |
62 | 126 | -3 |
63 | 129 | -2 |
Different parametrisations of the same model:
fit0
: usingns
with raw heightsfit1
: usingns
with centred heightsfit2
: usingsplintr
with centred heightsfit3
: usingsplintr
with raw heights and explicit (arbitrary) centre
model_formulae <- list(
fit0 = weight ~ ns(height, df = 3),
fit1 = weight ~ ns(height_centred, df = 3),
fit2 = weight ~ splintr(height_centred, df = 3),
fit3 = weight ~ splintr(height, df = 3, centre = explicit_centre)
)
explicit_centre = 68.45
model_fits <- map(model_formulae, lm, data = women)
The models are identical:
model_fits %>%
map_dfr(glance, .id = "model") %>%
select(model, sigma, logLik, deviance) %>%
kable()
model | sigma | logLik | deviance |
---|---|---|---|
fit0 | 0.321018 | -1.914046 | 1.133578 |
fit1 | 0.321018 | -1.914046 | 1.133578 |
fit2 | 0.321018 | -1.914046 | 1.133578 |
fit3 | 0.321018 | -1.914046 | 1.133578 |
But they have different intercepts:
model_fits %>%
map_dfr(~tidy(.x, quick = TRUE), .id = "model") %>%
filter(term == "(Intercept)") %>%
select(model, estimate, std.error, statistic) %>%
kable()
model | estimate | std.error | statistic |
---|---|---|---|
fit0 | 114.5595 | 0.2455372 | 466.5666 |
fit1 | 114.5595 | 0.2455372 | 466.5666 |
fit2 | 135.1067 | 0.1288238 | 1048.7711 |
fit3 | 147.8057 | 0.1513002 | 976.9034 |
Using splines::ns()
, the model intercept represents the estimated
value of weight
at the first boundary knot, i.e. when height
takes
its minimum value of 58. This is unchanged by centring the predictor, so
fit0
and fit1
have identical intercepts.
Using splintr()
instead, the intercept representes the estimated value
of weight
when the predictor height_centred
takes a value of zero.
An arbitrary “centre” can be specified directly in the splintr()
call.
int_pts <- tibble(
model = paste0("fit", 0:3),
x = c(
min(women$height),
min(women$height),
mean(women$height),
explicit_centre
),
y = map_dbl(model_fits, ~coef(.x)[1])
)
augment(model_fits[[1]], data = women) %>%
ggplot(aes(x = height)) +
geom_point(aes(y = weight)) +
geom_line(aes(y = .fitted), col = "blue") +
geom_point(data = int_pts, aes(x = x, y = y), col = "red") +
geom_label_repel(data = int_pts, aes(x = x, y = y, label = model),
col = "red", nudge_y = 7) +
theme_few()