-
Notifications
You must be signed in to change notification settings - Fork 74
/
Copy pathREADME.rmd
109 lines (87 loc) · 5.54 KB
/
README.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = TRUE,
warning = FALSE,
message = FALSE,
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# caretEnsemble
<!-- badges: start -->
[](https://cran.r-project.org/package=caretEnsemble)
[](https://cran.r-project.org/web/checks/check_results_caretEnsemble.html)
[](https://www.rdocumentation.org/packages/caretEnsemble)
[](https://github.com/zachmayer/caretEnsemble/actions/workflows/R-CMD-check.yaml)
[](https://github.com/zachmayer/caretEnsemble/actions/workflows/tests.yaml)
[](https://codecov.io/gh/zachmayer/caretEnsemble)
[](https://www.codefactor.io/repository/github/zachmayer/caretEnsemble)
[](https://zachmayer.r-universe.dev/caretEnsemble)
[](https://github.com/zachmayer/caretEnsemble)
[](https://badges.mit-license.org)
[](https://discord.gg/zgPeSm8a3u)
<!-- badges: end -->
caretEnsemble is a framework for [stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) models fit with the [caret](https://topepo.github.io/caret/) package.
Use `caretList` to fit multiple models, and then use `caretStack` to stack them with another caret model.
First, use caretList to fit many models to the same data:
```{r}
set.seed(42L)
data(diamonds, package = "ggplot2")
dat <- data.table::data.table(diamonds)
dat <- dat[sample.int(nrow(diamonds), 500L), ]
models <- caretEnsemble::caretList(
price ~ .,
data = dat,
methodList = c("rf", "glmnet")
)
print(summary(models))
```
Then, use caretEnsemble to make a greedy ensemble of these models
```{r}
greedy_stack <- caretEnsemble::caretEnsemble(models)
print(greedy_stack)
```
You can also use caretStack to make a non-linear ensemble
```{r}
rf_stack <- caretEnsemble::caretStack(models, method = "rf")
print(rf_stack)
```
Use autoplot from ggplot2 to plot ensemble diagnostics:
```{r greedy-stack-6-plot, fig.alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(greedy_stack$ens_model$results$RMSE))`."}
ggplot2::autoplot(greedy_stack, training_data = dat, xvars = c("carat", "table"))
```
```{r, fig.alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(rf_stack$ens_model$results$RMSE))`."}
ggplot2::autoplot(rf_stack, training_data = dat, xvars = c("carat", "table"))
```
# Installation
### Install the stable version from [CRAN](https://CRAN.R-project.org/package=caretEnsemble/):
```{r, eval=FALSE}
install.packages("caretEnsemble")
```
### Install the dev version from github:
```{r, eval=FALSE}
devtools::install_github("zachmayer/caretEnsemble")
```
There are also tagged versions of caretEnsemble on github you can install via devtools. For example, to install the [previous release of caretEnsemble](https://github.com/zachmayer/caretEnsemble/releases/) use:
```{r, eval=FALSE}
devtools::install_github("zachmayer/[email protected]")
```
This is useful if the latest release breaks some aspect of your workflow. caretEnsemble is pure R with no compilation, so this command will work in a variety of environments.
# Package development
This package uses a Makefile. Use `make help` to see the supported options.
Use `make fix-style` to fix simple linting errors.
For iterating while writing code, run `make dev`. This runs just `make clean fix-style document lint spell test`, for a quicker local dev loop. Please still run `make all` before making a PR.
Use `make all` before making a pull request, which will also run R CMD CHECK and a code coverage check. This runs `make clean fix-style document install build-readme build-vignettes lint spell test check coverage preview-site`.
## First time dev setup:
run `make install` from the git repository to install the dev version of caretEnsemble, along with the necessary package dependencies.
# Inspiration and similar packages:
caretEnsemble was inspired by [medley](https://github.com/mewo2/medley), which in turn was inspired by Caruana et. al.'s (2004) paper [Ensemble Selection from Libraries of Models.](http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf)
If you want to do something similar in python, check out [vecstack](https://github.com/vecxoz/vecstack).
# Code of Conduct:
Please note that this project is released with a [Contributor Code of Conduct](https://github.com/zachmayer/caretEnsemble/blob/master/.github/CONTRIBUTING.md). By participating in this project you agree to abide by its terms.