
Commit 2f47f53

committed
fd-update
1 parent 5e8a3c9 commit 2f47f53


1,159 files changed (+84752, -2615 lines changed)

9.73 KB
Binary file not shown.

__site/__generated/A-composing-models/Manifest.toml

Lines changed: 643 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
[deps]
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
PrettyPrinting = "54e16d92-306c-5ea0-a30b-337be88ac337"
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
using MLJ
using PrettyPrinting

KNNRegressor = @load KNNRegressor

X = (age = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']))

height = [178, 194, 165, 173, 168];

scitype(X.age)

pipe = @pipeline(
    X -> coerce(X, :age=>Continuous),
    OneHotEncoder(),
    KNNRegressor(K=3),
    target = UnivariateStandardizer());

pipe.knn_regressor.K = 2
pipe.one_hot_encoder.drop_last = true;

evaluate(pipe, X, height, resampling=Holdout(),
         measure=rms) |> pprint

# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "Before running this, please make sure to activate and instantiate the\n",
    "environment with [this `Project.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Project.toml) and\n",
    "[this `Manifest.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Manifest.toml).\n",
    "For instance, copy these files to a folder 'A-composing-models', `cd` to it and run:\n",
    "\n",
    "```julia\n",
    "using Pkg; Pkg.activate(\".\"); Pkg.instantiate()\n",
    "```"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Generating dummy data\n",
    "Let's start by generating some dummy data with both numerical and categorical values:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using MLJ\n",
    "using PrettyPrinting\n",
    "\n",
    "KNNRegressor = @load KNNRegressor"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "input"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "X = (age = [23, 45, 34, 25, 67],\n",
    "     gender = categorical(['m', 'm', 'f', 'm', 'f']))"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "target"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "height = [178, 194, 165, 173, 168];"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Note that the scientific type of `age` is `Count` here:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "scitype(X.age)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "We will want to coerce that to `Continuous` so that it can be given to a regressor that expects such values."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Declaring a pipeline"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "A typical workflow for such data is to one-hot encode the categorical data and then apply a regression model.\n",
    "Let's say that we want to apply the following steps:\n",
    "1. standardize the target variable (`height`)\n",
    "1. one-hot encode the categorical data\n",
    "1. train a KNN regression model"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "The `@pipeline` macro helps you define such a simple (non-branching) pipeline of steps to be applied in order:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "pipe = @pipeline(\n",
    "    X -> coerce(X, :age=>Continuous),\n",
    "    OneHotEncoder(),\n",
    "    KNNRegressor(K=3),\n",
    "    target = UnivariateStandardizer());"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Note the coercion of the `:age` variable to `Continuous`, since `KNNRegressor` expects `Continuous` input.\n",
    "Note also the `target` keyword, which specifies a transformation of the target variable."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Hyperparameters of this pipeline can be accessed (and set) using dot syntax:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "pipe.knn_regressor.K = 2\n",
    "pipe.one_hot_encoder.drop_last = true;"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "A pipeline can be evaluated with the `evaluate` method, which implicitly constructs the machines that hold the fitted parameters:"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "evaluate(pipe, X, height, resampling=Holdout(),\n",
    "         measure=rms) |> pprint"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "---\n",
    "\n",
    "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"
   ],
   "metadata": {}
  }
 ],
 "nbformat_minor": 3,
 "metadata": {
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.1"
  },
  "kernelspec": {
   "name": "julia-1.7",
   "display_name": "Julia 1.7.1",
   "language": "julia"
  }
 },
 "nbformat": 4
}
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Before running this, please make sure to activate and instantiate the
# environment with [this `Project.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Project.toml) and
# [this `Manifest.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Manifest.toml).
# For instance, copy these files to a folder 'A-composing-models', `cd` to it and run:
#
# ```julia
# using Pkg; Pkg.activate("."); Pkg.instantiate()
# ```

# ## Generating dummy data
# Let's start by generating some dummy data with both numerical and categorical values:

using MLJ
using PrettyPrinting

KNNRegressor = @load KNNRegressor

# input

X = (age = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']))

# target

height = [178, 194, 165, 173, 168];

# Note that the scientific type of `age` is `Count` here:

scitype(X.age)

# We will want to coerce that to `Continuous` so that it can be given to a regressor that expects such values.
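
# As a rough illustration (a toy stand-in, not how ScientificTypes.jl is implemented):
# under MLJ's default convention, integers are interpreted as `Count` and floats as
# `Continuous`, so for this column the coercion amounts to converting the `Int` ages
# to `Float64`. The names `toy_scitype` and `toy_age` are hypothetical.

toy_scitype(::Integer) = "Count"              # hypothetical helper, not a library function
toy_scitype(::AbstractFloat) = "Continuous"   # hypothetical helper, not a library function

toy_age = [23, 45, 34, 25, 67]
toy_scitype(first(toy_age))          # "Count"
toy_scitype(first(float.(toy_age)))  # "Continuous"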

# ## Declaring a pipeline

# A typical workflow for such data is to one-hot encode the categorical data and then apply a regression model.
# Let's say that we want to apply the following steps:
# 1. standardize the target variable (`height`)
# 1. one-hot encode the categorical data
# 1. train a KNN regression model
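
# The three steps can be sketched in plain Julia, without any MLJ machinery, to show
# what each one does. This is a simplified stand-in under stated assumptions, not
# MLJ's implementation: the target is standardized with its mean and sample standard
# deviation, `gender` becomes one 0/1 column per level, and the KNN step averages the
# standardized targets of the `K` nearest rows (Euclidean distance, equal weights)
# before mapping the result back to the original scale. MLJ's models may differ in
# such details; all `toy_*` names and the new data point are hypothetical.

toy_age    = [23.0, 45.0, 34.0, 25.0, 67.0]
toy_gender = ['m', 'm', 'f', 'm', 'f']
toy_height = [178.0, 194.0, 165.0, 173.0, 168.0]

toy_μ = sum(toy_height) / length(toy_height)                               # step 1: mean
toy_σ = sqrt(sum((toy_height .- toy_μ) .^ 2) / (length(toy_height) - 1))  # step 1: sample std
toy_z = (toy_height .- toy_μ) ./ toy_σ                                      # standardized target

toy_levels   = sort(unique(toy_gender))                                # step 2: ['f', 'm']
toy_onehot   = [Float64(g == l) for g in toy_gender, l in toy_levels]  # 5×2 indicator matrix
toy_features = hcat(toy_age, toy_onehot)                               # columns: age, f, m

function toy_knn_predict(Xtrain, ztrain, x; K = 3)                     # step 3 (hypothetical helper)
    dists = [sqrt(sum((Xtrain[i, :] .- x) .^ 2)) for i in 1:size(Xtrain, 1)]
    nearest = partialsortperm(dists, 1:K)                              # indices of the K closest rows
    return sum(ztrain[nearest]) / K
end

toy_x_new      = [30.0, 1.0, 0.0]                                      # hypothetical: age 30, gender 'f'
toy_z_hat      = toy_knn_predict(toy_features, toy_z, toy_x_new; K = 3)
toy_height_hat = toy_z_hat * toy_σ + toy_μ                             # undo the standardization

# For this hypothetical point the three nearest rows are those aged 34, 25 and 23, so the
# prediction is their mean height, 172. Because the features are not rescaled, `age`
# dominates the distance here.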

# The `@pipeline` macro helps you define such a simple (non-branching) pipeline of steps to be applied in order:

pipe = @pipeline(
    X -> coerce(X, :age=>Continuous),
    OneHotEncoder(),
    KNNRegressor(K=3),
    target = UnivariateStandardizer());

# Note the coercion of the `:age` variable to `Continuous`, since `KNNRegressor` expects `Continuous` input.
# Note also the `target` keyword, which specifies a transformation of the target variable.

# Hyperparameters of this pipeline can be accessed (and set) using dot syntax:

pipe.knn_regressor.K = 2
pipe.one_hot_encoder.drop_last = true;
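
# To see what `drop_last = true` changes, here is a toy encoder (a hypothetical helper,
# not MLJ's `OneHotEncoder`) that assumes the option simply omits the indicator column
# of the last level. With levels `'f'` and `'m'`, only the `f` column survives; the
# dropped column is redundant anyway, since the indicators of each row sum to 1.

function toy_onehot_encode(v; drop_last = false)     # hypothetical helper
    lv = sort(unique(v))                             # levels in sorted order
    drop_last && (lv = lv[1:end-1])                  # omit the last level's column
    return [Float64(x == l) for x in v, l in lv]
end

toy_onehot_encode(['m', 'm', 'f', 'm', 'f'])                    # 5×2 matrix (f, m columns)
toy_onehot_encode(['m', 'm', 'f', 'm', 'f']; drop_last = true)  # 5×1 matrix (f column only)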

# A pipeline can be evaluated with the `evaluate` method, which implicitly constructs the machines that hold the fitted parameters:

evaluate(pipe, X, height, resampling=Holdout(),
         measure=rms) |> pprint
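
# For intuition about what the evaluation reports, here is a toy version of a holdout
# split followed by the root-mean-squared error. It assumes the first 70% of rows are
# used for training (MLJ's `Holdout()` defaults to `fraction_train = 0.7`; see the MLJ
# documentation for how it rounds and whether it shuffles). `toy_holdout`, `toy_rms`
# and the predictions below are hypothetical, not output from the pipeline.

function toy_holdout(n; fraction_train = 0.7)     # hypothetical helper
    ntrain = floor(Int, fraction_train * n)       # rounding rule is an assumption
    return 1:ntrain, (ntrain + 1):n               # train rows, test rows
end

toy_rms(yhat, y) = sqrt(sum((yhat .- y) .^ 2) / length(y))   # hypothetical helper

toy_train, toy_test = toy_holdout(5)                         # 1:3 and 4:5
toy_y_test = [178.0, 194.0, 165.0, 173.0, 168.0][toy_test]   # observed heights 173.0, 168.0
toy_y_pred = [170.0, 172.0]                                  # made-up predictions
toy_rms(toy_y_pred, toy_y_test)                              # sqrt(12.5) ≈ 3.54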

# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

11.4 KB
Binary file not shown.
