diff --git a/__site/__generated/A-composing-models.tar.gz b/__site/__generated/A-composing-models.tar.gz
9.73 KB
diff --git a/__site/__generated/A-composing-models/Manifest.toml b/__site/__generated/A-composing-models/Manifest.toml
Lines changed: 643 additions & 0 deletions
diff --git a/__site/__generated/A-composing-models/Project.toml b/__site/__generated/A-composing-models/Project.toml
Lines changed: 4 additions & 0 deletions
diff --git a/__site/__generated/A-composing-models/tutorial-raw.jl b/__site/__generated/A-composing-models/tutorial-raw.jl
Lines changed: 26 additions & 0 deletions
diff --git a/__site/__generated/A-composing-models/tutorial.ipynb b/__site/__generated/A-composing-models/tutorial.ipynb
Lines changed: 198 additions & 0 deletions
diff --git a/__site/__generated/A-composing-models/tutorial.jl b/__site/__generated/A-composing-models/tutorial.jl
Lines changed: 63 additions & 0 deletions
diff --git a/__site/__generated/A-ensembles-2.tar.gz b/__site/__generated/A-ensembles-2.tar.gz
11.4 KB
@@ -0,0 +1,4 @@
+[deps]
+MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
+NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
+PrettyPrinting = "54e16d92-306c-5ea0-a30b-337be88ac337"
@@ -0,0 +1,26 @@
+using MLJ
+using PrettyPrinting
+
+KNNRegressor = @load KNNRegressor
+
+X = (age    = [23, 45, 34, 25, 67],
+     gender = categorical(['m', 'm', 'f', 'm', 'f']))
+
+height = [178, 194, 165, 173, 168];
+
+scitype(X.age)
+
+pipe = @pipeline(
+    X -> coerce(X, :age=>Continuous),
+    OneHotEncoder(),
+    KNNRegressor(K=3),
+    target = UnivariateStandardizer());
+
+pipe.knn_regressor.K = 2
+pipe.one_hot_encoder.drop_last = true;
+
+evaluate(pipe, X, height, resampling=Holdout(),
+         measure=rms) |> pprint
+
+# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl
+
@@ -0,0 +1,198 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Before running this, please make sure to activate and instantiate the\n",
+    "environment with [this `Project.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Project.toml) and\n",
+    "[this `Manifest.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Manifest.toml).\n",
+    "For instance, copy these files to a folder 'A-composing-models', `cd` to it and\n",
+    "\n",
+    "```julia\n",
+    "using Pkg; Pkg.activate(\".\"); Pkg.instantiate()\n",
+    "```"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Generating dummy data\n",
+    "Let's start by generating some dummy data with both numerical and categorical values:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "using MLJ\n",
+    "using PrettyPrinting\n",
+    "\n",
+    "KNNRegressor = @load KNNRegressor"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The input features:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "X = (age    = [23, 45, 34, 25, 67],\n",
+    "     gender = categorical(['m', 'm', 'f', 'm', 'f']))"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The target:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "height = [178, 194, 165, 173, 168];"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Note that the scientific type of `age` is `Count` here:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "scitype(X.age)"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "We will want to coerce that to `Continuous` so that it can be given to a regressor that expects such values."
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Declaring a pipeline"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "A typical workflow for such data is to one-hot-encode the categorical features and then fit a regression model to the encoded data.\n",
+    "Let's say that we want to apply the following steps:\n",
+    "1. standardize the target variable (`height`)\n",
+    "1. one-hot-encode the categorical data\n",
+    "1. train a KNN regression model"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The `@pipeline` macro helps you define such a simple (non-branching) pipeline of steps to be applied in order:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "pipe = @pipeline(\n",
+    "    X -> coerce(X, :age=>Continuous),\n",
+    "    OneHotEncoder(),\n",
+    "    KNNRegressor(K=3),\n",
+    "    target = UnivariateStandardizer());"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Note the coercion of the `:age` variable to `Continuous`, since `KNNRegressor` expects `Continuous` input.\n",
+    "Note also the `target` keyword where you can specify a transformation of the target variable."
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Hyperparameters of this pipeline can be accessed (and set) using dot syntax:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "pipe.knn_regressor.K = 2\n",
+    "pipe.one_hot_encoder.drop_last = true;"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "A pipeline can be evaluated with the `evaluate` method; internally it constructs the machines that hold the fitted parameters:"
+   ],
+   "metadata": {}
+  },
+  {
+   "outputs": [],
+   "cell_type": "code",
+   "source": [
+    "evaluate(pipe, X, height, resampling=Holdout(),\n",
+    "         measure=rms) |> pprint"
+   ],
+   "metadata": {},
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "---\n",
+    "\n",
+    "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"
+   ],
+   "metadata": {}
+  }
+ ],
+ "nbformat_minor": 3,
+ "metadata": {
+  "language_info": {
+   "file_extension": ".jl",
+   "mimetype": "application/julia",
+   "name": "julia",
+   "version": "1.7.1"
+  },
+  "kernelspec": {
+   "name": "julia-1.7",
+   "display_name": "Julia 1.7.1",
+   "language": "julia"
+  }
+ },
+ "nbformat": 4
+}
@@ -0,0 +1,63 @@
+# Before running this, please make sure to activate and instantiate the
+# environment with [this `Project.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Project.toml) and
+# [this `Manifest.toml`](https://raw.githubusercontent.com/juliaai/DataScienceTutorials.jl/gh-pages/__generated/A-composing-models/Manifest.toml).
+# For instance, copy these files to a folder 'A-composing-models', `cd` to it and
+#
+# ```julia
+# using Pkg; Pkg.activate("."); Pkg.instantiate()
+# ```
+
+# ## Generating dummy data
+# Let's start by generating some dummy data with both numerical and categorical values:
+
+using MLJ
+using PrettyPrinting
+
+KNNRegressor = @load KNNRegressor
+
+# The input features:
+
+X = (age    = [23, 45, 34, 25, 67],
+     gender = categorical(['m', 'm', 'f', 'm', 'f']))
+
+# The target:
+
+height = [178, 194, 165, 173, 168];
+
+# Note that the scientific type of `age` is `Count` here:
+
+scitype(X.age)
+
+# We will want to coerce that to `Continuous` so that it can be given to a regressor that expects such values.
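+
+# As a sketch of that coercion done by hand (the name `Xc` is introduced here only for illustration), `coerce` returns a new table and leaves `X` untouched:
+
+Xc = coerce(X, :age => Continuous)   # new table with `age` converted to floating point
+scitype(Xc.age)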
+
+# ## Declaring a pipeline
+
+# A typical workflow for such data is to one-hot-encode the categorical features and then fit a regression model to the encoded data.
+# Let's say that we want to apply the following steps:
+# 1. standardize the target variable (`height`)
+# 1. one-hot-encode the categorical data
+# 1. train a KNN regression model
+
+# The `@pipeline` macro helps you define such a simple (non-branching) pipeline of steps to be applied in order:
+
+pipe = @pipeline(
+    X -> coerce(X, :age=>Continuous),
+    OneHotEncoder(),
+    KNNRegressor(K=3),
+    target = UnivariateStandardizer());
+
+# Note the coercion of the `:age` variable to `Continuous`, since `KNNRegressor` expects `Continuous` input.
+# Note also the `target` keyword where you can specify a transformation of the target variable.
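+
+# To make the target transformation concrete, here is roughly what standardizing `height` by hand looks like. This is only an illustration under the assumption of a plain mean/standard-deviation rescaling (the names `μ`, `σ` and `z` are made up here); the pipeline learns the transformation from the training rows and inverts it on predictions, so none of this is needed in practice:
+
+using Statistics
+μ, σ = mean(height), std(height)
+z = (height .- μ) ./ σ   # standardized target: zero mean, unit standard deviation
+z .* σ .+ μ              # inverse transform, recovers the original heights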
+
+# Hyperparameters of this pipeline can be accessed (and set) using dot syntax:
+
+pipe.knn_regressor.K = 2
+pipe.one_hot_encoder.drop_last = true;
+
+# A pipeline can be evaluated with the `evaluate` method; internally it constructs the machines that hold the fitted parameters:
+
+evaluate(pipe, X, height, resampling=Holdout(),
+         measure=rms) |> pprint
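+
+# For comparison, a sketch of the same workflow with an explicit machine, fitted on all five rows rather than on a holdout split (the names `mach` and `ŷ` are introduced here for illustration):
+
+mach = machine(pipe, X, height)   # bind the pipeline to the data
+fit!(mach)                        # fit every step of the pipeline
+ŷ = predict(mach, X)              # predictions come back on the original height scale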
+
+# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl
+