Merge pull request #94 from JuliaTrustworthyAI/93-fixing-code-in-the-tutorials

Fixed tutorials code snippets
pat-alt authored Jun 15, 2024
2 parents 00bdfc2 + 16ed50d commit d7d0ad7
Showing 35 changed files with 13,393 additions and 11,289 deletions.
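
The change common to all three tutorial snapshots below is a new opening "Libraries" cell. Unescaped from the JSON-embedded markdown, it reads roughly as follows; the package list and the `docs` project path are taken from the diff content rather than assumed:

```julia
# Setup cell now prepended to each tutorial (reproduced from the new
# "markdown" fields in the diffs below).
using Pkg; Pkg.activate("docs")

# Import libraries
using Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra
theme(:lime)
```

The multi-class tutorial's version omits `LinearAlgebra`; the cell is otherwise identical across the three files.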
4 changes: 2 additions & 2 deletions _freeze/docs/src/tutorials/logit/execute-results/md.json
@@ -1,8 +1,8 @@
{
"hash": "82a273f4f8234df1eacbcd8ef1235d6c",
"hash": "885baddb8b06f5422ad14af1d1cccd19",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100)\nX = hcat(xs...) # bring into tabular format\ndata = zip(xs,ys)\n```\n:::\n\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100)\nX = hcat(xs...) # bring into tabular format\ndata = zip(xs,ys)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n:::\n\n\n",
"supporting": [
"logit_files"
],
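For readability, the plotting cell that this snapshot adds to the logit tutorial is reproduced below, unescaped from the new "markdown" field above. It assumes the `la`, `la_untuned`, `X` and `ys` objects created by the tutorial's earlier cells.

```julia
# Unescaped from the logit tutorial's new plotting cell; `la`, `la_untuned`,
# `X` and `ys` come from the preceding cells of that tutorial.
using Plots, TaijaPlotting, LinearAlgebra

zoom = 0
p_plugin = plot(la, X, ys; title="Plugin", link_approx=:plugin, clim=(0, 1))
p_untuned = plot(la_untuned, X, ys;
    title="LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))", clim=(0, 1), zoom=zoom)
p_laplace = plot(la, X, ys;
    title="LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1], digits=2)))", clim=(0, 1), zoom=zoom)
plot(p_plugin, p_untuned, p_laplace, layout=(1, 3), size=(1700, 400))
```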
572 changes: 572 additions & 0 deletions _freeze/docs/src/tutorials/logit/figure-commonmark/cell-output-1.svg
6 changes: 3 additions & 3 deletions _freeze/docs/src/tutorials/mlp/execute-results/md.json
@@ -1,10 +1,10 @@
{
"hash": "aabc49848897fcaeba97ed162f03b881",
"hash": "979c6c2ea309744c0cf978cf3346899a",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian MLP\n\n\n\nThis time we use a synthetic dataset containing samples that are not linearly separable:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_non_linear(200)\nX = hcat(xs...) # bring into tabular format\ndata = zip(xs,ys)\n```\n:::\n\n\nFor the classification task we build a neural network with weight decay composed of a single hidden layer.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nn_hidden = 10\nD = size(X,1)\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, 1)\n) \nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) \n```\n:::\n\n\nThe model is trained until training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam(1e-3)\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\nLaplace approximation can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, subset_of_weights=:all)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n# Plot the posterior distribution with a contour plot.\nzoom=0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](mlp_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\nZooming out we can note that the plugin estimator produces high-confidence estimates in regions scarce of any samples. The Laplace approximation is much more conservative about these regions.\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom=-50\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](mlp_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian MLP\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nThis time we use a synthetic dataset containing samples that are not linearly separable:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_non_linear(200)\nX = hcat(xs...) # bring into tabular format\ndata = zip(xs,ys)\n```\n:::\n\n\n## Model\nFor the classification task we build a neural network with weight decay composed of a single hidden layer.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nn_hidden = 10\nD = size(X,1)\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, 1)\n) \nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) \n```\n:::\n\n\nThe model is trained until training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam(1e-3)\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\nLaplace approximation can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, subset_of_weights=:all)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n# Plot the posterior distribution with a contour plot.\nzoom=0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](mlp_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\nZooming out we can note that the plugin estimator produces high-confidence estimates in regions scarce of any samples. The Laplace approximation is much more conservative about these regions.\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom=-50\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](mlp_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\n",
"supporting": [
"mlp_files"
"mlp_files\\figure-commonmark"
],
"filters": []
}
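The training loop embedded (escaped) in the MLP snapshot above is shown below in plain Julia for reference. It assumes the `nn`, `loss` and `data` objects defined in the tutorial's earlier cells and uses Flux's implicit-parameter API, exactly as in the source.

```julia
# Unescaped from the MLP tutorial's training cell; `nn`, `loss` and `data`
# come from the preceding cells of that tutorial.
using Flux, Statistics
using Flux.Optimise: update!, Adam

opt = Adam(1e-3)
epochs = 100
avg_loss(data) = mean(map(d -> loss(d[1], d[2]), data))
show_every = epochs / 10

for epoch = 1:epochs
    for d in data
        gs = gradient(Flux.params(nn)) do
            l = loss(d...)
        end
        update!(opt, Flux.params(nn), gs)
    end
    if epoch % show_every == 0
        println("Epoch " * string(epoch))
        @show avg_loss(data)
    end
end
```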
1,766 changes: 878 additions & 888 deletions _freeze/docs/src/tutorials/mlp/figure-commonmark/cell-7-output-1.svg
1,962 changes: 998 additions & 964 deletions _freeze/docs/src/tutorials/mlp/figure-commonmark/cell-8-output-1.svg
6 changes: 3 additions & 3 deletions _freeze/docs/src/tutorials/multi/execute-results/md.json
@@ -1,10 +1,10 @@
{
"hash": "989e65afa56b460667f2aecfad7f1ce4",
"hash": "1f8a1e12de094fb4dac9c6a4336bb51d",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: Multi-class problem\n---\n\n\n\n\n\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing LaplaceRedux.Data\nx, y = Data.toy_data_multi()\nX = hcat(x...)\ny_train = Flux.onehotbatch(y, unique(y))\ny_train = Flux.unstack(y_train',1)\n```\n:::\n\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\ndata = zip(x,y_train)\nn_hidden = 3\nD = size(X,1)\nout_dim = length(unique(y))\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, out_dim)\n) \nloss(x, y) = Flux.Losses.logitcrossentropy(nn(x), y)\n```\n:::\n\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification)\nfit!(la, data)\noptimize_prior!(la; verbose=true, n_steps=100)\n```\n:::\n\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n_labels = sort(unique(y))\nplt_list = []\nfor target in _labels\n plt = plot(la, X, y; target=target, clim=(0,1))\n push!(plt_list, plt)\nend\nplot(plt_list...)\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](multi_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\n_labels = sort(unique(y))\nplt_list = []\nfor target in _labels\n plt = plot(la, X, y; target=target, clim=(0,1), link_approx=:plugin)\n push!(plt_list, plt)\nend\nplot(plt_list...)\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](multi_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\n",
"markdown": "---\ntitle: Multi-class problem\n---\n\n\n\n## Libraries\n\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux\ntheme(:lime)\n```\n:::\n\n\n## Data\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing LaplaceRedux.Data\nx, y = Data.toy_data_multi()\nX = hcat(x...)\ny_train = Flux.onehotbatch(y, unique(y))\ny_train = Flux.unstack(y_train',1)\n```\n:::\n\n\n## MLP\n\nWe set up a model\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\ndata = zip(x,y_train)\nn_hidden = 3\nD = size(X,1)\nout_dim = length(unique(y))\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, out_dim)\n) \nloss(x, y) = Flux.Losses.logitcrossentropy(nn(x), y)\n```\n:::\n\n\ntraining:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\nThe Laplace approximation can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification)\nfit!(la, data)\noptimize_prior!(la; verbose=true, n_steps=100)\n```\n:::\n\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n_labels = sort(unique(y))\nplt_list = []\nfor target in _labels\n plt = plot(la, X, y; target=target, clim=(0,1))\n push!(plt_list, plt)\nend\nplot(plt_list...)\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](multi_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\n_labels = sort(unique(y))\nplt_list = []\nfor target in _labels\n plt = plot(la, X, y; target=target, clim=(0,1), link_approx=:plugin)\n push!(plt_list, plt)\nend\nplot(plt_list...)\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](multi_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\n",
"supporting": [
"multi_files"
"multi_files\\figure-commonmark"
],
"filters": []
}
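Finally, the core LaplaceRedux workflow that all three tutorials demonstrate is reproduced here from the multi-class snapshot above; it assumes a trained Flux classifier `nn` and a `data` iterator of `(x, y)` pairs as built in the tutorial cells.

```julia
# Unescaped from the multi-class tutorial's Laplace cell; `nn` and `data`
# come from the preceding cells of that tutorial.
using LaplaceRedux

la = Laplace(nn; likelihood=:classification)    # wrap the trained classifier
fit!(la, data)                                  # fit the Laplace approximation
optimize_prior!(la; verbose=true, n_steps=100)  # tune the prior precision
```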
(Remaining changed files not shown.)
