<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Adding a New Bandit — MABWiser 2.7.4 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=80d5e7a1" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=19f00094" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=e8140b17"></script>
<script src="_static/doctools.js?v=888ff710"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<link rel="author" title="About these documents" href="about.html" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="MABWiser Public API" href="api.html" />
<link rel="prev" title="Contributing" href="contributing.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
MABWiser
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="about.html">About Multi-Armed Bandits</a></li>
<li class="toctree-l1"><a class="reference internal" href="installation.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="quick.html">Quick Start</a></li>
<li class="toctree-l1"><a class="reference internal" href="examples.html">Usage Examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Adding a New Bandit</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#exposing-the-public-api">1. Exposing the Public API</a></li>
<li class="toctree-l2"><a class="reference internal" href="#implementing-the-bandit-algorithm">2. Implementing the Bandit Algorithm</a></li>
<li class="toctree-l2"><a class="reference internal" href="#testing-the-bandit-algorithm">3. Testing the Bandit Algorithm</a></li>
<li class="toctree-l2"><a class="reference internal" href="#sending-a-pull-request">4. Sending a Pull Request</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="api.html">MABWiser Public API</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">MABWiser</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Adding a New Bandit</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/new_bandit.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="adding-a-new-bandit">
<span id="new-bandit"></span><h1>Adding a New Bandit<a class="headerlink" href="#adding-a-new-bandit" title="Link to this heading"></a></h1>
<p>In this section, we provide high-level guidelines on how to introduce a new bandit algorithm in MABWiser.</p>
<div class="admonition-high-level-overview admonition">
<p class="admonition-title">High-Level Overview</p>
<p>Adding a new bandit algorithm to MABWiser consists of three main steps:</p>
<ol class="arabic simple">
<li><p>Exposing the new bandit policy within the Public API</p></li>
<li><p>Developing the underlying bandit algorithm</p></li>
<li><p>Testing the behavior of the bandit policy</p></li>
</ol>
<p>These steps can be followed by a Pull Request to include your new policy in the MABWiser library. In the following, the details of each step are provided.</p>
</div>
<section id="exposing-the-public-api">
<h2>1. Exposing the Public API<a class="headerlink" href="#exposing-the-public-api" title="Link to this heading"></a></h2>
<p>Imagine you would like to introduce a new bandit algorithm, called <code class="docutils literal notranslate"><span class="pre">MyCoolPolicy</span></code>, with a hyper-parameter <code class="docutils literal notranslate"><span class="pre">my_parameter</span></code>.
Here we consider introducing a new learning policy, but the case of introducing a new neighborhood policy is similar.</p>
<p>First and foremost, the users of MABWiser need to be able to access your cool bandit algorithm.
This is how it would look in a usage example. Notice how the <code class="docutils literal notranslate"><span class="pre">mab</span></code> model is created with your new policy.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Import MABWiser Library</span>
<span class="kn">from</span> <span class="nn">mabwiser.mab</span> <span class="kn">import</span> <span class="n">MAB</span><span class="p">,</span> <span class="n">LearningPolicy</span><span class="p">,</span> <span class="n">NeighborhoodPolicy</span>
<span class="c1"># Data</span>
<span class="n">arms</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Arm1'</span><span class="p">,</span> <span class="s1">'Arm2'</span><span class="p">]</span>
<span class="n">decisions</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Arm1'</span><span class="p">,</span> <span class="s1">'Arm1'</span><span class="p">,</span> <span class="s1">'Arm2'</span><span class="p">,</span> <span class="s1">'Arm1'</span><span class="p">]</span>
<span class="n">rewards</span> <span class="o">=</span> <span class="p">[</span><span class="mi">20</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
<span class="c1"># Model</span>
<span class="n">mab</span> <span class="o">=</span> <span class="n">MAB</span><span class="p">(</span><span class="n">arms</span><span class="p">,</span> <span class="n">LearningPolicy</span><span class="o">.</span><span class="n">MyCoolPolicy</span><span class="p">(</span><span class="n">my_parameter</span><span class="o">=</span><span class="mi">42</span><span class="p">))</span>
<span class="c1"># Train</span>
<span class="n">mab</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">decisions</span><span class="p">,</span> <span class="n">rewards</span><span class="p">)</span>
<span class="c1"># Test</span>
<span class="n">mab</span><span class="o">.</span><span class="n">predict</span><span class="p">()</span>
</pre></div>
</div>
<p>To enable public access to your bandit policy, you need to make the following changes in <code class="docutils literal notranslate"><span class="pre">mab.py</span></code>:</p>
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>Make sure to complete the following steps in a new feature branch created via
<code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">checkout</span> <span class="pre">-b</span> <span class="pre">mycoolpolicy</span></code>. Later on, you can create a pull request from this branch
to make your contributions part of the library.</p>
</div>
<ol class="loweralpha simple">
<li><p>Introduce the new bandit algorithm as an inner namedtuple class as part of the <code class="docutils literal notranslate"><span class="pre">LearningPolicy</span></code> class. See existing bandit policies as an example, and see the sketch after this list.</p></li>
<li><p>The parameter <code class="docutils literal notranslate"><span class="pre">my_parameter</span></code> will be a class member of this new inner class. Make sure to add type hinting. If possible, set a default value.</p></li>
<li><p>Implement the <code class="docutils literal notranslate"><span class="pre">_validate()</span></code> function for error checking on input values. If this raises errors, please document those in the <code class="docutils literal notranslate"><span class="pre">mab</span></code> constructor.</p></li>
<li><p>Add pydoc string documentation within this inner class to provide users a description of your new bandit policy. You can even use <code class="docutils literal notranslate"><span class="pre">..</span> <span class="pre">math::</span></code> to express formulas.</p></li>
<li><p>Create a doctest based on a usage example with your new policy, like the one above. See other doctests in existing policies.</p></li>
<li><p>The same idea applies if you were introducing a new <code class="docutils literal notranslate"><span class="pre">NeighborhoodPolicy</span></code>.</p></li>
</ol>
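<p>Putting steps (a) through (e) together, the inner class might look like the following minimal sketch. This is illustrative only, written under the assumption that your policy follows the conventions of the existing inner namedtuple classes; <code class="docutils literal notranslate"><span class="pre">MyCoolPolicy</span></code> and <code class="docutils literal notranslate"><span class="pre">my_parameter</span></code> are the hypothetical names from the example above.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>from typing import NamedTuple

class LearningPolicy:

    # ... existing inner policy classes ...

    class MyCoolPolicy(NamedTuple):
        """MyCoolPolicy learning policy.

        Describe your new bandit policy here; you can use .. math:: for formulas.

        Example
        -------
            &gt;&gt;&gt; from mabwiser.mab import MAB, LearningPolicy
            &gt;&gt;&gt; mab = MAB(['Arm1', 'Arm2'], LearningPolicy.MyCoolPolicy(my_parameter=42))
        """

        # Type-hinted hyper-parameter with a default value
        my_parameter: float = 42.0

        def _validate(self):
            # Error checking on input values; document raised errors in the MAB constructor.
            if not isinstance(self.my_parameter, (int, float)):
                raise TypeError("my_parameter must be a number.")
</pre></div>
</div>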
<p>You now have an entry point to your new bandit algorithm.
The next step is to connect this bandit with an <em>implementor</em> object in the constructor of the <code class="docutils literal notranslate"><span class="pre">MAB</span></code> class:</p>
<ol class="loweralpha simple">
<li><p>Go to the constructor of the <code class="docutils literal notranslate"><span class="pre">MAB</span></code> class.</p></li>
<li><p>Set the value of the <code class="docutils literal notranslate"><span class="pre">lp</span></code> variable to your internal implementor class, in this case <code class="docutils literal notranslate"><span class="pre">_MyCoolPolicy</span></code> (see the sketch after this list).</p></li>
<li><p>Pass down the parameter <code class="docutils literal notranslate"><span class="pre">my_parameter</span></code> to the internal implementor object.</p></li>
<li><p>Make sure to update the <code class="docutils literal notranslate"><span class="pre">@property</span></code> decorator so that we can return the <code class="docutils literal notranslate"><span class="pre">learning_policy</span></code> back to the user.</p></li>
<li><p>In the <code class="docutils literal notranslate"><span class="pre">_validate_mab_args</span></code> function, register your new policy as a valid bandit to pass input validation.</p></li>
<li><p>The same idea applies if you were introducing a new <code class="docutils literal notranslate"><span class="pre">NeighborhoodPolicy</span></code>.</p></li>
</ol>
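<p>Under the same assumptions, the wiring inside the <code class="docutils literal notranslate"><span class="pre">MAB</span></code> constructor might look like the sketch below; the actual dispatch code in <code class="docutils literal notranslate"><span class="pre">mab.py</span></code> differs in detail, so use the existing policies as your reference.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre># Inside the MAB constructor (sketch; variable and attribute names are illustrative):
if isinstance(learning_policy, LearningPolicy.MyCoolPolicy):
    # Pass my_parameter down to the internal implementor object.
    lp = _MyCoolPolicy(rng, self.arms, self.n_jobs, self.backend,
                       learning_policy.my_parameter)
</pre></div>
</div>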
<p><strong>Congratulations!!</strong> You can now increment the library version in <code class="docutils literal notranslate"><span class="pre">__version__</span></code>.
Next, let’s move on to the implementation phase of your cool bandit algorithm!</p>
</section>
<section id="implementing-the-bandit-algorithm">
<h2>2. Implementing the Bandit Algorithm<a class="headerlink" href="#implementing-the-bandit-algorithm" title="Link to this heading"></a></h2>
<p>The previous section allowed users to access your new cool bandit policy.
What remains is to implement the learning algorithm behind your bandit policy.</p>
<p>Start by creating a new Python file named <code class="docutils literal notranslate"><span class="pre">mycoolpolicy.py</span></code> under the /mabwiser folder
to implement a class called <code class="docutils literal notranslate"><span class="pre">_MyCoolPolicy</span></code>. This is where the bandit implementation will live.</p>
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>In Python, the prefix <code class="docutils literal notranslate"><span class="pre">_</span></code> in the class name denotes a private class. That is, users do not need to directly access
this implementor class. Instead, they work with an immutable namedtuple object as shown in the usage example above.</p>
</div>
<p>Here is, at a high-level, what you need to implement in your bandit policy:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">_MyCoolPolicy</span><span class="p">(</span><span class="n">BaseMAB</span><span class="p">):</span>
<span class="c1"># Your new bandit class will most likely inherit from the abstract BaseMAB class.</span>
<span class="c1"># The BaseMAB is an abstract meta class which defines the public interface for all bandit algorithms.</span>
<span class="c1"># The BaseMAB dictates the function signatures of core bandit operations such as:</span>
<span class="c1"># fit(), partial_fit(), _fit_arm() -- these are used during training</span>
<span class="c1"># predict(), predict_expectations(), _predict_contexts() -- these are used during testing</span>
<span class="c1"># and _uptake_new_arm() -- this is used as the system evolves and new arms emerge.</span>
<span class="c1"># In case your new bandit policy is similar to an existing algorithm</span>
<span class="c1"># it can take advantage of its implementation.</span>
<span class="c1"># See for example how the Popularity bandit inherits</span>
<span class="c1"># from the Greedy bandit and leverages its training methods.</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rng</span><span class="p">:</span> <span class="n">_BaseRNG</span><span class="p">,</span> <span class="n">arms</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Arm</span><span class="p">],</span> <span class="n">n_jobs</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">backend</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
<span class="c1"># The BaseMAB provides every bandit policy with:</span>
<span class="c1"># - rng: a random number generator, in case it is needed</span>
<span class="c1"># - arms: the list of arms</span>
<span class="c1"># - arm_to_expectation: the dictionary that stores the expectation of each arm</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">rng</span><span class="p">,</span> <span class="n">arms</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">,</span> <span class="n">backend</span><span class="p">)</span>
<span class="c1"># TODO:</span>
<span class="c1"># Decide what other fields your new policy might need to calculate its expectations.</span>
<span class="c1"># Declare those fields here as class members in your constructor.</span>
<span class="c1"># For example, the greedy policy needs a counter and the total sum for each arm.</span>
<span class="c1"># These fields are declared here and initialized to zero.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">my_value_to_arm</span> <span class="o">=</span> <span class="nb">dict</span><span class="o">.</span><span class="n">fromkeys</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">arms</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">decisions</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">rewards</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># TODO:</span>
<span class="c1"># This method trains your algorithm from scratch each time it is called.</span>
<span class="c1"># You might need to reset the internal fields</span>
<span class="c1"># so that we can train from scratch with new data.</span>
<span class="n">reset</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">my_value_to_arm</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># Call _parallel_fit() here from the base class.</span>
<span class="c1"># This automatically activates parallelization in the training phase.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_parallel_fit</span><span class="p">(</span><span class="n">decisions</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">contexts</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">partial_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">decisions</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">rewards</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># This method trains your algorithm in a continuous fashion.</span>
<span class="c1"># Unlike the fit() operation, partial_fit() typically does not reset internal fields.</span>
<span class="c1"># This allows us to continue learning online.</span>
<span class="c1"># Call _parallel_fit() here from the base class.</span>
<span class="c1"># This automatically activates parallelization in the training phase.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_parallel_fit</span><span class="p">(</span><span class="n">decisions</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">contexts</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">Arm</span><span class="p">:</span>
<span class="c1"># TODO:</span>
<span class="c1"># This method returns the best arm to the user according to your policy.</span>
<span class="c1"># It bases its decision on arm_to_expectation which is calculated in the _fit_arm() method.</span>
<span class="n">best_arm</span> <span class="o">=</span> <span class="o">...</span> <span class="c1"># magic goes here</span>
<span class="k">return</span> <span class="n">best_arm</span>
<span class="k">def</span> <span class="nf">predict_expectations</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dict</span><span class="p">[</span><span class="n">Arm</span><span class="p">,</span> <span class="n">Num</span><span class="p">]:</span>
<span class="c1"># This method returns a copy of the expectations dictionary.</span>
<span class="c1"># Make sure to return a copy of the internal object,</span>
<span class="c1"># so that the user cannot accidentally break your policy.</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">arm_to_expectation</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">warm_start</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">arm_to_features</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Arm</span><span class="p">,</span> <span class="n">List</span><span class="p">[</span><span class="n">Num</span><span class="p">]],</span> <span class="n">distance_quantile</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># This method warm starts untrained (cold) arms for which no decisions have been observed.</span>
<span class="c1"># A cold arm is warm started using a warm arm that is within some minimum distance from the cold arm</span>
<span class="c1"># based on the given arm_to_features and distance_quantile inputs.</span>
<span class="c1"># Calculate pairwise distances between arms and determine cold arm to warm arm mapping</span>
<span class="c1"># Call _copy_arms from the base class.</span>
<span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">warm_start</span><span class="p">(</span><span class="n">arm_to_features</span><span class="p">,</span> <span class="n">distance_quantile</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_copy_arms</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cold_arm_to_warm_arm</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Arm</span><span class="p">,</span> <span class="n">Arm</span><span class="p">])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># TODO:</span>
<span class="c1"># This method tells the policy how to warm start cold arms, given a cold arm to warm arm mapping.</span>
<span class="c1"># It will typically involve copying attributes from a warm arm to a cold arm, e.g.</span>
<span class="k">for</span> <span class="n">cold_arm</span><span class="p">,</span> <span class="n">warm_arm</span> <span class="ow">in</span> <span class="n">cold_arm_to_warm_arm</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="bp">self</span><span class="o">.</span><span class="n">my_value_to_arm</span><span class="p">[</span><span class="n">cold_arm</span><span class="p">]</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">my_value_to_arm</span><span class="p">[</span><span class="n">warm_arm</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">_fit_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">arm</span><span class="p">:</span> <span class="n">Arm</span><span class="p">,</span> <span class="n">decisions</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">rewards</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span>
<span class="c1"># TODO:</span>
<span class="c1"># This is the MOST IMPORTANT function to implement.</span>
<span class="c1"># This method implements the algorithm behind your bandit policy, i.e., how it trains each arm.</span>
<span class="c1"># Based on the given input decisions and rewards,</span>
<span class="c1"># this function calculates arm_to_expectation.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">arm_to_expectation</span> <span class="o">=</span> <span class="o">...</span> <span class="c1"># magic goes here</span>
<span class="k">def</span> <span class="nf">_predict_contexts</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">contexts</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="n">is_predict</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
<span class="n">seeds</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">start_index</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">List</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">def</span> <span class="nf">_uptake_new_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">arm</span><span class="p">:</span> <span class="n">Arm</span><span class="p">,</span> <span class="n">binarizer</span><span class="p">:</span> <span class="n">Callable</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">scaler</span><span class="p">:</span> <span class="n">Callable</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span>
<span class="c1"># TODO:</span>
<span class="c1"># This method is called when the add_arm() method is used to introduce new arms.</span>
<span class="c1"># If you have declared additional fields in the constructor,</span>
<span class="c1"># make sure that the new arms have these fields too.</span>
<span class="bp">self</span><span class="o">.</span><span class="n">my_value_to_arm</span><span class="p">[</span><span class="n">arm</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
</pre></div>
</div>
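<p>To make the <code class="docutils literal notranslate"><span class="pre">_fit_arm()</span></code> step concrete, below is a minimal sketch of how a greedy-style policy might compute its expectations as mean rewards per arm. This is only an illustration; your policy's update rule will differ.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import numpy as np

def _fit_arm(self, arm, decisions, rewards, contexts=None):
    # Select the rewards observed for the given arm.
    arm_rewards = rewards[decisions == arm]
    if arm_rewards.size &gt; 0:
        # Use the mean reward of the arm as its expectation (greedy-style update).
        self.arm_to_expectation[arm] = float(arm_rewards.mean())
</pre></div>
</div>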
<p><strong>Congratulations!!</strong> You have now implemented your cool new bandit policy. Next, let’s move onto running this for real!</p>
</section>
<section id="testing-the-bandit-algorithm">
<h2>3. Testing the Bandit Algorithm<a class="headerlink" href="#testing-the-bandit-algorithm" title="Link to this heading"></a></h2>
<p>The previous sections introduced the new bandit algorithm to the public API and implemented the underlying policy.
What remains to be seen is to use this new algorithm and assess how it performs in action.</p>
<p>Start by creating a new Python file called <code class="docutils literal notranslate"><span class="pre">test_mycoolbandit.py</span></code> under the /tests folder to implement a class called <code class="docutils literal notranslate"><span class="pre">MyCoolBanditTest</span></code>.
This class inherits from the <code class="docutils literal notranslate"><span class="pre">BaseTest</span></code> class which extends the <code class="docutils literal notranslate"><span class="pre">unittest</span></code> framework.</p>
<p>This is where we will implement unit tests to make sure our new bandit policy performs as expected.
Every test starts with the <code class="docutils literal notranslate"><span class="pre">test_</span></code> prefix followed by some descriptive name.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">tests.test_base</span> <span class="kn">import</span> <span class="n">BaseTest</span>
<span class="k">class</span> <span class="nc">MyCoolBanditTest</span><span class="p">(</span><span class="n">BaseTest</span><span class="p">):</span>
<span class="c1"># First, implement a simple case using the Public API you created in the first section</span>
<span class="c1"># Utilize the self.predict() utility wrapper method from the base test class to create test cases quickly</span>
<span class="c1"># When the is_predict flag is set to True, it returns the predicted arm</span>
<span class="k">def</span> <span class="nf">test_simple_usecase_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">arm</span><span class="p">,</span> <span class="n">mab</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">arms</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="n">decisions</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="n">rewards</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">learning_policy</span><span class="o">=</span><span class="n">LearningPolicy</span><span class="o">.</span><span class="n">MyCoolPolicy</span><span class="p">(),</span>
<span class="n">seed</span><span class="o">=</span><span class="mi">123456</span><span class="p">,</span>
<span class="n">num_run</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">is_predict</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="c1"># Assert the predicted arm</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">arm</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1"># When the is_predict flag is set to False, it returns the arm to expectation dictionary</span>
<span class="k">def</span> <span class="nf">test_simple_usecase_expectation</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">exp</span><span class="p">,</span> <span class="n">mab</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">arms</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="n">decisions</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="n">rewards</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="n">learning_policy</span><span class="o">=</span><span class="n">LearningPolicy</span><span class="o">.</span><span class="n">MyCoolPolicy</span><span class="p">(),</span>
<span class="n">seed</span><span class="o">=</span><span class="mi">123456</span><span class="p">,</span>
<span class="n">num_run</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">is_predict</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="c1"># Assert the arm expectations</span>
<span class="bp">self</span><span class="o">.</span><span class="n">assertDictEqual</span><span class="p">({</span><span class="mi">1</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span> <span class="n">exp</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_zero_rewards</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test zero/negative rewards</span>
<span class="k">def</span> <span class="nf">test_my_parameter</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test how your parameters, such as my_parameter,</span>
<span class="c1"># affect the behavior of your policy</span>
<span class="k">def</span> <span class="nf">test_within_neighborhood_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test your new learning policy within a</span>
<span class="c1"># neighborhood policy when contexts are available.</span>
<span class="k">def</span> <span class="nf">test_fit_twice</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test for two successive fit operations</span>
<span class="c1"># Assert that training from scratch is done properly</span>
<span class="k">def</span> <span class="nf">test_partial_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test for one fit operation followed by a partial_fit operation</span>
<span class="c1"># Assert that online training is done properly</span>
<span class="k">def</span> <span class="nf">test_unused_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test the case when an arm remains unused</span>
<span class="c1"># Or when an arm has no corresponding decision or reward</span>
<span class="k">def</span> <span class="nf">test_add_new_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test adding a new arm and assert that it is handled properly</span>
<span class="k">def</span> <span class="nf">test_remove_existing_arm</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test removing an arm and assert that it is handled properly</span>
<span class="k">def</span> <span class="nf">test_parallelization</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test how parallelization behaves for your new bandit using the n_jobs param</span>
<span class="k">def</span> <span class="nf">test_warm_start</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test warm start can be executed</span>
<span class="c1"># Assert warm start behavior is as expected</span>
<span class="k">def</span> <span class="nf">test_input_types</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Test different input types such as</span>
<span class="c1"># strings for arms, data series or numpy arrays for decisions and rewards</span>
</pre></div>
</div>
<p>To strengthen your test suite, consider other unit tests with different numbers of arms,
decisions, and rewards to assert that your bandit behaves correctly.</p>
<p>Add corresponding unit tests in <code class="docutils literal notranslate"><span class="pre">test_invalid.py</span></code> to validate the parameters that are passed to your <code class="docutils literal notranslate"><span class="pre">_MyCoolPolicy</span></code> class.</p>
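<p>For instance, a validation test might look like the following sketch, assuming <code class="docutils literal notranslate"><span class="pre">my_parameter</span></code> must be a number:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>def test_invalid_my_parameter(self):
    # The MAB constructor should reject non-numeric values for my_parameter.
    with self.assertRaises(TypeError):
        MAB([1, 2], LearningPolicy.MyCoolPolicy(my_parameter="not a number"))
</pre></div>
</div>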
<p><strong>Congratulations!!</strong> You are now ready to share your new cool policy with everyone. Next, let’s send a pull request for code review.</p>
</section>
<section id="sending-a-pull-request">
<h2>4. Sending a Pull Request<a class="headerlink" href="#sending-a-pull-request" title="Link to this heading"></a></h2>
<p>The previous sections finalized the implementation of your cool new policy.
This is no small step and deserves its own victory dance!
Now it is time to share it with everyone in the world by sending a pull request so that your code can be merged to the master branch.</p>
<p>Preparing a pull request typically involves the following steps:</p>
<ol class="loweralpha simple">
<li><p>Add a note about your changes in the CHANGELOG.txt.</p></li>
<li><p>Update the library version. You can use a keyword search for “version” to make sure you cover all version fields.</p></li>
<li><p>Update the README.md, if necessary.</p></li>
<li><p>Update the documentation rst files under the /docsrc folder, if necessary.</p></li>
<li><p>If you update any documentation, make sure to recompile the docs by running <code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">github</span></code> under the /docsrc folder. If you have mabwiser installed and it isn’t in development mode, you will need to uninstall it, as Sphinx first looks for an installed version before using the local package. To install in development mode, use <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">-e</span> <span class="pre">.</span></code></p></li>
</ol>
<p><strong>Congratulations!!</strong> You are ready to send a Pull Request and include your changes in the MABWiser library.
How cool is that? :)</p>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="contributing.html" class="btn btn-neutral float-left" title="Contributing" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="api.html" class="btn btn-neutral float-right" title="MABWiser Public API" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright (C), FMR LLC.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>