
Commit

Copy of UCBVI from rlberry_research to rlberry_scool and misc changes to doc (#451)

* move UCBVI from rlberry_research to rlberry_scool
* update script to test markdown
* add toggleable menu
* update user guide to NOT use rlberry_research
* remove use of IPython
* update tuto deepRL
* update tuto deepRL
* update contributing guideline (agents are not in rlberry_main anymore)
* add doc to monthly test

---------

Co-authored-by: Timothee Mathieu <[email protected]>
JulienT01 and TimotheeMathieu authored Apr 24, 2024
1 parent 730b92b commit 6f933f5
Showing 42 changed files with 738 additions and 819 deletions.
Binary file not shown.
452 changes: 66 additions & 386 deletions docs/basics/DeepRLTutorial/TutorialDeepRL.md

Large diffs are not rendered by default.

Binary file modified docs/basics/DeepRLTutorial/output_10_3.png
Binary file modified docs/basics/DeepRLTutorial/output_5_3.png
Binary file modified docs/basics/DeepRLTutorial/output_6_3.png
Binary file modified docs/basics/DeepRLTutorial/output_9_3.png
36 changes: 21 additions & 15 deletions docs/basics/comparison.md
````diff
@@ -48,7 +48,8 @@ We compute the performances of one agent as follows:
 ```python
 import numpy as np
 from rlberry.envs import gym_make
-from rlberry.agents.torch import A2CAgent
+from rlberry.agents.stable_baselines import StableBaselinesAgent
+from stable_baselines3 import A2C
 from rlberry.manager import AgentManager, evaluate_agents

 env_ctor = gym_make
@@ -58,8 +59,9 @@ n_simulations = 50
 n_fit = 8

 rbagent = AgentManager(
-    A2CAgent,
+    StableBaselinesAgent,
     (env_ctor, env_kwargs),
+    init_kwargs=dict(algo_cls=A2C),  # Init value for StableBaselinesAgent
     agent_name="A2CAgent",
     fit_budget=3e4,
     eval_kwargs=dict(eval_horizon=500),
@@ -78,32 +80,36 @@ The evaluation and statistical hypothesis testing is handled through the functio

 For example we may compare PPO, A2C and DQNAgent on Cartpole with the following code.

-``` python
-from rlberry.agents.torch import A2CAgent, PPOAgent, DQNAgent
+```python
+from rlberry.agents.stable_baselines import StableBaselinesAgent
+from stable_baselines3 import A2C, PPO, DQN
 from rlberry.manager.comparison import compare_agents

 agents = [
     AgentManager(
-        A2CAgent,
+        StableBaselinesAgent,
         (env_ctor, env_kwargs),
+        init_kwargs=dict(algo_cls=A2C),  # Init value for StableBaselinesAgent
         agent_name="A2CAgent",
-        fit_budget=3e4,
+        fit_budget=1e5,
         eval_kwargs=dict(eval_horizon=500),
         n_fit=n_fit,
     ),
     AgentManager(
-        PPOAgent,
+        StableBaselinesAgent,
         (env_ctor, env_kwargs),
+        init_kwargs=dict(algo_cls=PPO),  # Init value for StableBaselinesAgent
         agent_name="PPOAgent",
-        fit_budget=3e4,
+        fit_budget=1e5,
         eval_kwargs=dict(eval_horizon=500),
         n_fit=n_fit,
     ),
     AgentManager(
-        DQNAgent,
+        StableBaselinesAgent,
         (env_ctor, env_kwargs),
+        init_kwargs=dict(algo_cls=DQN),  # Init value for StableBaselinesAgent
         agent_name="DQNAgent",
-        fit_budget=3e4,
+        fit_budget=1e5,
         eval_kwargs=dict(eval_horizon=500),
         n_fit=n_fit,
     ),
@@ -116,12 +122,12 @@ print(compare_agents(agents))
 ```

 ```
-                Agent1 vs Agent2  mean Agent1  mean Agent2   mean diff    std diff decisions     p-val significance
-0  A2CAgent vs PPOAgent   213.600875   423.431500 -209.830625  144.600160    reject  0.002048            **
-1  A2CAgent vs DQNAgent   213.600875   443.296625 -229.695750  152.368506    reject  0.000849           ***
-2  PPOAgent vs DQNAgent   423.431500   443.296625  -19.865125  104.279024    accept  0.926234
+     Agent1 vs Agent2  mean Agent1  mean Agent2  mean diff    std diff decisions     p-val significance
+0  A2CAgent vs PPOAgent     416.9975    500.00000  -83.00250  147.338488    accept  0.266444
+1  A2CAgent vs DQNAgent     416.9975    260.38375  156.61375  179.503659    reject  0.017001             *
+2  PPOAgent vs DQNAgent     500.0000    260.38375  239.61625   80.271521    reject  0.000410           ***
 ```

-The results of `compare_agents(agents)` show the p-values and significance level if the method is `tukey_hsd` and in all the cases it shows the decision accept or reject of the test with Family-wise error controlled by $0.05$. In our case, we see that A2C seems significantly worst than both PPO and DQN but the difference between PPO and DQN is not statistically significant. Remark that no significance (which is to say, decision to accept $H_0$) does not necessarily mean that the algorithms perform the same, it can be that there is not enough data.
+The results of `compare_agents(agents)` show the p-values and significance level if the method is tukey_hsd and it shows the decision accept or reject of the test with Family-wise error controlled by $0.05$. In our case, we see that DQN is worse than A2C and PPO but the difference between PPO and A2C is not statistically significant. Remark that no significance (which is to say, decision to accept $H_0$) does not necessarily mean that the algorithms perform the same, it can be that there is not enough data (and it is likely that it is the case here).

 *Remark*: the comparison we do here is a black-box comparison in the sense that we don't care how the algorithms were tuned or how many training steps are used, we suppose that the user already tuned these parameters adequately for a fair comparison.
````
11 changes: 4 additions & 7 deletions docs/basics/quick_start_rl/quickstart.md
````diff
@@ -17,16 +17,15 @@ import numpy as np
 import pandas as pd
 import time
 from rlberry.agents import AgentWithSimplePolicy
-from rlberry_research.agents import UCBVIAgent
-from rlberry_research.envs import Chain
+from rlberry_scool.agents import UCBVIAgent
+from rlberry_scool.envs import Chain
 from rlberry.manager import (
     ExperimentManager,
     evaluate_agents,
     plot_writer_data,
     read_writer_data,
 )
 from rlberry.wrappers import WriterWrapper
-from IPython.display import Image
 ```

 Choosing an RL environment
@@ -59,8 +58,6 @@ env.save_gif("gif_chain.gif")
 # clear rendering data
 env.clear_render_buffer()
 env.disable_rendering()
-# view result
-Image(open("gif_chain.gif", "rb").read())
 ```


@@ -76,7 +73,7 @@ Defining an agent and a baseline
 --------------------------------

 We will compare a RandomAgent (which select random action) to the
-UCBVIAgent(from [rlberry_research](https://github.com/rlberry-py/rlberry-research)), which is an algorithm that is designed to perform an
+UCBVIAgent(from [rlberry_scool](https://github.com/rlberry-py/rlberry-scool)), which is an algorithm that is designed to perform an
 efficient exploration. Our goal is then to assess the performance of the
 two algorithms.

@@ -288,7 +285,7 @@ iteration, the environment takes 100 steps (`horizon`) times the



-Finally, we plot the reward: Here you can see the mean value over the 10 fited agent, with 2 options (raw and smoothed). Note that, to be able to see the smoothed version, you must have installed the extra package `scikit-fda`, (For more information, you can check the options on the [install page](../../installation.md#options)).
+Finally, we plot the reward. Here you can see the mean value over the 10 fitted agent, with 2 options (raw and smoothed). Note that, to be able to see the smoothed version, you must have installed the extra package `scikit-fda`, (For more information, you can check the options on the [install page](../../installation.md#options)).

 ```python
 # Plot of the reward.
````
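The quickstart that these hunks modify goes on to define a random baseline and to train the UCBVIAgent through an ExperimentManager. Below is a minimal sketch of that setup using the new `rlberry_scool` imports, assuming the gymnasium-style `reset`/`step` API used elsewhere on this page; the `Chain` keyword names, the budgets, and the exact `RandomAgent` body are illustrative assumptions, not the file's literal content.

```python
from rlberry.agents import AgentWithSimplePolicy
from rlberry.manager import ExperimentManager, evaluate_agents
from rlberry_scool.agents import UCBVIAgent
from rlberry_scool.envs import Chain

env_ctor = Chain
env_kwargs = dict(L=10, fail_prob=0.1)  # assumed keyword names for Chain(10, 0.1)


class RandomAgent(AgentWithSimplePolicy):
    """Baseline that picks a uniformly random action at every step."""

    name = "RandomAgent"

    def fit(self, budget=100, **kwargs):
        observation, info = self.env.reset()
        for _ in range(int(budget)):
            action = self.policy(observation)
            observation, reward, terminated, truncated, info = self.env.step(action)
            if terminated or truncated:
                observation, info = self.env.reset()

    def policy(self, observation):
        return self.env.action_space.sample()


managers = [
    ExperimentManager(
        agent_class,
        (env_ctor, env_kwargs),
        fit_budget=100,
        eval_kwargs=dict(eval_horizon=100),
        n_fit=10,
    )
    for agent_class in (RandomAgent, UCBVIAgent)
]
for manager in managers:
    manager.fit()

# Mean evaluation over 50 Monte-Carlo simulations per trained instance.
evaluate_agents(managers, n_simulations=50, show=False)
```

`plot_writer_data` can then be called on `managers` to produce the reward curves referenced in the last hunk above.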
2 changes: 1 addition & 1 deletion docs/basics/userguide/adastop.md
````diff
@@ -1,7 +1,7 @@
 (adastop_userguide)=


-# AdaStop
+# Adaptive hypothesis testing for comparison of RL agents with AdaStop



````
6 changes: 3 additions & 3 deletions docs/basics/userguide/agent.md
````diff
@@ -7,11 +7,11 @@ In rlberry, you can use existing Agent, or create your own custom Agent. You can

 ## Use rlberry Agent
 An agent needs an environment to train. We'll use the same environment as in the [environment](environment_page) section of the user guide.
-("Chain" environment from "[rlberry-research](https://github.com/rlberry-py/rlberry-research)")
+("Chain" environment from "[rlberry-scool](https://github.com/rlberry-py/rlberry-scool)")

 ### without agent
 ```python
-from rlberry_research.envs.finite import Chain
+from rlberry_scool.envs.finite import Chain

 env = Chain(10, 0.1)
 env.enable_rendering()
@@ -37,7 +37,7 @@ With the same environment, we will use an Agent to choose the actions instead of
 For this example, you can use "ValueIterationAgent" Agent from "[rlberry-scool](https://github.com/rlberry-py/rlberry-scool)"

 ```python
-from rlberry_research.envs.finite import Chain
+from rlberry_scool.envs.finite import Chain
 from rlberry_scool.agents.dynprog import ValueIterationAgent

 env = Chain(10, 0.1) # same env
````
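For context, the agent.md section these hunks touch continues by fitting the ValueIterationAgent and rolling out its policy. A hedged sketch of that usage, assuming the gymnasium-style `reset`/`step` API; the `gamma` value, rollout length, and output filename are illustrative, not the file's literal content.

```python
from rlberry_scool.envs.finite import Chain
from rlberry_scool.agents.dynprog import ValueIterationAgent

env = Chain(10, 0.1)  # same env as above
agent = ValueIterationAgent(env, gamma=0.95)  # gamma is an assumed value
agent.fit()  # runs value iteration on the known finite MDP

# Roll out the learned policy and record a gif, as in the "without agent" example.
env.enable_rendering()
observation, info = env.reset()
for _ in range(50):
    action = agent.policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)
env.save_gif("gif_chain_value_it.gif")  # hypothetical output name
env.clear_render_buffer()
env.disable_rendering()
```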
4 changes: 2 additions & 2 deletions docs/basics/userguide/environment.md
````diff
@@ -7,9 +7,9 @@ This is the world with which the agent interacts. The agent can observe this env

 ## Use rlberry environment
 You can find some environments in our other projects "[rlberry-research](https://github.com/rlberry-py/rlberry-research)" and "[rlberry-scool](https://github.com/rlberry-py/rlberry-scool)".
-For this example, you can use "Chain" environment from "[rlberry-research](https://github.com/rlberry-py/rlberry-research)"
+For this example, you can use "Chain" environment from "[rlberry-scool](https://github.com/rlberry-py/rlberry-scool)"
 ```python
-from rlberry_research.envs.finite import Chain
+from rlberry_scool.envs.finite import Chain

 env = Chain(10, 0.1)
 env.enable_rendering()
````
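Similarly, environment.md continues with a short random rollout on the Chain environment. A minimal sketch, again assuming the gymnasium-style `reset`/`step` API; the rollout length is illustrative.

```python
from rlberry_scool.envs.finite import Chain

env = Chain(10, 0.1)
env.enable_rendering()

# Random rollout, just to see the environment evolve.
observation, info = env.reset()
for _ in range(30):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

env.save_gif("gif_chain.gif")  # filename taken from the quickstart diff above
env.clear_render_buffer()
env.disable_rendering()
```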
Binary file modified docs/basics/userguide/expManager_multieval.png