Merge pull request #11 from negocia-inc/scope
Scope
aiueola authored Jun 1, 2023
2 parents a42b083 + 61a4169 commit d180ace
Showing 162 changed files with 5,902 additions and 23,478 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -9,6 +9,10 @@ preview/
_autosummary/
_autogallery/
.npmrd
.hydra/
tmp/
debug.ipynb
debug2.ipynb

# Byte-compiled / optimized / DLL files
__pycache__/
45 changes: 45 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,45 @@
## Contribution Guidelines
First off, thanks for your interest in contributing to SCOPE-RL!

We are doing our best to improve this project, but we recognize that there is still ample room for improvement and that we need your help.
Let's make the best Off-Policy Evaluation software for Reinforcement Learning together!

Please follow the conventions below:

- [Coding Guidelines](#coding-guidelines)
- [Tests](#tests)
- [Continuous Integration](#continuous-integration)

## Coding Guidelines

Code is formatted with [black](https://github.com/psf/black),
and coding style is checked with [flake8](http://flake8.pycqa.org).

After installing black, you can format the code with the following command:

```bash
# perform formatting recursively for the files under the current dir
$ black .
```

After installing flake8, you can check the coding style with the following command:

```bash
# perform checking of the coding style
$ flake8 .
```

## Tests

We are currently working on unit tests using pytest as the testing framework. We greatly appreciate any help with adding test code. If you are interested in working on the tests, please contact: [email protected]
<!-- We employ pytest as the testing framework. You can run all the tests as follows: -->

```bash
# perform all the tests under the tests directory
$ pytest .
```
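
If you would like to contribute a test but are not sure where to start, the following is a minimal sketch of the pytest style we have in mind. The helper function here is purely illustrative and is not part of SCOPE-RL; tests for the actual estimators would import the corresponding classes from `scope_rl` instead.

```Python
# tests/test_example.py -- a hypothetical sketch of the intended pytest style
import numpy as np
import pytest


def trajectory_wise_importance_weight(behavior_probs, evaluation_probs):
    """Illustrative helper: product of per-step importance ratios over a trajectory."""
    return float(np.prod(np.asarray(evaluation_probs) / np.asarray(behavior_probs)))


def test_weight_is_one_for_identical_policies():
    probs = [0.5, 0.25, 0.25]
    assert trajectory_wise_importance_weight(probs, probs) == pytest.approx(1.0)


def test_weight_is_positive():
    assert trajectory_wise_importance_weight([0.5, 0.5], [0.25, 0.75]) > 0.0
```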

## Continuous Integration

SCOPE-RL uses GitHub Actions to perform continuous integration.
8 changes: 4 additions & 4 deletions FrequentlyAskedQuestions.md
@@ -18,17 +18,17 @@ while not done:
obs, reward, done, info = env.step(action)
```

To solve this incompatibility, please use `NewGymAPIWrapper` provided in `ofrl/utils.py`. It should be used as follows.
To solve this incompatibility, please use `NewGymAPIWrapper` provided in `scope_rl/utils.py`. It should be used as follows.
```Python
from ofrl.utils import NewGymAPIWrapper
from scope_rl.utils import NewGymAPIWrapper
env = NewGymAPIWrapper(env)
```
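
For reference, the interaction loop after wrapping might look like the following sketch. It assumes the wrapped environment exposes the gym>=0.26.2 interface (`reset()` returning `(obs, info)` and `step()` returning a five-tuple); please check the wrapper's documentation for details.

```Python
# a minimal sketch, continuing from the snippet above (env is wrapped with NewGymAPIWrapper)
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # replace with your (behavior) policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```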

Q. xxx environment does not work on d3rlpy, which is used for model training. How should we fix it? (d3rlpy and OFRL are compatible with different versions of OpenAI Gym.)

A. While OFRL is compatible to the latest API of Open AI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `ofrl/utils.py` to make the environment work for d3rlpy.
A. While OFRL is compatible with the latest API of OpenAI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `scope_rl/utils.py` to make the environment work with d3rlpy.
```Python
from ofrl.utils import OldGymAPIWrapper
from scope_rl.utils import OldGymAPIWrapper
env = gym.make("xxx_v0") # compatible with gym>=0.26.2 and OFRL
env_ = OldGymAPIWrapper(env) # compatible with gym<0.26.2 and d3rlpy
```
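
Correspondingly, `env_` is expected to follow the old gym<0.26.2 interface that d3rlpy assumes, as in the following rough sketch (an illustration, not an official example):

```Python
# a minimal sketch, continuing from the snippet above (env_ is wrapped with OldGymAPIWrapper)
obs = env_.reset()
done = False
while not done:
    action = env_.action_space.sample()  # replace with your (behavior) policy
    obs, reward, done, info = env_.step(action)
```
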
2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [2022] [Haruka Kiyohara, Yuta Saito, and negocia, Inc.]
Copyright [2023] [Haruka Kiyohara, Ren Kishimoto, HAKUHODO Technologies Inc., and Hanjuku-kaso Co., Ltd.]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
101 changes: 54 additions & 47 deletions README.md
@@ -1,8 +1,8 @@
# OFRL: A pipeline for offline reinforcement learning research and applications
# SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection
<details>
<summary><strong>Table of Contents </strong>(click to expand)</summary>

- [OFRL: A pipeline for offline reinforcement learning research and applications](#OFRL-a-pipeline-for-offline-reinforcement-learning-research-and-applications)
- [SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection](#SCOPE-RL-a-python-library-for-offline-reinforcement-learning-off-policy-evaluation-and-selection)
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
@@ -22,23 +22,23 @@

## Overview

*OFRL* is an open-source Python Software for implementing the end-to-end procedure regarding **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods.
*SCOPE-RL* is an open-source Python software for implementing the end-to-end procedure of **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods.

This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. OFRL enables an easy, transparent, and reliable experiment in offline RL research on any environment with [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface and also facilitates implementation of offline RL in practice on a variety of customized datasets.
This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. SCOPE-RL enables easy, transparent, and reliable experiments in offline RL research on any environment with an [OpenAI Gym](https://gym.openai.com)- or [Gymnasium](https://gymnasium.farama.org/)-like interface, and also facilitates the implementation of offline RL in practice on a variety of customized datasets.

Our software enables evaluation and algorithm comparison related to the following research topics:

- **Offline Reinforcement Learning**: Offline RL aims at learning a new policy from only offline logged data collected by a behavior policy. OFRL enables flexible experiment using customized dataset collected by various behavior policies and on a variety of environment.
- **Offline Reinforcement Learning**: Offline RL aims at learning a new policy from only offline logged data collected by a behavior policy. SCOPE-RL enables flexible experiments using customized datasets collected by various behavior policies on a variety of environments.

- **Off-Policy Evaluation**: OPE aims at evaluating the performance of a counterfactual policy using only offline logged data. OFRL supports many OPE estimators and streamlines the experimental procedure to evaluate OPE estimators. Moreover, we also implement advanced OPE, such as cumulative distribution estimation.
- **Off-Policy Evaluation**: OPE aims at evaluating the performance of a counterfactual policy using only offline logged data. SCOPE-RL supports many OPE estimators and streamlines the experimental procedure to evaluate OPE estimators. Moreover, we also implement advanced OPE, such as cumulative distribution estimation.

- **Off-Policy Selection**: OPS aims at identifying the best-performing policy from a pool of several candidate policies using offline logged data. OFRL supports some basic OPS methods and provides some metrics to evaluate the OPS accuracy.
- **Off-Policy Selection**: OPS aims at identifying the best-performing policy from a pool of several candidate policies using offline logged data. SCOPE-RL supports some basic OPS methods and provides several metrics to evaluate OPS accuracy.

This software is intended for the episodic RL setup. For those interested in the contextual bandit setup, we'd recommend [Open Bandit Pipeline](https://github.com/st-tech/zr-obp).

### Implementations

*OFRL* mainly consists of the following three modules.
*SCOPE-RL* mainly consists of the following three modules.
- [**dataset module**](./_gym/dataset): This module provides tools to generate synthetic data from any environment on top of an [OpenAI Gym](http://gym.openai.com/)- or [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to preprocess the logged data.
- [**policy module**](./_gym/policy): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection.
- [**ope module**](./_gym/ope): This module provides a generic abstract class to implement an OPE estimator and some popular estimators. It also provides some tools useful for performing OPS.
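
As a quick orientation, the classes used in the Usage section below map onto these three modules roughly as follows (an import sketch only; see Usage for the full workflow):

```Python
# where the classes used later in this README live (import sketch only)
from scope_rl.dataset import SyntheticDataset           # dataset module: synthetic logged-data generation
from scope_rl.policy import DiscreteEpsilonGreedyHead   # policy module: d3rlpy wrapper for data collection
from scope_rl.ope import CreateOPEInput                 # ope module: OPE input preparation,
from scope_rl.ope import DiscreteOffPolicyEvaluation    # the OPE class,
from scope_rl.ope import OffPolicySelection             # and the OPS class
```
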
@@ -115,19 +115,19 @@ To provide an example of performing a customized experiment imitating a practica

## Installation

You can install OFRL using Python's package manager `pip`.
You can install SCOPE-RL using Python's package manager `pip`.
```
pip install ofrl
pip install scope-rl
```

You can also install OFRL from source.
You can also install SCOPE-RL from source.
```bash
git clone https://github.com/negocia-inc/ofrl
cd ofrl
git clone https://github.com/hakuhodo-technologies/scope-rl
cd scope-rl
python setup.py install
```

OFRL supports Python 3.7 or newer. See [pyproject.toml](./pyproject.toml) for other requirements.
SCOPE-RL supports Python 3.7 or newer. See [requirements.txt](./requirements.txt) for other requirements.

## Usage

@@ -140,9 +140,9 @@ Let's start by collecting some logged data useful for offline RL.
```Python
# implement a data collection procedure on the RTBGym environment

# import OFRL modules
from ofrl.dataset import SyntheticDataset
from ofrl.policy import DiscreteEpsilonGreedyHead
# import SCOPE-RL modules
from scope_rl.dataset import SyntheticDataset
from scope_rl.policy import DiscreteEpsilonGreedyHead
# import d3rlpy algorithms
from d3rlpy.algos import DoubleDQN
from d3rlpy.online.buffers import ReplayBuffer
@@ -201,7 +201,7 @@ test_logged_dataset = dataset.obtain_trajectories(
We are now ready to learn a new policy from the logged data using [d3rlpy](https://github.com/takuseno/d3rlpy).

```Python
# implement an offline RL procedure using OFRL and d3rlpy
# implement an offline RL procedure using SCOPE-RL and d3rlpy

# import d3rlpy algorithms
from d3rlpy.dataset import MDPDataset
@@ -232,15 +232,15 @@ cql.fit(
Then, we evaluate the performance of the learned policy using offline logged data. Specifically, we compare the estimation results of various OPE estimators, including Direct Method (DM), Trajectory-wise Importance Sampling (TIS), Per-Decision Importance Sampling (PDIS), and Doubly Robust (DR).

```Python
# implement a basic OPE procedure using OFRL
# implement a basic OPE procedure using SCOPE-RL

# import OFRL modules
from ofrl.ope import CreateOPEInput
from ofrl.ope import DiscreteOffPolicyEvaluation as OPE
from ofrl.ope import DiscreteDirectMethod as DM
from ofrl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS
from ofrl.ope import DiscretePerDecisionImportanceSampling as PDIS
from ofrl.ope import DiscreteDoublyRobust as DR
# import SCOPE-RL modules
from scope_rl.ope import CreateOPEInput
from scope_rl.ope import DiscreteOffPolicyEvaluation as OPE
from scope_rl.ope import DiscreteDirectMethod as DM
from scope_rl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS
from scope_rl.ope import DiscretePerDecisionImportanceSampling as PDIS
from scope_rl.ope import DiscreteDoublyRobust as DR

# (4) Evaluate the learned policy in an offline manner
# we compare ddqn, cql, and random policy
@@ -303,27 +303,27 @@ A formal quickstart example with RTBGym is available at [quickstart/rtb_syntheti
We can also estimate various performance statistics, including the variance and conditional value at risk (CVaR), by using estimators of the cumulative distribution function.

```Python
# implement a cumulative distribution estimation procedure using OFRL
# implement a cumulative distribution estimation procedure using SCOPE-RL

# import OFRL modules
from ofrl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE
from ofrl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM
from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS
from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR
from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS
from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR
# import SCOPE-RL modules
from scope_rl.ope import CumulativeDistributionOPE
from scope_rl.ope import DiscreteCumulativeDistributionDM as CD_DM
from scope_rl.ope import DiscreteCumulativeDistributionTIS as CD_IS
from scope_rl.ope import DiscreteCumulativeDistributionTDR as CD_DR
from scope_rl.ope import DiscreteCumulativeDistributionSNTIS as CD_SNIS
from scope_rl.ope import DiscreteCumulativeDistributionSNTDR as CD_SNDR

# (4) Evaluate the cumulative distribution function of the learned policy (in an offline manner)
# we compare ddqn, cql, and random policy defined in the previous section (i.e., (3) of the basic OPE procedure)
# initialize the OPE class
cd_ope = CumulativeDistributionOPE(
logged_dataset=test_logged_dataset,
ope_estimators=[
CD_DM(estimator_name="cdf_dm"),
CD_IS(estimator_name="cdf_is"),
CD_DR(estimator_name="cdf_dr"),
CD_SNIS(estimator_name="cdf_snis"),
CD_SNDR(estimator_name="cdf_sndr"),
CD_DM(estimator_name="cd_dm"),
CD_IS(estimator_name="cd_is"),
CD_DR(estimator_name="cd_dr"),
CD_SNIS(estimator_name="cd_snis"),
CD_SNDR(estimator_name="cd_sndr"),
],
)
# estimate the variance
@@ -349,8 +349,8 @@ Finally, we select the best-performing policy based on the OPE results using the
```Python
# perform off-policy selection based on the OPE results

# import OFRL modules
from ofrl.ope import OffPolicySelection
# import SCOPE-RL modules
from scope_rl.ope import OffPolicySelection

# (5) Conduct Off-Policy Selection
# Initialize the OPS class
@@ -407,15 +407,22 @@ For more examples, please refer to [quickstart/rtb_synthetic_discrete_advanced.i
If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**Title**<br>
[link]()
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation in Reinforcement Learning**<br>
[link]() (a preprint is coming soon)

Bibtex:
```
@article{kiyohara2023towards,
  author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
  title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation in Reinforcement Learning},
  journal = {A github repository},
  pages = {xxx--xxx},
  year = {2023},
}
```

## Contribution
Any contributions to OFRL are more than welcome!
Any contributions to SCOPE-RL are more than welcome!
Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for general guidelines on how to contribute to the project.

## License
@@ -424,16 +424,16 @@ This project is licensed under Apache 2.0 license - see [LICENSE](LICENSE) file

## Project Team

- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**; Tokyo Institute of Technology)
- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**)
- Ren Kishimoto (Tokyo Institute of Technology)
- Kosuke Kawakami (negocia, Inc.)
- Kosuke Kawakami (HAKUHODO Technologies Inc.)
- Ken Kobayashi (Tokyo Institute of Technology)
- Kazuhide Nakata (Tokyo Institute of Technology)
- [Yuta Saito](https://usait0.com/en/) (Cornell University)

## Contact

For any question about the paper and software, feel free to contact: [email protected]
For any questions about the paper or the software, feel free to contact: [email protected]

## References
