Showing 39 changed files with 446 additions and 379 deletions.
@@ -1,8 +1,8 @@
-# OFRL: A pipeline for offline reinforcement learning research and applications
+# SCOPE-RL: A pipeline for offline reinforcement learning research and applications
 <details>
 <summary><strong>Table of Contents </strong>(click to expand)</summary>

-- [OFRL: A pipeline for offline reinforcement learning research and applications](#OFRL-a-pipeline-for-offline-reinforcement-learning-research-and-applications)
+- [SCOPE-RL: A pipeline for offline reinforcement learning research and applications](#SCOPE-RL-a-pipeline-for-offline-reinforcement-learning-research-and-applications)
 - [Overview](#overview)
 - [Installation](#installation)
 - [Usage](#usage)
@@ -22,23 +22,23 @@

 ## Overview

-*OFRL* is an open-source Python software for implementing the end-to-end procedure of **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods.
+*SCOPE-RL* is an open-source Python software for implementing the end-to-end procedure of **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods.

-This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. OFRL enables easy, transparent, and reliable experiments in offline RL research on any environment with an [OpenAI Gym](https://gym.openai.com) or [Gymnasium](https://gymnasium.farama.org/)-like interface, and also facilitates the implementation of offline RL in practice on a variety of customized datasets.
+This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. SCOPE-RL enables easy, transparent, and reliable experiments in offline RL research on any environment with an [OpenAI Gym](https://gym.openai.com) or [Gymnasium](https://gymnasium.farama.org/)-like interface, and also facilitates the implementation of offline RL in practice on a variety of customized datasets.

 Our software enables evaluation and algorithm comparison related to the following research topics:

-- **Offline Reinforcement Learning**: Offline RL aims to learn a new policy from only offline logged data collected by a behavior policy. OFRL enables flexible experiments using customized datasets collected by various behavior policies on a variety of environments.
+- **Offline Reinforcement Learning**: Offline RL aims to learn a new policy from only offline logged data collected by a behavior policy. SCOPE-RL enables flexible experiments using customized datasets collected by various behavior policies on a variety of environments.

-- **Off-Policy Evaluation**: OPE aims to evaluate the performance of a counterfactual policy using only offline logged data. OFRL supports many OPE estimators and streamlines the experimental procedure for evaluating OPE estimators. Moreover, we also implement advanced OPE methods, such as cumulative distribution estimation.
+- **Off-Policy Evaluation**: OPE aims to evaluate the performance of a counterfactual policy using only offline logged data. SCOPE-RL supports many OPE estimators and streamlines the experimental procedure for evaluating OPE estimators. Moreover, we also implement advanced OPE methods, such as cumulative distribution estimation.

-- **Off-Policy Selection**: OPS aims to identify the best-performing policy from a pool of candidate policies using offline logged data. OFRL supports basic OPS methods and provides several metrics to evaluate OPS accuracy.
+- **Off-Policy Selection**: OPS aims to identify the best-performing policy from a pool of candidate policies using offline logged data. SCOPE-RL supports basic OPS methods and provides several metrics to evaluate OPS accuracy.

 This software is intended for the episodic RL setup. For those interested in the contextual bandit setup, we recommend [Open Bandit Pipeline](https://github.com/st-tech/zr-obp).

 ### Implementations

-*OFRL* mainly consists of the following three modules.
+*SCOPE-RL* mainly consists of the following three modules.
 - [**dataset module**](./_gym/dataset): This module provides tools to generate synthetic data from any environment with an [OpenAI Gym](http://gym.openai.com/) or [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to preprocess the logged data.
 - [**policy module**](./_gym/policy): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection.
 - [**ope module**](./_gym/ope): This module provides a generic abstract class for implementing an OPE estimator, along with several popular estimators. It also provides tools useful for performing OPS.
@@ -115,19 +115,19 @@ To provide an example of performing a customized experiment imitating a practica

 ## Installation

-You can install OFRL using Python's package manager `pip`.
+You can install SCOPE-RL using Python's package manager `pip`.
 ```
-pip install ofrl
+pip install scope-rl
 ```

-You can also install OFRL from source.
+You can also install SCOPE-RL from source.
 ```bash
-git clone https://github.com/negocia-inc/ofrl
-cd ofrl
+git clone https://github.com/negocia-inc/scope-rl
+cd scope-rl
 python setup.py install
 ```

-OFRL supports Python 3.7 or newer. See [pyproject.toml](./pyproject.toml) for other requirements.
+SCOPE-RL supports Python 3.7 or newer. See [pyproject.toml](./pyproject.toml) for other requirements.

 ## Usage

@@ -140,9 +140,9 @@ Let's start by collecting some logged data useful for offline RL.
 ```Python
 # implement a data collection procedure on the RTBGym environment

-# import OFRL modules
-from ofrl.dataset import SyntheticDataset
-from ofrl.policy import DiscreteEpsilonGreedyHead
+# import SCOPE-RL modules
+from scope_rl.dataset import SyntheticDataset
+from scope_rl.policy import DiscreteEpsilonGreedyHead
 # import d3rlpy algorithms
 from d3rlpy.algos import DoubleDQN
 from d3rlpy.online.buffers import ReplayBuffer

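The diff truncates the rest of this data-collection snippet. For orientation, here is a minimal sketch of how these imports are typically wired together. The environment id, the `SyntheticDataset` constructor arguments, and the `DiscreteEpsilonGreedyHead` parameters are assumptions for illustration, not code from this commit; only `dataset.obtain_trajectories(...)` is confirmed by the surrounding hunks.

```Python
# a minimal sketch, assuming a discrete-action RTBGym-style environment;
# "RTBEnv-discrete-v0" and the SCOPE-RL argument names below are assumptions.
import gym

from scope_rl.dataset import SyntheticDataset
from scope_rl.policy import DiscreteEpsilonGreedyHead
from d3rlpy.algos import DoubleDQN
from d3rlpy.online.buffers import ReplayBuffer

env = gym.make("RTBEnv-discrete-v0")  # hypothetical environment id

# (1) train a behavior policy online with d3rlpy
ddqn = DoubleDQN()
buffer = ReplayBuffer(maxlen=10000, env=env)
ddqn.fit_online(env, buffer, n_steps=10000)

# (2) wrap the trained algorithm as a stochastic (epsilon-greedy) behavior policy
behavior_policy = DiscreteEpsilonGreedyHead(
    ddqn,
    n_actions=env.action_space.n,
    epsilon=0.3,
    name="ddqn_epsilon_0.3",
    random_state=12345,
)

# (3) collect logged trajectories for offline RL and OPE
dataset = SyntheticDataset(
    env=env,
    behavior_policy=behavior_policy,
    random_state=12345,
)
train_logged_dataset = dataset.obtain_trajectories(n_trajectories=10000)
test_logged_dataset = dataset.obtain_trajectories(n_trajectories=10000)
```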
@@ -201,7 +201,7 @@ test_logged_dataset = dataset.obtain_trajectories(
 We are now ready to learn a new policy from the logged data using [d3rlpy](https://github.com/takuseno/d3rlpy).

 ```Python
-# implement an offline RL procedure using OFRL and d3rlpy
+# implement an offline RL procedure using SCOPE-RL and d3rlpy

 # import d3rlpy algorithms
 from d3rlpy.dataset import MDPDataset

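The hunk cuts the snippet off before the training code. The sketch below shows the usual d3rlpy pattern that the next hunk header hints at (it contains `cql.fit(`); the key names of the logged dataset ("state", "action", "reward", "done") are assumptions.

```Python
# a minimal sketch of offline policy learning with d3rlpy;
# the logged-dataset key names are assumptions.
from d3rlpy.algos import DiscreteCQL
from d3rlpy.dataset import MDPDataset

# convert the logged data into a d3rlpy MDPDataset
offlinerl_dataset = MDPDataset(
    observations=train_logged_dataset["state"],
    actions=train_logged_dataset["action"],
    rewards=train_logged_dataset["reward"],
    terminals=train_logged_dataset["done"],
)

# learn a new policy (CQL) from the offline logged data
cql = DiscreteCQL()
cql.fit(
    offlinerl_dataset,
    n_steps=10000,
)
```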
@@ -232,15 +232,15 @@ cql.fit(
 Then, we evaluate the performance of the learned policy using offline logged data. Specifically, we compare the estimation results of various OPE estimators, including Direct Method (DM), Trajectory-wise Importance Sampling (TIS), Per-Decision Importance Sampling (PDIS), and Doubly Robust (DR).

 ```Python
-# implement a basic OPE procedure using OFRL
+# implement a basic OPE procedure using SCOPE-RL

-# import OFRL modules
-from ofrl.ope import CreateOPEInput
-from ofrl.ope import DiscreteOffPolicyEvaluation as OPE
-from ofrl.ope import DiscreteDirectMethod as DM
-from ofrl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS
-from ofrl.ope import DiscretePerDecisionImportanceSampling as PDIS
-from ofrl.ope import DiscreteDoublyRobust as DR
+# import SCOPE-RL modules
+from scope_rl.ope import CreateOPEInput
+from scope_rl.ope import DiscreteOffPolicyEvaluation as OPE
+from scope_rl.ope import DiscreteDirectMethod as DM
+from scope_rl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS
+from scope_rl.ope import DiscretePerDecisionImportanceSampling as PDIS
+from scope_rl.ope import DiscreteDoublyRobust as DR

 # (4) Evaluate the learned policy in an offline manner
 # we compare ddqn, cql, and random policy

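The estimation step itself falls outside this hunk. A sketch of how these classes are typically combined follows; the candidate-policy wrappers, the `CreateOPEInput` arguments, and the `estimate_policy_value` call are assumptions based on the imported class names, not code taken from this commit.

```Python
# a minimal sketch of a basic OPE workflow; argument and method names are assumptions.
from scope_rl.policy import DiscreteEpsilonGreedyHead

# wrap the candidate (evaluation) policies: greedy ddqn, greedy cql, and a random policy
evaluation_policies = [
    DiscreteEpsilonGreedyHead(ddqn, n_actions=env.action_space.n, epsilon=0.0, name="ddqn", random_state=12345),
    DiscreteEpsilonGreedyHead(cql, n_actions=env.action_space.n, epsilon=0.0, name="cql", random_state=12345),
    DiscreteEpsilonGreedyHead(ddqn, n_actions=env.action_space.n, epsilon=1.0, name="random", random_state=12345),
]

# prepare inputs for the OPE estimators (e.g., value predictions and importance weights)
prep = CreateOPEInput(env=env)
input_dict = prep.obtain_whole_inputs(
    logged_dataset=test_logged_dataset,
    evaluation_policies=evaluation_policies,
    random_state=12345,
)

# compare the value estimates of DM, TIS, PDIS, and DR for each candidate policy
ope = OPE(
    logged_dataset=test_logged_dataset,
    ope_estimators=[DM(), TIS(), PDIS(), DR()],
)
policy_value_dict = ope.estimate_policy_value(input_dict)
```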
@@ -303,15 +303,15 @@ A formal quickstart example with RTBGym is available at [quickstart/rtb_syntheti
 We can also estimate various performance statistics, including the variance and the conditional value at risk (CVaR), by using estimators of the cumulative distribution function.

 ```Python
-# implement a cumulative distribution estimation procedure using OFRL
+# implement a cumulative distribution estimation procedure using SCOPE-RL

-# import OFRL modules
-from ofrl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE
-from ofrl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM
-from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS
-from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR
-from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS
-from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR
+# import SCOPE-RL modules
+from scope_rl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE
+from scope_rl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM
+from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS
+from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR
+from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS
+from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR

 # (4) Evaluate the cumulative distribution function of the learned policy (in an offline manner)
 # we compare ddqn, cql, and random policy defined from the previous section (i.e., (3) of basic OPE procedure)

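As above, the hunk omits the estimation calls. Here is a sketch written under the assumption that the cumulative-distribution OPE class mirrors the basic OPE interface; the method names (`estimate_variance`, `estimate_conditional_value_at_risk`) and their arguments are assumptions.

```Python
# a minimal sketch of cumulative distribution OPE; method and argument names are assumptions.
cd_ope = CumulativeDistributionOPE(
    logged_dataset=test_logged_dataset,
    ope_estimators=[CD_DM(), CD_IS(), CD_DR(), CD_SNIS(), CD_SNDR()],
)

# estimate risk-related statistics of the return distribution for each candidate policy
variance_dict = cd_ope.estimate_variance(input_dict)
cvar_dict = cd_ope.estimate_conditional_value_at_risk(input_dict, alphas=0.3)
```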
@@ -349,8 +349,8 @@ Finally, we select the best-performing policy based on the OPE results using the
 ```Python
 # perform off-policy selection based on the OPE results

-# import OFRL modules
-from ofrl.ope import OffPolicySelection
+# import SCOPE-RL modules
+from scope_rl.ope import OffPolicySelection

 # (5) Conduct Off-Policy Selection
 # Initialize the OPS class

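The initialization and selection calls are not shown in this hunk. A sketch follows, assuming the OPS class accepts the OPE instances built earlier; the constructor and `select_by_policy_value` arguments are assumptions rather than the library's confirmed API.

```Python
# a minimal sketch of off-policy selection; argument names are assumptions.
ops = OffPolicySelection(
    ope=ope,
    cumulative_distribution_ope=cd_ope,
)

# rank the candidate policies by their estimated policy value
# and report metrics on how accurate the selection is
ranking_df, metric_df = ops.select_by_policy_value(
    input_dict,
    return_metrics=True,
    return_by_dataframe=True,
)
```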
@@ -407,15 +407,22 @@ For more examples, please refer to [quickstart/rtb_synthetic_discrete_advanced.i
 If you use our software in your work, please cite our paper:

 Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
-**Title**<br>
-[link]()
+**SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning**<br>
+[link]() (a preprint is coming soon)

 Bibtex:
 ```
+@article{kiyohara2023scope,
+  author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
+  title = {SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning},
+  journal = {A GitHub repository},
+  pages = {xxx--xxx},
+  year = {2023},
+}
 ```

 ## Contribution
-Any contributions to OFRL are more than welcome!
+Any contributions to SCOPE-RL are more than welcome!
 Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for general guidelines on how to contribute to the project.

 ## License

@@ -424,7 +431,7 @@ This project is licensed under Apache 2.0 license - see [LICENSE](LICENSE) file

 ## Project Team

-- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**; Tokyo Institute of Technology)
+- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**)
 - Ren Kishimoto (Tokyo Institute of Technology)
 - Kosuke Kawakami (negocia, Inc.)
 - Ken Kobayashi (Tokyo Institute of Technology)

@@ -433,7 +440,7 @@ This project is licensed under Apache 2.0 license - see [LICENSE](LICENSE) file

 ## Contact

-For any questions about the paper and software, feel free to contact: [email protected]
+For any questions about the paper and software, feel free to contact: hk844 [at] cornell.edu

 ## References

@@ -1,4 +1,4 @@
-OFRL docstring
+SCOPE-RL docstring
 ========

 ### Prerequisite
