diff --git a/FrequentlyAskedQuestions.md b/FrequentlyAskedQuestions.md index 05774a0b..acb5d74b 100644 --- a/FrequentlyAskedQuestions.md +++ b/FrequentlyAskedQuestions.md @@ -18,17 +18,17 @@ while not done: obs, reward, done, info = env.step(action) ``` -To solve this incompatibility, please use `NewGymAPIWrapper` provided in `ofrl/utils.py`. It should be used as follows. +To solve this incompatibility, please use `NewGymAPIWrapper` provided in `scope_rl/utils.py`. It should be used as follows. ```Python -from ofrl.utils import NewGymAPIWrapper +from scope_rl.utils import NewGymAPIWrapper env = NewGymAPIWrapper(env) ``` Q. xxx environment does not work on d3rlpy, which is used for model training. How should we fix it? (d3rlpy and OFRL is compatible to different version of Open AI Gym.) -A. While OFRL is compatible to the latest API of Open AI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `ofrl/utils.py` to make the environment work for d3rlpy. +A. While SCOPE-RL is compatible with the latest API of Open AI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `scope_rl/utils.py` to make the environment work for d3rlpy. ```Python -from ofrl.utils import OldGymAPIWrapper +from scope_rl.utils import OldGymAPIWrapper env = gym.make("xxx_v0") # compatible to gym>=0.26.2 and OFRL env_ = OldGymAPIWrapper(env) # compatible to gym<0.26.2 and d3rlpy ``` \ No newline at end of file diff --git a/LICENSE b/LICENSE index 550bb6f0..7b25e93d 100644 --- a/LICENSE +++ b/LICENSE @@ -186,7 +186,7 @@ same "printed page" as the copyright notice for easier identification within third-party archives. - Copyright [2022] [Haruka Kiyohara, Yuta Saito, and negocia, Inc.] + Copyright [2023] [Haruka Kiyohara, Yuta Saito, and negocia, Inc.] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. diff --git a/README.md b/README.md index e00b77d5..55799d61 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ -# OFRL: A pipeline for offline reinforcement learning research and applications +# SCOPE-RL: A pipeline for offline reinforcement learning research and applications
Table of Contents (click to expand) -- [OFRL: A pipeline for offline reinforcement learning research and applications](#OFRL-a-pipeline-for-offline-reinforcement-learning-research-and-applications) +- [SCOPE-RL: A pipeline for offline reinforcement learning research and applications](#SCOPE-RL-a-pipeline-for-offline-reinforcement-learning-research-and-applications) - [Overview](#overview) - [Installation](#installation) - [Usage](#usage) @@ -22,23 +22,23 @@ ## Overview -*OFRL* is an open-source Python Software for implementing the end-to-end procedure regarding **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods. +*SCOPE-RL* is an open-source Python Software for implementing the end-to-end procedure regarding **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods. -This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. OFRL enables an easy, transparent, and reliable experiment in offline RL research on any environment with [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface and also facilitates implementation of offline RL in practice on a variety of customized datasets. +This software is also compatible with [d3rlpy](https://github.com/takuseno/d3rlpy), which implements a range of online and offline RL methods. SCOPE-RL enables an easy, transparent, and reliable experiment in offline RL research on any environment with [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface and also facilitates implementation of offline RL in practice on a variety of customized datasets. Our software enables evaluation and algorithm comparison related to the following research topics: -- **Offline Reinforcement Learning**: Offline RL aims at learning a new policy from only offline logged data collected by a behavior policy. OFRL enables flexible experiment using customized dataset collected by various behavior policies and on a variety of environment. +- **Offline Reinforcement Learning**: Offline RL aims at learning a new policy from only offline logged data collected by a behavior policy. SCOPE-RL enables flexible experiment using customized dataset collected by various behavior policies and on a variety of environment. -- **Off-Policy Evaluation**: OPE aims at evaluating the performance of a counterfactual policy using only offline logged data. OFRL supports many OPE estimators and streamlines the experimental procedure to evaluate OPE estimators. Moreover, we also implement advanced OPE, such as cumulative distribution estimation. +- **Off-Policy Evaluation**: OPE aims at evaluating the performance of a counterfactual policy using only offline logged data. SCOPE-RL supports many OPE estimators and streamlines the experimental procedure to evaluate OPE estimators. Moreover, we also implement advanced OPE, such as cumulative distribution estimation. 
-- **Off-Policy Selection**: OPS aims at identifying the best-performing policy from a pool of several candidate policies using offline logged data. OFRL supports some basic OPS methods and provides some metrics to evaluate the OPS accuracy. +- **Off-Policy Selection**: OPS aims at identifying the best-performing policy from a pool of several candidate policies using offline logged data. SCOPE-RL supports some basic OPS methods and provides some metrics to evaluate the OPS accuracy. This software is intended for the episodic RL setup. For those interested in the contextual bandit setup, we'd recommend [Open Bandit Pipeline](https://github.com/st-tech/zr-obp). ### Implementations -*OFRL* mainly consists of the following three modules. +*SCOPE-RL* mainly consists of the following three modules. - [**dataset module**](./_gym/dataset): This module provides tools to generate synthetic data from any environment on top of [OpenAI Gym](http://gym.openai.com/) and [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to preprocess the logged data. - [**policy module**](./_gym/policy): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable a flexible data collection. - [**ope module**](./_gym/ope): This module provides a generic abstract class to implement an OPE estimator and some popular estimators. It also provides some tools useful for performing OPS. @@ -115,19 +115,19 @@ To provide an example of performing a customized experiment imitating a practica ## Installation -You can install OFRL using Python's package manager `pip`. +You can install SCOPE-RL using Python's package manager `pip`. ``` -pip install ofrl +pip install scope-rl ``` -You can also install OFRL from source. +You can also install SCOPE-RL from source. ```bash -git clone https://github.com/negocia-inc/ofrl -cd ofrl +git clone https://github.com/negocia-inc/scope-rl +cd scope-rl python setup.py install ``` -OFRL supports Python 3.7 or newer. See [pyproject.toml](./pyproject.toml) for other requirements. +SCOPE-RL supports Python 3.7 or newer. See [pyproject.toml](./pyproject.toml) for other requirements. ## Usage @@ -140,9 +140,9 @@ Let's start by collecting some logged data useful for offline RL. ```Python # implement a data collection procedure on the RTBGym environment -# import OFRL modules -from ofrl.dataset import SyntheticDataset -from ofrl.policy import DiscreteEpsilonGreedyHead +# import SCOPE-RL modules +from scope_rl.dataset import SyntheticDataset +from scope_rl.policy import DiscreteEpsilonGreedyHead # import d3rlpy algorithms from d3rlpy.algos import DoubleDQN from d3rlpy.online.buffers import ReplayBuffer @@ -201,7 +201,7 @@ test_logged_dataset = dataset.obtain_trajectories( We are now ready to learn a new policy from the logged data using [d3rlpy](https://github.com/takuseno/d3rlpy). ```Python -# implement an offline RL procedure using OFRL and d3rlpy +# implement an offline RL procedure using SCOPE-RL and d3rlpy # import d3rlpy algorithms from d3rlpy.dataset import MDPDataset @@ -232,15 +232,15 @@ cql.fit( Then, we evaluate the performance of the learned policy using offline logged data. Specifically, we compare the estimation results of various OPE estimators, including Direct Method (DM), Trajectory-wise Importance Sampling (TIS), Per-Decision Importance Sampling (PDIS), and Doubly Robust (DR). 
```Python -# implement a basic OPE procedure using OFRL +# implement a basic OPE procedure using SCOPE-RL -# import OFRL modules -from ofrl.ope import CreateOPEInput -from ofrl.ope import DiscreteOffPolicyEvaluation as OPE -from ofrl.ope import DiscreteDirectMethod as DM -from ofrl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS -from ofrl.ope import DiscretePerDecisionImportanceSampling as PDIS -from ofrl.ope import DiscreteDoublyRobust as DR +# import SCOPE-RL modules +from scope_rl.ope import CreateOPEInput +from scope_rl.ope import DiscreteOffPolicyEvaluation as OPE +from scope_rl.ope import DiscreteDirectMethod as DM +from scope_rl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS +from scope_rl.ope import DiscretePerDecisionImportanceSampling as PDIS +from scope_rl.ope import DiscreteDoublyRobust as DR # (4) Evaluate the learned policy in an offline manner # we compare ddqn, cql, and random policy @@ -303,15 +303,15 @@ A formal quickstart example with RTBGym is available at [quickstart/rtb_syntheti We can also estimate various performance statics including variance and conditional value at risk (CVaR) by using estimators of cumulative distribution function. ```Python -# implement a cumulative distribution estimation procedure using OFRL +# implement a cumulative distribution estimation procedure using SCOPE-RL -# import OFRL modules -from ofrl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE -from ofrl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM -from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS -from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR -from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS -from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR +# import SCOPE-RL modules +from scope_rl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE +from scope_rl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM +from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS +from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR +from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS +from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR # (4) Evaluate the cumulative distribution function of the learned policy (in an offline manner) # we compare ddqn, cql, and random policy defined from the previous section (i.e., (3) of basic OPE procedure) @@ -349,8 +349,8 @@ Finally, we select the best-performing policy based on the OPE results using the ```Python # perform off-policy selection based on the OPE results -# import OFRL modules -from ofrl.ope import OffPolicySelection +# import SCOPE-RL modules +from scope_rl.ope import OffPolicySelection # (5) Conduct Off-Policy Selection # Initialize the OPS class @@ -407,15 +407,22 @@ For more examples, please refer to [quickstart/rtb_synthetic_discrete_advanced.i If you use our software in your work, please cite our paper: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
-**Title**
-[link]() +**SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning**
+[link]() (a preprint coming soon..) Bibtex: ``` +@article{kiyohara2023scope, + author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, + title = {SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning}, + journal = {A github repository}, + pages = {xxx--xxx}, + year = {2023}, +} ``` ## Contribution -Any contributions to OFRL are more than welcome! +Any contributions to SCOPE-RL are more than welcome! Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for general guidelines how to contribute the project. ## License @@ -424,7 +431,7 @@ This project is licensed under Apache 2.0 license - see [LICENSE](LICENSE) file ## Project Team -- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**; Tokyo Institute of Technology) +- [Haruka Kiyohara](https://sites.google.com/view/harukakiyohara) (**Main Contributor**) - Ren Kishimoto (Tokyo Institute of Technology) - Kosuke Kawakami (negocia, Inc.) - Ken Kobayashi (Tokyo Institute of Technology) @@ -433,7 +440,7 @@ This project is licensed under Apache 2.0 license - see [LICENSE](LICENSE) file ## Contact -For any question about the paper and software, feel free to contact: kiyohara.h.aa@m.titech.ac.jp +For any question about the paper and software, feel free to contact: hk844 [at] cornell.edu ## References diff --git a/basicgym/README.md b/basicgym/README.md index 7f5e19db..c49c7183 100644 --- a/basicgym/README.md +++ b/basicgym/README.md @@ -19,7 +19,7 @@ *BasicGym* is an open-source simulation platform for synthetic simulation, which is written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. We design SyntheticGym as a configurative environment so that researchers and practitioner can customize the environmental modules including `StateTransitionFunction` and `RewardFunction` -Note that, SyntheticGym is publicized under [ofrl](../) repository, which facilitates the implementation of offline reinforcement learning procedure. +Note that, SyntheticGym is publicized under [scope-rl](../) repository, which facilitates the implementation of offline reinforcement learning procedure. ### Basic Setting @@ -47,22 +47,22 @@ SyntheticGym is configurative about the following a module. Note that, users can customize the above modules by following the [abstract class](./envs/simulator/base.py). ## Installation -SyntheticGym can be installed as a part of [ofrl](../) using Python's package manager `pip`. +SyntheticGym can be installed as a part of [scope-rl](../) using Python's package manager `pip`. ``` -pip install ofrl +pip install scope-rl ``` You can also install from source. ```bash -git clone https://github.com/negocia-inc/ofrl -cd ofrl +git clone https://github.com/negocia-inc/scope-rl +cd scope-rl python setup.py install ``` ## Usage We provide an example usage of the standard and customized environment. \ -The online/offlline RL and Off-Policy Evaluation examples are provides in [OFRL's README](../README.md). +The online/offline RL and Off-Policy Evaluation examples are provided in [SCOPE-RL's README](../README.md). ### Standard SyntheticEnv @@ -90,7 +90,7 @@ Let's visualize the case with uniform random policy . 
```Python # import from other libraries -from ofrl.policy import OnlineHead +from scope_rl.policy import OnlineHead from d3rlpy.algos import RandomPolicy as ContinuousRandomPolicy # define a random agent @@ -134,7 +134,7 @@ plt.show()

-Note that, while we use [ofrl](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, SyntheticGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. +Note that, while we use [SCOPE-RL](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, SyntheticGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. ### Customized SyntheticEnv diff --git a/basicgym/envs/synthetic.py b/basicgym/envs/synthetic.py index 9d2e0aec..da5943b8 100644 --- a/basicgym/envs/synthetic.py +++ b/basicgym/envs/synthetic.py @@ -85,8 +85,8 @@ class BasicEnv(gym.Env): # import necessary module from syntheticgym from syntheticgym import SyntheticEnv - from ofrl.policy import OnlineHead - from ofrl.ope.online import calc_on_policy_policy_value + from scope_rl.policy import OnlineHead + from scope_rl.ope.online import calc_on_policy_policy_value # import necessary module from other libraries from d3rlpy.algos import RandomPolicy diff --git a/basicgym/types.py b/basicgym/types.py index 9e9ee7fa..0a15a6cf 100644 --- a/basicgym/types.py +++ b/basicgym/types.py @@ -3,4 +3,4 @@ import numpy as np -Action = Union[int, float, np.integer, np.float, np.float32, np.ndarray] +Action = Union[int, float, np.integer, np.float64, np.float32, np.ndarray] diff --git a/docs/README.md b/docs/README.md index 612aa0f1..131a40d6 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,4 +1,4 @@ -OFRL docstring +SCOPE-RL docstring ======== ### Prerequisite diff --git a/docs/conf.py b/docs/conf.py index b133cc3f..7c6c55a6 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -19,9 +19,8 @@ # -- Project information ----------------------------------------------------- project = "SCOPE-RL" -copyright = "2023, Haruka Kiyohara, Ren Kishimoto, Yuta Saito, Hakuhodo Technologies" -# copyright = "2023, Haruka Kiyohara, Yuta Saito, and negocia, Inc" -author = "Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Yuta Saito" +copyright = "2023, Haruka Kiyohara, Ren Kishimoto, Hakuhodo Technologies, Hanjuku-kaso Co. 
Ltd.," +author = "Haruka Kiyohara" # The full version, including alpha/beta/rc tags version = "latest" @@ -72,12 +71,12 @@ # html_theme = "pydata_sphinx_theme" html_theme_options = { - "github_url": "https://github.com/negocia-inc/ofrl", + "github_url": "https://github.com/negocia-inc/scope_rl", # "twitter_url": "https://twitter.com/{account}", "icon_links": [ { "name": "Speaker Deck", - "url": "https://speakerdeck.com/aiueola/ofrl-designing-an-offline-reinforcement-learning-and-policy-evaluation-platform-from-practical-perspectives", + "url": "https://speakerdeck.com/aiueola/scope_rl-designing-an-offline-reinforcement-learning-and-policy-evaluation-platform-from-practical-perspectives", "icon": "fa-brands fa-speaker-deck", "type": "fontawesome", }, @@ -115,19 +114,19 @@ # mapping between class methods and its abbreviation numpydoc_member_order = "bysource" numpydoc_show_inherited_class_members = { - "ofrl.policy.head.BaseHead": False, - "ofrl.policy.head.ContinuousEvalHead": False, - "ofrl.policy.head.ContinuousGaussianHead": False, - "ofrl.policy.head.ContinuousTruncatedGaussianHead": False, - "ofrl.policy.head.DiscreteEpsilonGreedyHead": False, - "ofrl.policy.head.DiscreteSoftmaxHead": False, - "ofrl.policy.head.OnlineHead": False, - "ofrl.ope.weight_value_learning.function.VFunction": False, - "ofrl.ope.weight_value_learning.function.StateWeightFunction": False, - "ofrl.ope.weight_value_learning.function.DiscreteQFunction": False, - "ofrl.ope.weight_value_learning.function.ContinuousQFunction": False, - "ofrl.ope.weight_value_learning.function.DiscreteStateActionWeightFunction": False, - "ofrl.ope.weight_value_learning.function.ContinuousStateActionWeightFunction": False, + "scope_rl.policy.head.BaseHead": False, + "scope_rl.policy.head.ContinuousEvalHead": False, + "scope_rl.policy.head.ContinuousGaussianHead": False, + "scope_rl.policy.head.ContinuousTruncatedGaussianHead": False, + "scope_rl.policy.head.DiscreteEpsilonGreedyHead": False, + "scope_rl.policy.head.DiscreteSoftmaxHead": False, + "scope_rl.policy.head.OnlineHead": False, + "scope_rl.ope.weight_value_learning.function.VFunction": False, + "scope_rl.ope.weight_value_learning.function.StateWeightFunction": False, + "scope_rl.ope.weight_value_learning.function.DiscreteQFunction": False, + "scope_rl.ope.weight_value_learning.function.ContinuousQFunction": False, + "scope_rl.ope.weight_value_learning.function.DiscreteStateActionWeightFunction": False, + "scope_rl.ope.weight_value_learning.function.ContinuousStateActionWeightFunction": False, } numpydoc_xref_aliases = { # 'LeaveOneOut': 'sklearn.model_selection.LeaveOneOut', @@ -153,7 +152,7 @@ "tutorial/basic_ope", "tutorial/cumulative_distribution_ope", "tutorial/ops", - "tutorial/ofrl_others", + "tutorial/scope_rl_others", "tutorial/multiple_datasets", "tutorial/rtbgym", "tutorial/footer", diff --git a/docs/documentation/distinctive_features.rst b/docs/documentation/distinctive_features.rst index 0f0ae0fd..d3192ac1 100644 --- a/docs/documentation/distinctive_features.rst +++ b/docs/documentation/distinctive_features.rst @@ -1,7 +1,7 @@ :html_theme.sidebar_secondary.remove: ========== -Why OFRL? +Why SCOPE-RL? 
========== Motivation @@ -348,7 +348,7 @@ In summary, **our unique contribution is (1) to provide the first end-to-end platform for offline RL, OPE, and OPS, (2) to support cumulative distribution ope for the first time, and (3) to implement (the proposed) top-** :math:`k` **risk-return tradeoff metics for the risk assessments of OPS.** -Additionally, we provide a user-friendly :doc:`visualization tools `, :doc:`documentation `, and `quickstart examples `_ to facilitate a quick benckmarking and practical application. +Additionally, we provide a user-friendly :doc:`visualization tools `, :doc:`documentation `, and `quickstart examples `_ to facilitate a quick benckmarking and practical application. We also provide an :doc:`OPE tutorial <_autogallery/index>` with SCOPE-RL experiments for educational purpose. We hope that SCOPE-RL will serve as a important milestone for the future development of OPE research. diff --git a/docs/documentation/evaluation_implementation.rst b/docs/documentation/evaluation_implementation.rst index 60faedf0..f895f9dc 100644 --- a/docs/documentation/evaluation_implementation.rst +++ b/docs/documentation/evaluation_implementation.rst @@ -11,7 +11,7 @@ Before proceeding to OPE/OPS, we first create :class:`input_dict` to enable a sm .. code-block:: python # create input for OPE class - from ofrl.ope import CreateOPEInput + from scope_rl.ope import CreateOPEInput prep = CreateOPEInput( env=env, ) @@ -61,8 +61,8 @@ Before proceeding to OPE/OPS, we first create :class:`input_dict` to enable a sm * :ref:`How to obtain MultipleLoggedDataset? ` * :ref:`How to handle OPL with MultipleLoggedDataset? ` - * :doc:`API reference of MultipleInputDict <_autosummary/ofrl.utils.MultipleInputDict>` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :doc:`API reference of MultipleInputDict <_autosummary/scope_rl.utils.MultipleInputDict>` + * :ref:`Tutorial with MultipleLoggedDataset ` .. dropdown:: How to select models for value/weight learning methods? @@ -131,8 +131,8 @@ Before proceeding to OPE/OPS, we first create :class:`input_dict` to enable a sm .. seealso:: - * :doc:`API reference of CreateInputDict <_autosummary/ofrl.ope.input>` - * :ref:`API reference of value/weight learning methods ` + * :doc:`API reference of CreateInputDict <_autosummary/scope_rl.ope.input>` + * :ref:`API reference of value/weight learning methods ` * :ref:`Logics behind value and weight learning methods (How to obtain state(-action) marginal importance weight?) ` .. dropdown:: How to collect input_dict in a non-episodic setting? @@ -175,7 +175,7 @@ We begin with the :class:`OffPolicyEvaluation` class to streamline the OPE proce .. code-block:: python # initialize the OPE class - from ofrl.ope import OffPolicyEvaluation as OPE + from scope_rl.ope import OffPolicyEvaluation as OPE ope = OPE( logged_dataset=logged_dataset, ope_estimators=[DM(), TIS(), PDIS(), DR()], @@ -280,7 +280,7 @@ Using the OPE class, we can obtain the OPE results of various estimators at once * :ref:`How to obtain MultipleLoggedDataset? ` * :ref:`How to handle OPL with MultipleLoggedDataset? ` * :ref:`How to create input_dict for MultipleLoggedDataset? ` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :ref:`Tutorial with MultipleLoggedDataset ` .. seealso:: @@ -377,11 +377,11 @@ Extensions If you want to add other arguments, please add them in the initialization arguments for API consistency. - Finally, contribution to OFRL with a new OPE estimator is more than welcome! 
Please read `the guidelines for contribution (CONTRIBUTING.md) <>`_. + Finally, contribution to SCOPE-RL with a new OPE estimator is more than welcome! Please read `the guidelines for contribution (CONTRIBUTING.md) <>`_. .. seealso:: - :doc:`API reference of BaseOffPolicyEstimator <_autosummary/ofrl.ope.estimators_base>` explains the abstract methods. + :doc:`API reference of BaseOffPolicyEstimator <_autosummary/scope_rl.ope.estimators_base>` explains the abstract methods. .. _implementation_dm: @@ -544,8 +544,8 @@ This estimator is particularly useful when policy visits the same or similar sta .. seealso:: * :ref:`How to select models for value/weight learning methods? ` describes how to enable weight learning and select weight learning methods. - * :ref:`API reference of value/weight learning methods ` - * :doc:`API reference of CreateInputDict <_autosummary/ofrl.ope.input>` + * :ref:`API reference of value/weight learning methods ` + * :doc:`API reference of CreateInputDict <_autosummary/scope_rl.ope.input>` We implement state marginal and state-action marginal OPE estimators in the following classes (both for :class:`Discrete-` and :class:`Continuous-` action spaces). @@ -816,7 +816,7 @@ To estimate both CDF and various risk functions, we provide the following :class .. code-block:: python # initialize the OPE class - from ofrl.ope import CumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE + from scope_rl.ope import CumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE cd_ope = CumulativeDistributionOPE( logged_dataset=logged_dataset, ope_estimators=[CD_DM(), CD_IS(), CD_DR()], @@ -940,7 +940,7 @@ It estimates the cumulative distribution of the trajectory wise reward and vario * :ref:`How to obtain MultipleLoggedDataset? ` * :ref:`How to handle OPL with MultipleLoggedDataset? ` * :ref:`How to create input_dict for MultipleLoggedDataset? ` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :ref:`Tutorial with MultipleLoggedDataset ` .. seealso:: @@ -1029,11 +1029,11 @@ Extension to the continuous action space If you want to add other arguments, please add them in the initialization arguments for API consistency. - Finally, contribution to OFRL with a new OPE estimator is more than welcome! Please read `the guidelines for contribution (CONTRIBUTING.md) <>`_. + Finally, contribution to SCOPE-RL with a new OPE estimator is more than welcome! Please read `the guidelines for contribution (CONTRIBUTING.md) <>`_. .. seealso:: - :doc:`API reference of BaseOffPolicyEstimator <_autosummary/ofrl.ope.estimators_base>` explains the abstract methods. + :doc:`API reference of BaseOffPolicyEstimator <_autosummary/scope_rl.ope.estimators_base>` explains the abstract methods. .. _implementation_cd_dm: @@ -1127,7 +1127,7 @@ To ease the comparison of candidate (evaluation) policies and the OPE estimators .. code-block:: python # Initialize the OPS class - from ofrl.ope import OffPolicySelection + from scope_rl.ope import OffPolicySelection ops = OffPolicySelection( ope=ope, cumulative_distribution_ope=cd_ope, @@ -1226,7 +1226,7 @@ Finally, the OPS class also implements the modules to compare the OPE result and * :ref:`How to create input_dict for MultipleLoggedDataset? ` * :ref:`How to conduct OPE with MultipleLoggedDataset? ` * :ref:`How to conduct Cumulative Distribution OPE with MultipleLoggedDataset? ` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :ref:`Tutorial with MultipleLoggedDataset ` .. 
seealso:: @@ -1339,7 +1339,7 @@ The OPS class implements the following functions. **Visualization tools** .. grid-item-card:: - :link: ofrl_api + :link: scope_rl_api :link-type: doc :shadow: none :margin: 0 diff --git a/docs/documentation/frequently_asked_questions.rst b/docs/documentation/frequently_asked_questions.rst index 377a3edc..34b7eecf 100644 --- a/docs/documentation/frequently_asked_questions.rst +++ b/docs/documentation/frequently_asked_questions.rst @@ -2,12 +2,12 @@ FAQs ========== -OFRL +SCOPE-RL ~~~~~~~~~~ -.. rubric:: Q. xxx environment does not work on OFRL. How should we fix it? +.. rubric:: Q. xxx environment does not work on SCOPE-RL. How should we fix it? -A. OFRL is compatible to Open AI Gym and Gymnasium API, specifically for `gym>=0.26.0`, which works as follows. +A. SCOPE-RL is compatible to Open AI Gym and Gymnasium API, specifically for `gym>=0.26.0`, which works as follows. .. code-block:: Python @@ -25,21 +25,21 @@ In contrast, your environment may use the following older interface. action = agent.act(obs) obs, reward, done, info = env.step(action) -To solve this incompatibility, please use `NewGymAPIWrapper` provided in `ofrl/utils.py`. It should be used as follows. +To solve this incompatibility, please use `NewGymAPIWrapper` provided in `scope_rl/utils.py`. It should be used as follows. .. code-block:: Python - from ofrl.utils import NewGymAPIWrapper + from scope_rl.utils import NewGymAPIWrapper env = NewGymAPIWrapper(env) -.. rubric:: Q. xxx environment does not work on d3rlpy, which is used for model training. How should we fix it? (d3rlpy and OFRL is compatible to different version of Open AI Gym.) +.. rubric:: Q. xxx environment does not work on d3rlpy, which is used for model training. How should we fix it? (d3rlpy and SCOPE-RL is compatible to different version of Open AI Gym.) -A. While OFRL is compatible to the latest API of Open AI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `ofrl/utils.py` to make the environment work for d3rlpy. +A. While SCOPE-RL is compatible to the latest API of Open AI Gym, d3rlpy is not. Therefore, please use `OldGymAPIWrapper` provided in `scope_rl/utils.py` to make the environment work for d3rlpy. .. code-block:: Python - from ofrl.utils import OldGymAPIWrapper - env = gym.make("xxx_v0") # compatible to gym>=0.26.2 and OFRL + from scope_rl.utils import OldGymAPIWrapper + env = gym.make("xxx_v0") # compatible to gym>=0.26.2 and SCOPE-RL env_ = OldGymAPIWrapper(env) # compatible to gym<0.26.2 and d3rlpy diff --git a/docs/documentation/index.rst b/docs/documentation/index.rst index 38ae9719..8ce25b0d 100644 --- a/docs/documentation/index.rst +++ b/docs/documentation/index.rst @@ -1,4 +1,4 @@ -OFRL; a Python library for offline reinforcement learning, off-policy evaluation, and selection +SCOPE-RL; a Python library for offline reinforcement learning, off-policy evaluation, and selection =================================== .. card:: logo @@ -7,11 +7,11 @@ OFRL; a Python library for offline reinforcement learning, off-policy evaluation Overview ~~~~~~~~~~ -*OFRL* is an open-source Python library for offline Reinforcement Learning (RL) and Off-Policy Evaluation and Selection (OPE/OPS). +*SCOPE-RL* is an open-source Python library for offline Reinforcement Learning (RL) and Off-Policy Evaluation and Selection (OPE/OPS). This library aims to facilitate an easy, flexible and reliable experiment in offline RL research, as well as to provide a streamlined implementation for practitioners. 
-OFRL includes a series of modules to implement synthetic dataset generation and dataset preprocessing and methods for conducting and evaluating OPE/OPS. +SCOPE-RL includes a series of modules to implement synthetic dataset generation and dataset preprocessing and methods for conducting and evaluating OPE/OPS. -OFRL is applicable to any RL environment with `OpenAI Gym `_ or `Gymnasium `_-like interface. +SCOPE-RL is applicable to any RL environment with `OpenAI Gym `_ or `Gymnasium `_-like interface. The library is also compatible with `d3rlpy `_, which provides the algorithm implementation of both online and offline RL methods. Our software facilitates implementation, evaluation and algorithm comparison related to the following research topics: @@ -29,14 +29,14 @@ Our software facilitates implementation, evaluation and algorithm comparison rel
* **Offline Reinforcement Learning**: - Offline RL aims to learn a new policy from only offline logged data collected by a behavior policy. OFRL enables a flexible experiment using customized dataset on diverse environments collected by various behavior policies. + Offline RL aims to learn a new policy from only offline logged data collected by a behavior policy. SCOPE-RL enables a flexible experiment using customized dataset on diverse environments collected by various behavior policies. * **Off-Policy Evaluation**: - OPE aims to evaluate the policies of a counterfactual policy using only offline logged data. OFRL supports the basic implementations of OPE estimators and streamline the experimental procedure to evaluate OPE estimators. + OPE aims to evaluate the policies of a counterfactual policy using only offline logged data. SCOPE-RL supports the basic implementations of OPE estimators and streamline the experimental procedure to evaluate OPE estimators. * **Off-Policy Selection**: OPS aims to select the top-:math:`k` policies from several candidate policies using offline logged data. Typically, the final production policy is chosen based on the online A/B tests results of the selected top-$k$ policies. - OFRL supports the basic implementations of OPS methods and provide some metrics to evaluate OPS result. + SCOPE-RL supports the basic implementations of OPS methods and provide some metrics to evaluate OPS result. .. note:: @@ -44,16 +44,16 @@ Our software facilitates implementation, evaluation and algorithm comparison rel 1. Explain the basic concepts in :doc:`Overview (online/offline RL) ` and :doc:`Overview (OPE/OPS) `. 2. Provide a variety of examples of conducting offline RL and OPE/OPS in practical problem settings in :doc:`Quickstart ` and :doc:`Tutorial `. - 3. Describe the algorithms and implementations in detail in :doc:`Supported Implementation ` and :doc:`Package Reference `. + 3. Describe the algorithms and implementations in detail in :doc:`Supported Implementation ` and :doc:`Package Reference `. - **You can also find the distinctive features of OFRL here:** :doc:`distinctive_features` + **You can also find the distinctive features of SCOPE-RL here:** :doc:`distinctive_features` Implementation ~~~~~~~~~~ Data Collection Policy and Offline RL ---------- -OFRL override `d3rlpy `_'s implementation for the base algorithm. +SCOPE-RL override `d3rlpy `_'s implementation for the base algorithm. We provide a wrapper class for transforming the policy into a stochastic policy as follows. Discrete @@ -76,7 +76,7 @@ Basic OPE Policy Value Estimated by OPE Estimators OPRL provides a variety of OPE estimators both in discrete and continuous action spaces. -Moreover, OFRL also implements meta class to handle OPE with multiple estimators at once and provide generic classes of OPE estimators to facilitate research development. +Moreover, SCOPE-RL also implements meta class to handle OPE with multiple estimators at once and provide generic classes of OPE estimators to facilitate research development. Basic estimators ^^^^^^ @@ -134,7 +134,7 @@ Cumulative Distribution OPE Cumulative Distribution Function Estimated by OPE Estimators -OFRL also provides cumulative distribution OPE estimators, which enables practitioners to evaluate various risk metrics (e.g., conditional value at risk) for safety assessment. +SCOPE-RL also provides cumulative distribution OPE estimators, which enables practitioners to evaluate various risk metrics (e.g., conditional value at risk) for safety assessment. 
Meta class and generic abstract class are available also for cumulative distribution OPE. Estimators @@ -162,7 +162,7 @@ Off-Policy Selection Metrics Comparison of the Top-k Statistics of 10% Lower Quartile of Policy Value -Finally, OFRL also standardizes the evaluation protocol of OPE in two axes, first by measuring the accuracy of OPE over the whole candidate policies, +Finally, SCOPE-RL also standardizes the evaluation protocol of OPE in two axes, first by measuring the accuracy of OPE over the whole candidate policies, and second by evaluating the gains and costs in top-k deployment (e.g., the best and worst performance in top-k deployment). The streamlined implementations and visualization of OPS class provide informative insights on offline RL and OPE performance. @@ -210,7 +210,7 @@ For any question about the paper and pipeline, feel free to contact: kiyohara.h. Contribution ~~~~~~~~~~ -Any contributions to OFRL are more than welcome! +Any contributions to SCOPE-RL are more than welcome! Please refer to `CONTRIBUTING.md <>`_ for general guidelines how to contribute to the project. Table of Contents @@ -256,7 +256,7 @@ Table of Contents :maxdepth: 2 :caption: Package References: - ofrl_api + scope_rl_api subpackages/rtbgym_api subpackages/recgym_api subpackages/basicgym_api @@ -265,12 +265,12 @@ Table of Contents :maxdepth: 1 :caption: See also: - Github - LICENSE + Github + LICENSE frequently_asked_questions News - Release Notes - Proceedings + Release Notes + Proceedings references .. grid:: @@ -302,7 +302,7 @@ Table of Contents :padding: 0 Next >>> - **Why_OFRL?** + **Why_SCOPE-RL?** .. grid-item-card:: :link: /documentation/index diff --git a/docs/documentation/installation.rst b/docs/documentation/installation.rst index e33a563e..80eca34b 100644 --- a/docs/documentation/installation.rst +++ b/docs/documentation/installation.rst @@ -3,7 +3,7 @@ Installation ========== -``ofrl`` is available on PyPI, and can be installed from ``pip`` or source as follows. +``scope-rl`` is available on PyPI, and can be installed from ``pip`` or source as follows. .. card:: .. tabs:: .. code-tab:: bash From :class:`pip` - pip install ofrl + pip install scope-rl .. code-tab:: bash From source - git clone https://github.com/negocia-inc/ofrl - cd ofrl + git clone https://github.com/negocia-inc/scope-rl + cd scope-rl python setup.py install Citation ========== -If you use our pipeline or the top-:math:`k` RRT metrics in your work, please cite our paper below. +If you use our pipeline or the SharpeRatio@k metric in your work, please cite our paper below. .. card:: - | **Title** [`arXiv <>`_] [`Proceedings <>`_] - | Authors. + | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. + | **SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning** [`arXiv <>`_] [`Proceedings <>`_] + | (a preprint coming soon..) .. code-block:: - @article{kiyohara2023xxx title={}, author={}, journal={}, year={}, + @article{kiyohara2023scope, + author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, + title = {SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning}, + journal = {A github repository}, + pages = {xxx--xxx}, + year = {2023}, } .. 
raw:: html diff --git a/docs/documentation/learning_implementation.rst b/docs/documentation/learning_implementation.rst index 70e8fade..60e3c069 100644 --- a/docs/documentation/learning_implementation.rst +++ b/docs/documentation/learning_implementation.rst @@ -16,7 +16,7 @@ It takes an RL environment as input to instantiate the class. .. code-block:: python # initialize the dataset class - from ofrl.dataset import SyntheticDataset + from scope_rl.dataset import SyntheticDataset dataset = SyntheticDataset( env=env, max_episode_steps=env.step_per_episode, @@ -48,7 +48,7 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. code-block:: python - from ofrl.policy import DiscreteEpsilonGreedyHead + from scope_rl.policy import DiscreteEpsilonGreedyHead behavior_policy = DiscreteEpsilonGreedyHead( base_policy, # AlgoBase of d3rlpy n_actions=env.action_space.n, @@ -62,7 +62,7 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. code-block:: python - from ofrl.policy import ContinuousGaussianHead + from scope_rl.policy import ContinuousGaussianHead behavior_policy = ContinuousGaussianHead( base_policy, # AlgoBase of d3rlpy sigma=1.0, @@ -112,7 +112,7 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. seealso:: - :doc:`API reference of BaseDataset<_autosummary/dataset/ofrl.dataset.base>` explains the meaning of each keys in detail. + :doc:`API reference of BaseDataset<_autosummary/dataset/scope_rl.dataset.base>` explains the meaning of each keys in detail. .. dropdown:: How to handle multiple logged datasets at once? @@ -153,7 +153,7 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. code-block:: python - from ofrl.utils import MultipleLoggedDataset + from scope_rl.utils import MultipleLoggedDataset multiple_logged_dataset = MultipleLoggedDataset( action_type="discrete", @@ -175,8 +175,8 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. seealso:: - * :doc:`API reference of MultipleLoggedDataset <_autosummary/ofrl.utils.MultipleLoggedDataset>` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :doc:`API reference of MultipleLoggedDataset <_autosummary/scope_rl.utils.MultipleLoggedDataset>` + * :ref:`Tutorial with MultipleLoggedDataset ` .. dropdown:: How to collect data in a non-episodic setting? @@ -195,7 +195,7 @@ Then, it collects logged data by a behavior policy (i.e., data collection policy .. seealso:: - * :doc:`quickstart` and :ref:`related tutorials ` + * :doc:`quickstart` and :ref:`related tutorials ` .. _implementation_opl: @@ -261,7 +261,7 @@ we also provide :class:`OffPolicyLearning` as a meta class to further smoothen t ) # off-policy learning - from ofrl.policy import OffPolicyLearning + from scope_rl.policy import OffPolicyLearning opl = OffPolicyLearning( fitting_args={"n_steps": 10000}, ) @@ -276,7 +276,7 @@ Using :class:`OffPolicyLearning`, we can also convert the deterministic base pol .. code-block:: python # policy wrapper - from ofrl.policy import DiscreteEpsilonGreedyHead as EpsilonGreedyHead + from scope_rl.policy import DiscreteEpsilonGreedyHead as EpsilonGreedyHead policy_wrappers = { "eps_00": ( EpsilonGreedyHead, { @@ -363,11 +363,11 @@ The obtained evaluation policies are the following (both algorithms and policy w .. seealso:: * :ref:`How to obtain MultipleLoggedDataset? ` - * :ref:`Tutorial with MultipleLoggedDataset ` + * :ref:`Tutorial with MultipleLoggedDataset ` .. 
seealso:: - * :doc:`quickstart` and :ref:`related tutorials ` + * :doc:`quickstart` and :ref:`related tutorials ` .. _implementation_policy_head: Policy Wrapper ~~~~~~~~~~ Here, we describe some useful wrapper tools to convert a `d3rlpy ` + * :doc:`Package Reference of BaseHead and implemented policy heads <_autosummary/scope_rl.policy.head>` .. seealso:: - * :ref:`Related tutorials ` + * :ref:`Related tutorials ` .. _implementation_discrete_head: This module enables step-wise interaction of the policy. Online Evaluation ~~~~~~~~~~ -Finally, we provide the series of functions to be used for online performance evaluation in :doc:`ofrl/ope/online.py <_autosummary/ofrl.ope.online>`. +Finally, we provide the series of functions to be used for online performance evaluation in :doc:`scope_rl/ope/online.py <_autosummary/scope_rl.ope.online>`. .. seealso:: - * :ref:`Related tutorials ` + * :ref:`Related tutorials ` (Rollout) @@ -529,7 +529,7 @@ Finally, we provide the series of functions to be used for online performance ev **Off_policy Evaluation** .. grid-item-card:: - :link: ofrl_api + :link: scope_rl_api :link-type: doc :shadow: none :margin: 0 diff --git a/docs/documentation/news.rst b/docs/documentation/news.rst index d4ca5e5c..5a17fecc 100644 --- a/docs/documentation/news.rst +++ b/docs/documentation/news.rst @@ -1,11 +1,11 @@ News ========== -Follow us on `Google Group <>`_! +Follow us on `Google Group (scope-rl@googlegroups.com) `_! 2023 ~~~~~~~~~~ **2023.xx.xx** xxx -**2023.xx.xx** Released :class:`v0.0.0` of OFRL! [`PyPI <>`_] [`Release Note <>`_] \ No newline at end of file +**2023.xx.xx** Released :class:`v0.0.0` of SCOPE-RL! [`PyPI <>`_] [`Release Note <>`_] \ No newline at end of file diff --git a/docs/documentation/ofrl_api.rst b/docs/documentation/ofrl_api.rst index 83e50e91..f23f191c 100644 --- a/docs/documentation/ofrl_api.rst +++ b/docs/documentation/ofrl_api.rst @@ -1,8 +1,8 @@ ========== -OFRL Package Reference +SCOPE-RL Package Reference ========== -.. _ofrl_api_dataset: +.. _scope_rl_api_dataset: dataset module ---------- .. autosummary:: :toctree: _autosummary :recursive: :nosignatures: - ofrl.dataset.base - ofrl.dataset.synthetic + scope_rl.dataset.base + scope_rl.dataset.synthetic -.. _ofrl_api_policy: +.. _scope_rl_api_policy: policy module ---------- .. autosummary:: :toctree: _autosummary :recursive: :nosignatures: :template: module_head - ofrl.policy.head - ofrl.policy.opl + scope_rl.policy.head + scope_rl.policy.opl -.. _ofrl_api_ope: +.. _scope_rl_api_ope: ope module ---------- -.. _ofrl_api_ope_pipeline: +.. _scope_rl_api_ope_pipeline: pipeline ^^^^^^ .. autosummary:: :toctree: _autosummary :recursive: :nosignatures: - ofrl.ope.input - ofrl.ope.ope - ofrl.ope.ops + scope_rl.ope.input + scope_rl.ope.ope + scope_rl.ope.ops -.. _ofrl_api_ope_estimators: +.. _scope_rl_api_ope_estimators: OPE estimators ^^^^^^ .. autosummary:: :toctree: _autosummary :recursive: :nosignatures: - ofrl.ope.estimators_base - ofrl.ope.basic_estimators_discrete - ofrl.ope.basic_estimators_continuous - ofrl.ope.marginal_estimators_discrete - ofrl.ope.marginal_estimators_continuous - ofrl.ope.cumulative_distribution_estimators_discrete - ofrl.ope.cumulative_distribution_estimators_continuous + scope_rl.ope.estimators_base + scope_rl.ope.basic_estimators_discrete + scope_rl.ope.basic_estimators_continuous + scope_rl.ope.marginal_estimators_discrete + scope_rl.ope.marginal_estimators_continuous + scope_rl.ope.cumulative_distribution_estimators_discrete + scope_rl.ope.cumulative_distribution_estimators_continuous -.. 
_ofrl_api_ope_weight_and_value_learning: +.. _scope_rl_api_ope_weight_and_value_learning: weight and value learning methods ^^^^^^ @@ -72,16 +72,16 @@ weight and value learning methods :nosignatures: :template: module_weight_value_learning - ofrl.ope.weight_value_learning.base - ofrl.ope.weight_value_learning.function - ofrl.ope.weight_value_learning.augmented_lagrangian_learning_discrete - ofrl.ope.weight_value_learning.augmented_lagrangian_learning_continuous - ofrl.ope.weight_value_learning.minimax_weight_learning_discrete - ofrl.ope.weight_value_learning.minimax_weight_learning_continuous - ofrl.ope.weight_value_learning.minimax_value_learning_discrete - ofrl.ope.weight_value_learning.minimax_value_learning_continuous + scope_rl.ope.weight_value_learning.base + scope_rl.ope.weight_value_learning.function + scope_rl.ope.weight_value_learning.augmented_lagrangian_learning_discrete + scope_rl.ope.weight_value_learning.augmented_lagrangian_learning_continuous + scope_rl.ope.weight_value_learning.minimax_weight_learning_discrete + scope_rl.ope.weight_value_learning.minimax_weight_learning_continuous + scope_rl.ope.weight_value_learning.minimax_value_learning_discrete + scope_rl.ope.weight_value_learning.minimax_value_learning_continuous -.. _ofrl_api_ope_utils: +.. _scope_rl_api_ope_utils: others ^^^^^^ @@ -90,9 +90,9 @@ others :recursive: :nosignatures: - ofrl.ope.online + scope_rl.ope.online -.. _ofrl_api_utils: +.. _scope_rl_api_utils: others ---------- @@ -101,7 +101,7 @@ others :recursive: :nosignatures: - ofrl.utils + scope_rl.utils .. raw:: html diff --git a/docs/documentation/online_offline_rl.rst b/docs/documentation/online_offline_rl.rst index 1a0b3b63..2b3eb418 100644 --- a/docs/documentation/online_offline_rl.rst +++ b/docs/documentation/online_offline_rl.rst @@ -292,7 +292,7 @@ This prevents the propagation of the overestimation bias, even when the basic TD .. seealso:: * :doc:`Supported implementations and useful tools ` - * :doc:`Quickstart ` and :doc:`related tutorials <_autogallery/ofrl_others/index>` + * :doc:`Quickstart ` and :doc:`related tutorials <_autogallery/scope_rl_others/index>` .. seealso:: diff --git a/docs/documentation/ope_ops.rst b/docs/documentation/ope_ops.rst index 45efcf34..3318779e 100644 --- a/docs/documentation/ope_ops.rst +++ b/docs/documentation/ope_ops.rst @@ -54,8 +54,8 @@ by dealing with the distribution shift between :math:`\pi_0` and :math:`\pi`. .. seealso:: - * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/ofrl.ope.basic_estimators_discrete>` - * (advanced) :doc:`Marginal OPE estimators `, and their :doc:`API reference <_autosummary/ofrl.ope.marginal_ope_discrete>` + * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/scope_rl.ope.basic_estimators_discrete>` + * (advanced) :doc:`Marginal OPE estimators `, and their :doc:`API reference <_autosummary/scope_rl.ope.marginal_ope_discrete>` * :doc:`Quickstart ` and :doc:`related tutorials <_autogallery/basic_ope/index>` .. _overview_cumulative_distribution_ope: @@ -82,7 +82,7 @@ and :math:`dF(G) := \mathrm{lim}_{\Delta \rightarrow 0} F(G) - F(G- \Delta)`. .. seealso:: - * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/ofrl.ope.cumulative_distribution_estimators_discrete>` + * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/scope_rl.ope.cumulative_distribution_estimators_discrete>` * :doc:`Quickstart ` and :doc:`related tutorials <_autogallery/cumulative_distribution_ope/index>` .. 
_overview_ops: @@ -105,7 +105,7 @@ which are the main contribution of our research paper `"SCOPE-RL: Towards Risk-R .. seealso:: * :doc:`Conventional OPS metrics and top-k RRT metrics ` - * :doc:`OPS evaluation protocols ` and :doc:`their API reference <_autosummary/ofrl.ope.ops>` + * :doc:`OPS evaluation protocols ` and :doc:`their API reference <_autosummary/scope_rl.ope.ops>` * :doc:`Quickstart ` and :doc:`related tutorials <_autogallery/ops/index>` .. seealso:: diff --git a/docs/documentation/quickstart.rst b/docs/documentation/quickstart.rst index 7fe1c813..9bc74b4e 100644 --- a/docs/documentation/quickstart.rst +++ b/docs/documentation/quickstart.rst @@ -25,7 +25,7 @@ The workflow mainly consists of following three steps: .. seealso:: - * :doc:`distinctive_features` describes the distinctive features of OFRL in detail. + * :doc:`distinctive_features` describes the distinctive features of SCOPE-RL in detail. * :doc:`Overview (online/offline RL) ` and :doc:`Overview (OPE/OPS) ` describe the problem settings. .. _quickstart_dataset: @@ -34,16 +34,16 @@ Synthetic Dataset Generation and Data Preprocessing ~~~~~~~~~~ We start by collecting the logged data using DDQN :cite:`van2016deep` as a behavior policy. -Note that, in the following example, we use :doc:`RTBGym ` (a sub-package of OFRL) and `d3rlpy `_. Please satisfy the `requirements <>`_ in advance. +Note that, in the following example, we use :doc:`RTBGym ` (a sub-package of SCOPE-RL) and `d3rlpy `_. Please satisfy the `requirements <>`_ in advance. .. code-block:: python # implement data collection procedure on the RTBGym environment - # import ofrl modules - from ofrl.dataset import SyntheticDataset - from ofrl.policy import DiscreteEpsilonGreedyHead + # import SCOPE-RL modules + from scope_rl.dataset import SyntheticDataset + from scope_rl.policy import DiscreteEpsilonGreedyHead # import d3rlpy algorithms from d3rlpy.algos import DoubleDQN from d3rlpy.online.buffers import ReplayBuffer @@ -101,8 +101,8 @@ Moreover, by preprocessing the logged data, one can also handle their own logged .. seealso:: - * :doc:`Related tutorials <_autogallery/ofrl_others/index>` - * API references of :ref:`dataset modules ` and :ref:`policy wrapper (Head) ` + * :doc:`Related tutorials <_autogallery/scope_rl_others/index>` + * API references of :ref:`dataset modules ` and :ref:`policy wrapper (Head) ` .. _quickstart_offlinerl: @@ -114,7 +114,7 @@ Note that, we use `d3rlpy `_ for offline RL. .. code-block:: python - # implement offline RL procedure using ofrl and d3rlpy + # implement offline RL procedure using scope_rl and d3rlpy # import d3rlpy algorithms from d3rlpy.dataset import MDPDataset @@ -141,7 +141,7 @@ Note that, we use `d3rlpy `_ for offline RL. .. seealso:: - * :doc:`Related tutorials <_autogallery/ofrl_others/index>` + * :doc:`Related tutorials <_autogallery/scope_rl_others/index>` * :ref:`Problem setting ` * :doc:`Supported implementations and useful tools ` * (external) `d3rlpy's documentation `_ @@ -171,15 +171,15 @@ and Doubly Robust (DR) :cite:`jiang2016doubly` :cite:`thomas2016data`. .. 
code-block:: python - # implement OPE procedure using OFRL + # implement OPE procedure using SCOPE-RL - # import OFRL modules - from ofrl.ope import CreateOPEInput - from ofrl.ope import DiscreteOffPolicyEvaluation as OPE - from ofrl.ope import DiscreteDirectMethod as DM - from ofrl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS - from ofrl.ope import DiscretePerDecisionImportanceSampling as PDIS - from ofrl.ope import DiscreteDoublyRobust as DR + # import SCOPE-RL modules + from scope_rl.ope import CreateOPEInput + from scope_rl.ope import DiscreteOffPolicyEvaluation as OPE + from scope_rl.ope import DiscreteDirectMethod as DM + from scope_rl.ope import DiscreteTrajectoryWiseImportanceSampling as TIS + from scope_rl.ope import DiscretePerDecisionImportanceSampling as PDIS + from scope_rl.ope import DiscreteDoublyRobust as DR # (4) Evaluate the learned policy in an offline manner # we compare ddqn, cql, and random policy @@ -234,15 +234,15 @@ and Doubly Robust (DR) :cite:`jiang2016doubly` :cite:`thomas2016data`. Policy Value Estimated by OPE Estimators -Users can implement their own OPE estimators by following the interface of :class:`ofrl.ope.BaseOffPolicyEstimator`. -In addition, :class:`ofrl.ope.OffPolicyEvaluation` summarizes and compares the estimation results of various OPE estimators. +Users can implement their own OPE estimators by following the interface of :class:`scope_rl.ope.BaseOffPolicyEstimator`. +In addition, :class:`scope_rl.ope.OffPolicyEvaluation` summarizes and compares the estimation results of various OPE estimators. .. seealso:: * :doc:`Related tutorials <_autogallery/basic_ope/index>` * :doc:`Problem setting ` - * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/ofrl.ope.basic_estimators_discrete>` - * (advanced) :ref:`Marginal OPE estimators `, and their :doc:`API reference <_autosummary/ofrl.ope.marginal_ope_discrete>` + * :doc:`Supported OPE estimators ` and :doc:`their API reference <_autosummary/scope_rl.ope.basic_estimators_discrete>` + * (advanced) :ref:`Marginal OPE estimators `, and their :doc:`API reference <_autosummary/scope_rl.ope.marginal_ope_discrete>` .. _quickstart_cumulative_distribution_ope: @@ -261,13 +261,13 @@ using Cumulative Distribution OPE estimators :cite:`huang2021off` :cite:`huang20 .. 
code-block:: python - # import OFRL modules - from ofrl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE - from ofrl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM - from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS - from ofrl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR - from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS - from ofrl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR + # import SCOPE-RL modules + from scope_rl.ope import DiscreteCumulativeDistributionOffPolicyEvaluation as CumulativeDistributionOPE + from scope_rl.ope import DiscreteCumulativeDistributionDirectMethod as CD_DM + from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseImportanceSampling as CD_IS + from scope_rl.ope import DiscreteCumulativeDistributionTrajectoryWiseDoublyRobust as CD_DR + from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseImportanceSampling as CD_SNIS + from scope_rl.ope import DiscreteCumulativeDistributionSelfNormalizedTrajectoryWiseDoublyRobust as CD_SNDR # (4) Evaluate the learned policy using cumulative distribution function (in an offline manner) # we compare ddqn, cql, and random policy defined in the previous section (i.e., (3) of basic OPE procedure) @@ -295,15 +295,15 @@ using Cumulative Distribution OPE estimators :cite:`huang2021off` :cite:`huang20 Cumulative Distribution Function Estimated by OPE Estimators -Users can implement their own OPE estimators by following the interface of :class:`ofrl.ope.BaseCumulativeDistributionOffPolicyEstimator`. -In addition, :class:`ofrl.ope.DiscreteCumulativeDistributionOffPolicyEvaluation` summarizes and compares the estimation results of various OPE estimators. +Users can implement their own OPE estimators by following the interface of :class:`scope_rl.ope.BaseCumulativeDistributionOffPolicyEstimator`. +In addition, :class:`scope_rl.ope.DiscreteCumulativeDistributionOffPolicyEvaluation` summarizes and compares the estimation results of various OPE estimators. .. seealso:: * :doc:`Related tutorials <_autogallery/cumulative_distribution_ope/index>` * :ref:`Problem setting ` * :ref:`Supported cumulative distribution OPE estimators ` - and :doc:`their API reference <_autosummary/ofrl.ope.cumulative_distribution_ope_discrete>` + and :doc:`their API reference <_autosummary/scope_rl.ope.cumulative_distribution_ope_discrete>` .. _quickstart_ops: @@ -313,8 +313,8 @@ Finally, we provide the code to conduct OPS, which selects the "best" performing .. 
code-block:: python - # import OFRL modules - from ofrl.ope import OffPolicySelection + # import SCOPE-RL modules + from scope_rl.ope import OffPolicySelection # (5) Conduct Off-Policy Selection # Initialize the OPS class @@ -364,7 +364,7 @@ Finally, we provide the code to conduct OPS, which selects the "best" performing * :doc:`Related tutorials <_autogallery/ops/index>` * :ref:`Problem setting ` - * :ref:`OPS evaluation protocols ` and :doc:`their API reference <_autosummary/ofrl.ope.ops>` + * :ref:`OPS evaluation protocols ` and :doc:`their API reference <_autosummary/scope_rl.ope.ops>` ~~~~~ diff --git a/docs/documentation/references.rst b/docs/documentation/references.rst index 0d963f7d..86b97538 100644 --- a/docs/documentation/references.rst +++ b/docs/documentation/references.rst @@ -12,7 +12,7 @@ Papers Projects ---------- -This project and the main package of OFRL is strongly inspired by the following three packages. +This project and the main package of SCOPE-RL is strongly inspired by the following three packages. * **Open Bandit Pipeline** :cite:`saito2021open` -- a pipeline implementation of OPE in contextual bandit setup: `[github] `_ `[documentation] `_ `[paper] `_. * **d3rlpy** :cite:`seno2021d3rlpy` -- a set of implementations of offline RL algorithms: `[github] `_ `[documentation] `_ `[paper] `_. diff --git a/docs/documentation/subpackages/basicgym_about.rst b/docs/documentation/subpackages/basicgym_about.rst index 0abcb1b3..9b7e5c08 100644 --- a/docs/documentation/subpackages/basicgym_about.rst +++ b/docs/documentation/subpackages/basicgym_about.rst @@ -7,7 +7,7 @@ Overview The simulator is particularly intended for reinforcement learning algorithms and follows `OpenAI Gym `_ and `Gymnasium `_ interface. We design BasicGym as a configurative environment so that researchers and practitioner can customize the environmental modules including UserModel. -Note that, BasicGym is publicized as a sub-package of :doc:`OFRL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. +Note that, BasicGym is publicized as a sub-package of :doc:`SCOPE-RL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. Basic Setting ~~~~~~~~~~ @@ -45,7 +45,7 @@ Quickstart and Configurations ~~~~~~~~~~ We provide an example usage of the standard and customized environment. -The online/offlline RL and OPE/OPS examples are provides in :doc:`OFRL's quickstart `. +The online/offlline RL and OPE/OPS examples are provides in :doc:`SCOPE-RL's quickstart `. Standard BasicEnv ---------- @@ -75,7 +75,7 @@ Let's interact with a uniform random policy. .. code-block:: python - #from ofrl.policy import OnlineHead + from scope_rl.policy import OnlineHead from d3rlpy.algos import RandomPolicy as ContinuousRandomPolicy # (1) define a random agent @@ -98,7 +98,7 @@ Let's interact with a uniform random policy. action = agent.predict_online(obs) obs, reward, done, truncated, info = env.step(action) -Note that, while we use :doc:`OFRL ` and `d3rlpy `_ here, +Note that, while we use :doc:`SCOPE-RL ` and `d3rlpy `_ here, BasicGym is compatible with any other libraries that is compatible to the `OpenAI Gym `_ and `Gymnasium `_ interface. 
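
For reference, the snippet below consolidates the gym>=0.26-style interaction loop that the updated BasicGym docs above rely on. It is only a sketch: a built-in Gym environment and `env.action_space.sample()` stand in for the BasicGym environment and the `OnlineHead`-wrapped d3rlpy policy shown in the diff.

```Python
# A minimal, self-contained sketch of the new-style Gym loop (gym>=0.26.2),
# where reset() returns (obs, info) and step() returns five values.
# "CartPole-v1" stands in for a BasicGym/RECGym/RTBGym environment here.
import gym

env = gym.make("CartPole-v1")

obs, info = env.reset()
done = False
while not done:
    # replace with agent.predict_online(obs) when using an OnlineHead-wrapped policy
    action = env.action_space.sample()
    obs, reward, done, truncated, info = env.step(action)
    done = done or truncated

env.close()
```
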
diff --git a/docs/documentation/subpackages/index.rst b/docs/documentation/subpackages/index.rst index 5d5126f7..6471c6cb 100644 --- a/docs/documentation/subpackages/index.rst +++ b/docs/documentation/subpackages/index.rst @@ -3,7 +3,7 @@ Sub-packages ========== -OFRL provides brbrbr... +SCOPE-RL provides three sub-packages to simulate RL interactions in various situations .. grid-item-card:: :columns: 8 @@ -12,29 +12,29 @@ OFRL provides brbrbr... **RTBGym**: Real-Time Bedding Environment for Online Advertising - .. grid:: - :gutter: 1 + .. .. grid:: + .. :gutter: 1 - .. grid-item:: - :columns: 4 + .. .. grid-item:: + .. :columns: 4 - .. grid:: 1 - :gutter: 1 + .. .. grid:: 1 + .. :gutter: 1 - .. grid-item-card:: - :img-background: /_static/images/rtbgym.png - :shadow: none + .. .. grid-item-card:: + .. :img-background: /_static/images/rtbgym.png + .. :shadow: none - .. grid-item:: - :columns: 8 + .. .. grid-item:: + .. :columns: 8 - .. grid:: 1 - :gutter: 1 - :padding: 1 + .. .. grid:: 1 + .. :gutter: 1 + .. :padding: 1 - .. grid-item:: + .. .. grid-item:: - brief description + .. brief description .. raw:: html @@ -47,29 +47,29 @@ OFRL provides brbrbr... **RECGym**: Recommender Systems Environment for E-commerce - .. grid:: - :gutter: 1 + .. .. grid:: + .. :gutter: 1 - .. grid-item:: - :columns: 4 + .. .. grid-item:: + .. :columns: 4 - .. grid:: 1 - :gutter: 1 + .. .. grid:: 1 + .. :gutter: 1 - .. grid-item-card:: - :img-background: /_static/images/recgym.png - :shadow: none + .. .. grid-item-card:: + .. :img-background: /_static/images/recgym.png + .. :shadow: none - .. grid-item:: - :columns: 8 + .. .. grid-item:: + .. :columns: 8 - .. grid:: 1 - :gutter: 1 - :padding: 1 + .. .. grid:: 1 + .. :gutter: 1 + .. :padding: 1 - .. grid-item:: + .. .. grid-item:: - brief description + .. brief description .. raw:: html @@ -82,29 +82,29 @@ OFRL provides brbrbr... **BasicGym**: Basic Environment - .. grid:: - :gutter: 1 + .. .. grid:: + .. :gutter: 1 - .. grid-item:: - :columns: 4 + .. .. grid-item:: + .. :columns: 4 - .. grid:: 1 - :gutter: 1 + .. .. grid:: 1 + .. :gutter: 1 - .. grid-item-card:: - :img-background: /_static/images/basicgym.png - :shadow: none + .. .. grid-item-card:: + .. :img-background: /_static/images/basicgym.png + .. :shadow: none - .. grid-item:: - :columns: 8 + .. .. grid-item:: + .. :columns: 8 - .. grid:: 1 - :gutter: 1 - :padding: 1 + .. .. grid:: 1 + .. :gutter: 1 + .. :padding: 1 - .. grid-item:: + .. .. grid-item:: - brief description + .. brief description .. raw:: html diff --git a/docs/documentation/subpackages/recgym_about.rst b/docs/documentation/subpackages/recgym_about.rst index 28b1b4ae..0893579d 100644 --- a/docs/documentation/subpackages/recgym_about.rst +++ b/docs/documentation/subpackages/recgym_about.rst @@ -7,7 +7,7 @@ Overview The simulator is particularly intended for reinforcement learning algorithms and follows `OpenAI Gym `_ and `Gymnasium `_ interface. We design RECGym as a configurative environment so that researchers and practitioner can customize the environmental modules including UserModel. -Note that, RECGym is publicized as a sub-package of :doc:`OFRL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. +Note that, RECGym is publicized as a sub-package of :doc:`SCOPE-RL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. 
Basic Setting ~~~~~~~~~~ @@ -46,7 +46,7 @@ Quickstart and Configurations ~~~~~~~~~~ We provide an example usage of the standard and customized environment. -The online/offlline RL and OPE/OPS examples are provides in :doc:`OFRL's quickstart `. +The online/offlline RL and OPE/OPS examples are provides in :doc:`SCOPE-RL's quickstart `. Standard RECEnv ---------- @@ -98,7 +98,7 @@ Let's interact uniform random policy with a discrete action REC environment. action = agent.predict_online(obs) obs, reward, done, truncated, info = env.step(action) -Note that, while we use :doc:`OFRL ` and `d3rlpy `_ here, +Note that, while we use :doc:`SCOPE-RL ` and `d3rlpy `_ here, RECGym is compatible with any other libraries that is compatible to the `OpenAI Gym `_ and `Gymnasium `_ interface. diff --git a/docs/documentation/subpackages/rtbgym_about.rst b/docs/documentation/subpackages/rtbgym_about.rst index b2521013..fdc5a844 100644 --- a/docs/documentation/subpackages/rtbgym_about.rst +++ b/docs/documentation/subpackages/rtbgym_about.rst @@ -7,7 +7,7 @@ Overview The simulator is particularly intended for reinforcement learning algorithms and follows `OpenAI Gym `_ and `Gymnasium `_ interface. We design RTBGym as a configurative environment so that researchers and practitioner can customize the environmental modules including WinningPriceDistribution, ClickThroughRate, and ConversionRate. -Note that, RTBGym is publicized as a sub-package of :doc:`OFRL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. +Note that, RTBGym is publicized as a sub-package of :doc:`SCOPE-RL `, which streamlines the implementation of offline reinforcement learning (offline RL) and off-policy evaluation and selection (OPE/OPS) procedures. Basic Setting ~~~~~~~~~~ @@ -76,7 +76,7 @@ Quickstart and Configurations ~~~~~~~~~~ We provide an example usage of the standard and customized environment. -The online/offlline RL and OPE/OPS examples are provides in :doc:`OFRL's quickstart `. +The online/offlline RL and OPE/OPS examples are provides in :doc:`SCOPE-RL's quickstart `. Standard RTBEnv ---------- @@ -110,7 +110,7 @@ Let's interact uniform random policy with a continuous action RTB environment. T .. code-block:: python # import from other libraries - from ofrl.policy import OnlineHead + from scope_rl.policy import OnlineHead from d3rlpy.algos import RandomPolicy as ContinuousRandomPolicy from d3rlpy.preprocessing import MinMaxActionScaler import matplotlib.pyplot as plt @@ -133,7 +133,7 @@ Let's interact uniform random policy with a continuous action RTB environment. T action = agent.predict_online(obs) obs, reward, done, truncated, info = env.step(action) -Note that, while we use :doc:`OFRL ` and `d3rlpy `_ here, +Note that, while we use :doc:`SCOPE-RL ` and `d3rlpy `_ here, RTBGym is compatible with any other libraries that is compatible to the `OpenAI Gym `_ and `Gymnasium `_ interface. diff --git a/docs/documentation/visualization.rst b/docs/documentation/visualization.rst index 450bea15..e8e89812 100644 --- a/docs/documentation/visualization.rst +++ b/docs/documentation/visualization.rst @@ -219,7 +219,7 @@ This kind of visualization is again available for all point-wise estimates inclu :margin: 0 .. 
grid-item-card:: - :link: ofrl_api + :link: scope_rl_api :link-type: doc :shadow: none :margin: 0 diff --git a/docs/index.rst b/docs/index.rst index 2c6df696..d8b04924 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -6,7 +6,7 @@ :text-align: center :shadow: none - one sentence to describe + Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning .. button-ref:: documentation/index :ref-type: doc @@ -22,7 +22,7 @@ .. raw:: html -

-    Why OFRL?
+    Why SCOPE-RL?

.. grid-item:: :class: top-page-list @@ -43,7 +43,7 @@ .. raw:: html -

-    Try OFRL in two lines of code!
+    Try SCOPE-RL in two lines of code!

    @@ -185,66 +185,66 @@
-.. raw:: html
+.. .. raw:: html

-    Explore more with OFRL
+..    Explore more with SCOPE-RL
- +.. -.. card-carousel:: 4 +.. .. card-carousel:: 4 - .. card:: Basic Off-Policy Evaluation - :img-top: .png +.. .. card:: Basic Off-Policy Evaluation +.. :img-top: .png - .. card:: Marginal Off-Policy Evaluation - :img-top: .png +.. .. card:: Marginal Off-Policy Evaluation +.. :img-top: .png - .. card:: Cumulative Distribution Off-Policy Evaluation - :img-top: .png +.. .. card:: Cumulative Distribution Off-Policy Evaluation +.. :img-top: .png - .. card:: Off-Policy Selection - :img-top: .png +.. .. card:: Off-Policy Selection +.. :img-top: .png - .. card:: Evaluation of OPE/OPS - :img-top: .png +.. .. card:: Evaluation of OPE/OPS +.. :img-top: .png - .. card:: Ablation with various value functions - :img-top: .png +.. .. card:: Ablation with various value functions +.. :img-top: .png - .. card:: Ablation with xxx - :img-top: .png +.. .. card:: Ablation with xxx +.. :img-top: .png - .. card:: Handling multiple datasets - :img-top: .png +.. .. card:: Handling multiple datasets +.. :img-top: .png - .. card:: Evaluating with various behavior policies - :img-top: .png +.. .. card:: Evaluating with various behavior policies +.. :img-top: .png - .. card:: Evaluating on non-episodic setting - :img-top: .png +.. .. card:: Evaluating on non-episodic setting +.. :img-top: .png -.. raw:: html +.. .. raw:: html - +.. -.. card-carousel:: 4 +.. .. card-carousel:: 4 - .. card:: Example on Real-Time Bidding - :img-top: .png +.. .. card:: Example on Real-Time Bidding +.. :img-top: .png - .. card:: Example on Recommendation - :img-top: .png +.. .. card:: Example on Recommendation +.. :img-top: .png - .. card:: Example on xxx - :img-top: .png +.. .. card:: Example on xxx +.. :img-top: .png - .. card:: Example on xxx - :img-top: .png +.. .. card:: Example on xxx +.. :img-top: .png .. raw:: html @@ -252,31 +252,35 @@

Citation

-If you use our pipeline or the top-:math:`k` RRT metrics in your work, please cite our paper below.
+If you use our pipeline or the SharpRatio@k metric in your work, please cite our paper below.
+
+.. card::

-| **Title** [`arXiv <>`_] [`Proceedings <>`_]
-| Authors.
+   | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
+   | **SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning** [`arXiv <>`_] [`Proceedings <>`_]
+   | (a preprint coming soon..)

-.. code-block::
+   .. code-block::

-    @article{kiyohara2023xxx
-        title={},
-        author={},
-        journal={},
-        year={},
-    }
+      @article{kiyohara2023scope,
+         author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
+         title = {SCOPE-RL: Towards Risk-Return Assessments of Off-Policy Evaluation in Reinforcement Learning},
+         journal = {A github repository},
+         pages = {xxx--xxx},
+         year = {2023},
+      }

 .. raw:: html

Join us!

-Any contributions to OFRL are more than welcome! +Any contributions to SCOPE-RL are more than welcome! * `Guidelines for contribution (CONTRIBUTING.md) <>`_ -* `Google Group <>`_ +* `Google Group (scope-rl@googlegroups.com) `_! -If you have any questions, feel free to contact: kiyohara.h.aa@m.titech.ac.jp +If you have any questions, feel free to contact: hk844 [at] cornell.edu .. raw:: html @@ -295,13 +299,13 @@ Welcome! Installation Quickstart - Tutorial + .. Tutorial Documentation FAQs News Sub-packages - Release Notes - Proceedings + Release Notes + Proceedings .. grid:: diff --git a/recgym/README.md b/recgym/README.md index 2e304f0d..0223e481 100644 --- a/recgym/README.md +++ b/recgym/README.md @@ -19,7 +19,7 @@ *RECGym* is an open-source simulation platform for recommender system (REC), which is written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. We design RECGym as a configurative environment so that researchers and practitioner can customize the environmental modules including `UserModel`((i.e. `user_preference_dynamics` and `reward_function`). -Note that, RECGym is publicized under [ofrl](../) repository, which facilitates the implementation of offline reinforcement learning procedure. +Note that, RECGym is publicized under [SCOPE-RL](../) repository, which facilitates the implementation of offline reinforcement learning procedure. ### Basic Setting @@ -45,22 +45,22 @@ RECGym is configurative about the following a module. Note that, users can customize the above modules by following the [abstract class](./envs/simulator/base.py). ## Installation -RECGym can be installed as a part of [ofrl](../) using Python's package manager `pip`. +RECGym can be installed as a part of [SCOPE-RL](../) using Python's package manager `pip`. ``` -pip install ofrl +pip install scope-rl ``` You can also install from source. ```bash -git clone https://github.com/negocia-inc/ofrl -cd ofrl +git clone https://github.com/negocia-inc/scope-rl +cd scope-rl python setup.py install ``` ## Usage We provide an example usage of the standard and customized environment. \ -The online/offlline RL and Off-Policy Evaluation examples are provides in [OFRL's README](../README.md). +The online/offlline RL and Off-Policy Evaluation examples are provides in [SCOPE-RL's README](../README.md). ### Standard RECEnv @@ -88,7 +88,7 @@ Let's visualize the case with uniform random policy . ```Python # import from other libraries -from ofrl.policy import OnlineHead +from scope_rl.policy import OnlineHead from d3rlpy.algos import DiscreteRandomPolicy # define a random agent @@ -127,7 +127,7 @@ plt.show()

-Note that, while we use [ofrl](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, RECGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. +Note that, while we use [SCOPE-RL](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, RECGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. ### Customized RECEnv diff --git a/recgym/envs/rec.py b/recgym/envs/rec.py index e55c2341..83b1157a 100644 --- a/recgym/envs/rec.py +++ b/recgym/envs/rec.py @@ -82,10 +82,10 @@ class RECEnv(gym.Env): .. code-block:: python - # import necessary module from recgym and ofrl + # import necessary module from recgym and scope_rl from recgym.rec import RECEnv - from ofrl.policy import OnlineHead - from ofrl.ope.online import calc_on_policy_policy_value + from scope_rl.policy import OnlineHead + from scope_rl.ope.online import calc_on_policy_policy_value # import necessary module from other libraries from d3rlpy.algos import DiscreteRandomPolicy diff --git a/requirements.txt b/requirements.txt index e4b6694f..99c6dc6b 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,7 +1,11 @@ -scipy>=1.7.3 -numpy>=1.21.2 -pandas>=1.3.2 +scipy>=1.10.1 +numpy==1.22.4 # Currently checking compatibility with the latest version of seaborn. # [TODO] +pandas>=1.5.3 scikit-learn>=1.0.2 -torch>=1.9.0 +matplotlib>=3.7.1 +seaborn==0.11.2 # Currently checking compatibility with the latest version. # [TODO] +torch>=2.0.0 +d3rlpy>=1.1.1 gym>=0.26.2 -d3rlpy>=1.1.0 \ No newline at end of file +gymnasium>=0.28.1 +hydra-core>=1.3.2 \ No newline at end of file diff --git a/rtbgym/README.md b/rtbgym/README.md index 75c25ec5..6889aa56 100644 --- a/rtbgym/README.md +++ b/rtbgym/README.md @@ -21,7 +21,7 @@ *RTBGym* is an open-source simulation platform for Real-Time Bidding (RTB) of Display Advertising, which is written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. We design RTBGym as a configurative environment so that researchers and practitioner can customize the environmental modules including `WinningPriceDistribution`, `ClickThroughRate`, and `ConversionRate`. -Note that, RTBGym is publicized under [ofrl](../) repository, which facilitates the implementation of offline reinforcement learning procedure. +Note that, RTBGym is publicized under [SCOPE-RL](../) repository, which facilitates the implementation of offline reinforcement learning procedure. ### Basic Setting @@ -65,22 +65,22 @@ Note that, users can customize the above modules by following the [abstract clas We also define the bidding function in the [Bidder](./envs/simulator/bidder.py#15) class and the auction simulation in the [Simulator](./envs/simulator/rtb_synthetic.py#23) class, respectively. ## Installation -RTBGym can be installed as a part of [ofrl](../) using Python's package manager `pip`. +RTBGym can be installed as a part of [SCOPE-RL](../) using Python's package manager `pip`. ``` -pip install ofrl +pip install scope-rl ``` You can also install from source. 
```bash -git clone https://github.com/negocia-inc/ofrl -cd ofrl +git clone https://github.com/negocia-inc/scope-rl +cd scope-rl python setup.py install ``` ## Usage We provide an example usage of the standard and customized environment. \ -The online/offlline RL and Off-Policy Evaluation examples are provides in [OFRL's README](../README.md). +The online/offlline RL and Off-Policy Evaluation examples are provides in [SCOPE-RL's README](../README.md). ### Standard RTBEnv @@ -163,7 +163,7 @@ plt.show()

-Note that, while we use [ofrl](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, RTBGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.
+Note that, while we use [SCOPE-RL](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, RTBGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.

 ### Customized RTGEnv

diff --git a/rtbgym/envs/rtb.py b/rtbgym/envs/rtb.py
index a6e3fe49..f62c1fe7 100644
--- a/rtbgym/envs/rtb.py
+++ b/rtbgym/envs/rtb.py
@@ -144,8 +144,8 @@ class RTBEnv(gym.Env):

         # import necessary module from rtbgym
         from rtbgym import RTBEnv
-        from ofrl.policy import OnlineHead
-        from ofrl.ope.online import calc_on_policy_policy_value
+        from scope_rl.policy import OnlineHead
+        from scope_rl.ope.online import calc_on_policy_policy_value

         # import necessary module from other libraries
         from d3rlpy.algos import RandomPolicy
diff --git a/setup.py b/setup.py
index 3e0ecf4a..acfdd769 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,58 @@
-from setuptools import setup
+from setuptools import setup, find_packages
+from os import path
+import sys
+from scope_rl.version import __version__

-# setup OFRL
-setup(name="ofrl", version="0.0.0", install_requires=["gym"])
-# setup RTBEnv
-setup(name="rtbgym", version="0.0.0", install_requires=["gym"])
+
+
+here = path.abspath(path.dirname(__file__))
+sys.path.insert(0, path.join(here, "scope_rl"))
+
+with open(path.join(here, "README.md"), encoding="utf-8") as f:
+    long_description = f.read()
+
+
+# setup SCOPE-RL
+setup(
+    name="scope-rl",
+    version=__version__,
+    description="SCOPE-RL: A pipeline for offline reinforcement learning research and applications",
+    url="https://github.com/negocia-inc/scope-rl",  # [TODO]
+    author="Haruka Kiyohara",
+    author_email="scope-rl@googlegroups.com",
+    keywords=["off-policy evaluation", "offline reinforcement learning", "risk assessment"],
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    install_requires=[
+        "scipy>=1.10.1",
+        "numpy==1.22.4",  # [TODO]
+        "pandas>=1.5.3",
+        "scikit-learn>=1.0.2",
+        "matplotlib>=3.7.1",
+        "seaborn==0.11.2",  # [TODO]
+        "torch>=2.0.0",
+        "d3rlpy>=1.1.1",
+        "gym>=0.26.2",
+        "gymnasium>=0.28.1",
+        "hydra-core>=1.3.2",
+    ],
+    license="Apache License",
+    packages=find_packages(
+        exclude=[".github", "docs", "examples", "images", "tests"],
+    ),
+    classifiers=[
+        "Intended Audience :: Science/Research",
+        "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Topic :: Scientific/Engineering",
+        "Topic :: Scientific/Engineering :: Mathematics",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
+        "Topic :: Software Development",
+        "Topic :: Software Development :: Libraries",
+        "Topic :: Software Development :: Libraries :: Python Modules",
+        "License :: OSI Approved :: Apache Software License",
+    ],
+)
diff --git a/tests/offline_gym/dataset/test_synthetic.py b/tests/scope_rl/dataset/test_synthetic.py
similarity index 100%
rename from tests/offline_gym/dataset/test_synthetic.py
rename to tests/scope_rl/dataset/test_synthetic.py
diff --git a/tests/offline_gym/ope/test_estimators_continuous.py b/tests/scope_rl/ope/test_estimators_continuous.py
similarity index 100%
rename from tests/offline_gym/ope/test_estimators_continuous.py
rename to tests/scope_rl/ope/test_estimators_continuous.py
diff --git a/tests/offline_gym/ope/test_estimators_discrete.py b/tests/scope_rl/ope/test_estimators_discrete.py
similarity index 100%
rename from tests/offline_gym/ope/test_estimators_discrete.py
rename to tests/scope_rl/ope/test_estimators_discrete.py
diff --git a/tests/offline_gym/ope/test_online.py b/tests/scope_rl/ope/test_online.py
similarity index 100%
rename from tests/offline_gym/ope/test_online.py
rename to tests/scope_rl/ope/test_online.py
diff --git a/tests/offline_gym/ope/test_ope.py b/tests/scope_rl/ope/test_ope.py
similarity index 100%
rename from tests/offline_gym/ope/test_ope.py
rename to tests/scope_rl/ope/test_ope.py
diff --git a/tests/offline_gym/policy/test_head.py b/tests/scope_rl/policy/test_head.py
similarity index 100%
rename from tests/offline_gym/policy/test_head.py
rename to tests/scope_rl/policy/test_head.py
diff --git a/tests/offline_gym/test_utils.py b/tests/scope_rl/test_utils.py
similarity index 100%
rename from tests/offline_gym/test_utils.py
rename to tests/scope_rl/test_utils.py
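
After the rename, every module that previously lived under `ofrl.*` is imported from `scope_rl.*`. The short check below mirrors the import lines touched throughout this diff; it assumes `pip install scope-rl` has been run.

```Python
# Smoke test for the package rename: these imports all appear in the diff above
# under their new scope_rl paths and should resolve once scope-rl is installed.
from scope_rl.ope import CreateOPEInput
from scope_rl.ope import DiscreteOffPolicyEvaluation as OPE
from scope_rl.ope import OffPolicySelection
from scope_rl.ope.online import calc_on_policy_policy_value
from scope_rl.policy import OnlineHead

print(OPE.__module__)  # expected to start with "scope_rl.ope"
```
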