diff --git a/README.md b/README.md index aaaec8e..2e7b074 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,8 @@ [![GitHub last commit](https://img.shields.io/github/last-commit/hakuhodo-technologies/scope-rl)](https://github.com/hakuhodo-technologies/scope-rl/graphs/commit-activity) [![Documentation Status](https://readthedocs.org/projects/scope-rl/badge/?version=latest)](https://scope-rl.readthedocs.io/en/latest/) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) -[![arXiv](https://img.shields.io/badge/arXiv-23xx.xxxxx-b31b1b.svg)](https://arxiv.org/abs/23xx.xxxxx) +[![arXiv](https://img.shields.io/badge/arXiv-2311.18206-b31b1b.svg)](https://arxiv.org/abs/2311.18206) +[![arXiv](https://img.shields.io/badge/arXiv-2311.18207-b31b1b.svg)](https://arxiv.org/abs/2311.18207)
Table of Contents (click to expand) @@ -36,6 +37,8 @@ **Stable versions are available at [PyPI](https://pypi.org/project/scope-rl/)** +**Slides are available [here](https://speakerdeck.com/harukakiyohara_/scope-rl)** + **日本語は[こちら](README_ja.md)** ## Overview @@ -56,6 +59,13 @@ This software is inspired by [Open Bandit Pipeline](https://github.com/st-tech/z ### Implementations +
+<div align="center"><img src="./images/scope_workflow.png" width="100%"/></div>
+<figcaption>
+<p align="center">
+  End-to-end workflow of offline RL and OPE with SCOPE-RL
+</p>
+</figcaption>
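+
+To make the workflow in the figure above concrete, the following is a minimal, self-contained sketch of the same pipeline written with plain Gymnasium and NumPy rather than the SCOPE-RL API; the environment, behavior policy, and evaluation policy are illustrative assumptions. It collects logged data with a behavior policy and then estimates the value of an evaluation policy via trajectory-wise importance sampling, which is exactly what the dataset, policy, and OPE modules described below streamline.
+
+```python
+import gymnasium as gym
+import numpy as np
+
+rng = np.random.default_rng(12345)
+env = gym.make("CartPole-v1")  # illustrative environment
+n_actions = env.action_space.n
+
+# (1) dataset: collect logged trajectories with a uniform-random behavior policy pi_b
+def behavior_policy(obs):
+    return np.full(n_actions, 1.0 / n_actions)  # action distribution of pi_b
+
+def collect_logged_data(n_trajectories=100):
+    dataset = []
+    for _ in range(n_trajectories):
+        obs, _ = env.reset()
+        done, trajectory = False, []
+        while not done:
+            probs = behavior_policy(obs)
+            action = rng.choice(n_actions, p=probs)
+            next_obs, reward, terminated, truncated, _ = env.step(action)
+            trajectory.append((obs, action, reward, probs[action]))  # log the propensity as well
+            obs, done = next_obs, terminated or truncated
+        dataset.append(trajectory)
+    return dataset
+
+# (2) evaluation policy pi_e (a hand-crafted policy we want to evaluate offline)
+def evaluation_policy(obs):
+    probs = np.full(n_actions, 0.1 / (n_actions - 1))
+    probs[int(obs[2] > 0)] = 0.9  # mostly push the cart toward the pole's lean
+    return probs
+
+# (3) OPE: trajectory-wise importance sampling estimate of pi_e's value
+def trajectory_wise_importance_sampling(dataset, gamma=1.0):
+    estimates = []
+    for trajectory in dataset:
+        weight, trajectory_return = 1.0, 0.0
+        for t, (obs, action, reward, pscore) in enumerate(trajectory):
+            weight *= evaluation_policy(obs)[action] / pscore
+            trajectory_return += (gamma ** t) * reward
+        estimates.append(weight * trajectory_return)
+    return float(np.mean(estimates))
+
+logged_data = collect_logged_data()
+print("estimated V(pi_e):", trajectory_wise_importance_sampling(logged_data))
+```
+
+See the documentation and example codes for how these steps are performed with SCOPE-RL itself.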
+ *SCOPE-RL* mainly consists of the following three modules. - [**dataset module**](./scope_rl/dataset): This module provides tools to generate synthetic data from any environment on top of [OpenAI Gym](http://gym.openai.com/) and [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to pre-process the logged data. - [**policy module**](./scope_rl/policy/): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection. @@ -130,6 +140,15 @@ This software is inspired by [Open Bandit Pipeline](https://github.com/st-tech/z Note that in addition to the above OPE and OPS methods, researchers can easily implement and compare their own estimators through a generic abstract class implemented in SCOPE-RL. Moreover, practitioners can apply the above methods to their real-world data to evaluate and choose counterfactual policies for their own practical situations. +The distinctive features of OPE/OPS modules of SCOPE-RL are summarized as follows. + +
+<div align="center"><img src="./images/ope_features.png" width="100%"/></div>
+<figcaption>
+<p align="center">
+  Four distinctive features of OPE/OPS implementation of SCOPE-RL
+</p>
+</figcaption>
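+
+As noted above, custom OPE estimators can be implemented and compared by subclassing a generic abstract class. The snippet below is a schematic sketch of that design pattern in plain Python; the class and method names (`BaseOffPolicyEstimator`, `estimate_policy_value`) are hypothetical stand-ins, so please refer to the package reference for SCOPE-RL's actual interface.
+
+```python
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+
+import numpy as np
+
+@dataclass
+class BaseOffPolicyEstimator(ABC):
+    """Hypothetical abstract base class (SCOPE-RL's actual interface may differ)."""
+    estimator_name: str = "base"
+
+    @abstractmethod
+    def estimate_policy_value(self, reward, importance_weight, gamma=1.0):
+        """Return a scalar estimate of the evaluation policy's value."""
+        raise NotImplementedError
+
+@dataclass
+class SelfNormalizedTIS(BaseOffPolicyEstimator):
+    """Custom estimator: self-normalized trajectory-wise importance sampling."""
+    estimator_name: str = "snTIS"
+
+    def estimate_policy_value(self, reward, importance_weight, gamma=1.0):
+        # reward: (n_trajectories, horizon), importance_weight: (n_trajectories, )
+        discount = gamma ** np.arange(reward.shape[1])
+        trajectory_return = (reward * discount).sum(axis=1)
+        # normalizing the weights to sum to one trades a small bias for lower variance
+        normalized_weight = importance_weight / importance_weight.sum()
+        return float((normalized_weight * trajectory_return).sum())
+
+# usage with toy logged data: 10 trajectories of horizon 5
+rng = np.random.default_rng(0)
+reward = rng.random((10, 5))
+importance_weight = rng.lognormal(size=10)
+estimator = SelfNormalizedTIS()
+print(estimator.estimator_name, estimator.estimate_policy_value(reward, importance_weight))
+```
+
+Because every estimator exposes the same interface, a newly added estimator can be assessed against the built-in ones under the same OPE/OPS protocol.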
+ To provide an example of performing a customized experiment imitating a practical setup, we also provide [RTBGym](./rtbgym) and [RecGym](./recgym), RL environments for Real-Time Bidding (RTB) and Recommender Systems. ## Installation @@ -426,14 +445,14 @@ If you use our software in your work, please cite our paper: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) +[[arXiv](https://arxiv.org/abs/2311.18206)] [[slides](https://speakerdeck.com/harukakiyohara_/scope-rl)] Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year={2023}, } ``` @@ -442,14 +461,14 @@ If you use our proposed metric "SharpeRatio@k" in your work, please cite our pap Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**
-[link]() (a preprint coming soon..) +[[arXiv](https://arxiv.org/abs/2311.18207)] [[slides](https://speakerdeck.com/harukakiyohara_/towards-risk-return-assessment-of-ope)] Bibtex: ``` @article{kiyohara2023towards, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18207}, year={2023}, } ``` diff --git a/README_ja.md b/README_ja.md index 8b2d9cf..c28040e 100644 --- a/README_ja.md +++ b/README_ja.md @@ -9,7 +9,8 @@ [![GitHub last commit](https://img.shields.io/github/last-commit/hakuhodo-technologies/scope-rl)](https://github.com/hakuhodo-technologies/scope-rl/graphs/commit-activity) [![Documentation Status](https://readthedocs.org/projects/scope-rl/badge/?version=latest)](https://scope-rl.readthedocs.io/en/latest/) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) -[![arXiv](https://img.shields.io/badge/arXiv-23xx.xxxxx-b31b1b.svg)](https://arxiv.org/abs/23xx.xxxxx) +[![arXiv](https://img.shields.io/badge/arXiv-2311.18206-b31b1b.svg)](https://arxiv.org/abs/2311.18206) +[![arXiv](https://img.shields.io/badge/arXiv-2311.18207-b31b1b.svg)](https://arxiv.org/abs/2311.18207)
目次(クリックして展開) @@ -36,6 +37,9 @@ **PyPIで最新版が利用可能 [PyPI](https://pypi.org/project/scope-rl/)** +**解説スライドは [こちら](https://speakerdeck.com/aiueola/scope-rl-ja)** + + ## 概要 SCOPE-RL は,データ収集からオフ方策学習,オフ方策性能評価,方策選択をend-to-endで実装するためのオープンソースのPythonソフトウェアです.私たちのソフトウェアには,人工データ生成,データの前処理,オフ方策評価 (off-policy evaluation; OPE) の推定量,オフ方策選択 (off-policy selection; OPS) 手法を実装するための一連のモジュールが含まれています. @@ -55,6 +59,13 @@ SCOPE-RL は,データ収集からオフ方策学習,オフ方策性能評 ### 実装 +
+<div align="center"><img src="./images/scope_workflow.png" width="100%"/></div>
+<figcaption>
+<p align="center">
+  SCOPE-RL上で行えるオフライン強化学習とオフ方策評価の一貫した実装手順
+</p>
+</figcaption>
+ *SCOPE-RL* は主に以下の3つのモジュールから構成されています. - [**dataset module**](./_gym/dataset): このモジュールは,[OpenAI Gym](http://gym.openai.com/) や[Gymnasium](https://gymnasium.farama.org/)のようなインターフェイスに基づく任意の環境から人工データを生成するためのツールを提供します.また,ログデータの前処理を行うためのツールも提供します. @@ -130,6 +141,15 @@ SCOPE-RL は,データ収集からオフ方策学習,オフ方策性能評 研究上でのSCOPE-RLの利点は,抽象クラスを用いることで既に実装されているオフ方策評価およびオフ方策選択手法に加えて,研究者が自分の推定量を簡単に実装し,比較することができることです.さらに実践上では,様々なオフ方策推定量を実データに適用し,自身の実際の状況に合った方策を評価し,選択することができることです. +さらにSCOPE-RLでは既存のパッケージの機能に留まらず,以下の図の通りより実用に即したオフ方策評価の実装が可能です. + +
+<div align="center"><img src="./images/ope_features_ja.png" width="100%"/></div>
+<figcaption>
+<p align="center">
+  SCOPE-RLのオフ方策評価モジュールが力を入れる4つの機能
+</p>
+</figcaption>
+ またSCOPE-RLはサブパッケージとして、シンプルな設定の[BasicGym](./basicgym)実用的な環境をシミュレーションした広告入札 (real-time bidding; RTB) と推薦システム用の強化学習環境である[RTBGym](./rtbgym)と[RecGym](./recgym)も提供しています。 @@ -434,14 +454,14 @@ ops.visualize_conditional_value_at_risk_for_validation( Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) +[[arXiv](https://arxiv.org/abs/2311.18206)] [[日本語スライド](https://speakerdeck.com/aiueola/scope-rl-ja)] Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year={2023}, } ``` @@ -450,14 +470,14 @@ Bibtex: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**
-[link]() (a preprint coming soon..) +[[arXiv](https://arxiv.org/abs/2311.18207)] [[日本語スライド](https://speakerdeck.com/aiueola/towards-risk-return-assessment-of-ope-ja)] Bibtex: ``` @article{kiyohara2023towards, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18207}, year={2023}, } ``` diff --git a/basicgym/README.md b/basicgym/README.md index 736ef31..32fd17a 100644 --- a/basicgym/README.md +++ b/basicgym/README.md @@ -245,14 +245,13 @@ If you use our software in your work, please cite our paper: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ``` diff --git a/basicgym/README_ja.md b/basicgym/README_ja.md index 8ac35ac..f1b168e 100644 --- a/basicgym/README_ja.md +++ b/basicgym/README_ja.md @@ -246,14 +246,13 @@ class CustomizedRewardFunction(BaseRewardFunction): Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ``` diff --git a/docs/_static/images/ope_features.png b/docs/_static/images/ope_features.png new file mode 100644 index 0000000..7ac5653 Binary files /dev/null and b/docs/_static/images/ope_features.png differ diff --git a/docs/_static/images/scope_workflow.png b/docs/_static/images/scope_workflow.png index ff0ca66..3f8d489 100644 Binary files a/docs/_static/images/scope_workflow.png and b/docs/_static/images/scope_workflow.png differ diff --git a/docs/conf.py b/docs/conf.py index 4139df1..d370a05 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -81,7 +81,7 @@ "icon_links": [ { "name": "Speaker Deck", - "url": "https://speakerdeck.com/aiueola/ofrl-designing-an-offline-reinforcement-learning-and-policy-evaluation-platform-from-practical-perspectives", + "url": "https://speakerdeck.com/harukakiyohara_/scope-rl", "icon": "fa-brands fa-speaker-deck", "type": "fontawesome", }, diff --git a/docs/documentation/distinctive_features.rst b/docs/documentation/distinctive_features.rst index c0561d9..6855fc2 100644 --- a/docs/documentation/distinctive_features.rst +++ b/docs/documentation/distinctive_features.rst @@ -106,6 +106,18 @@ advanced ones :cite:`kallus2020double, uehara2020minimax, liu2018breaking, yang2 Moreover, we provide the meta-class to handle OPE/OPS experiments and the abstract base implementation of OPE estimators. This allows researchers to quickly test their own algorithms with this platform and also helps practitioners empirically learn the property of various OPE methods. +.. card:: + :width: 75% + :margin: auto + :img-top: ../_static/images/ope_features.png + :text-align: center + + Four key features of OPE/OPS modules of SCOPE-RL + +.. raw:: html + +
+ .. _feature_variety_ope: Variety of OPE estimators and evaluation protocol of OPE diff --git a/docs/documentation/examples/index.rst b/docs/documentation/examples/index.rst index 331d856..44bb955 100644 --- a/docs/documentation/examples/index.rst +++ b/docs/documentation/examples/index.rst @@ -6,6 +6,10 @@ Example Codes SCOPE-RL ---------- +.. seealso:: + + Please also refer to :doc:`SCOPE-RL package reference ` for APIs. + .. _basic_ope_example: Basic and High-Confidence Off-Policy Evaluation (OPE): diff --git a/docs/documentation/index.rst b/docs/documentation/index.rst index 8ad626b..e41356c 100644 --- a/docs/documentation/index.rst +++ b/docs/documentation/index.rst @@ -215,14 +215,13 @@ If you use our pipeline in your work, please cite our paper below. | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. | **SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation** - | (a preprint is coming soon..) .. code-block:: @article{kiyohara2023scope, title={SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year={2023} } @@ -239,7 +238,7 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your @article{kiyohara2023towards, title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18207}, year={2023} } diff --git a/docs/documentation/installation.rst b/docs/documentation/installation.rst index a5954a5..d24b760 100644 --- a/docs/documentation/installation.rst +++ b/docs/documentation/installation.rst @@ -40,7 +40,7 @@ If you use our pipeline in your work, please cite our paper below. @article{kiyohara2023scope, title={SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year={2023} } @@ -50,14 +50,13 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. | **Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation** - | (a preprint is coming soon..) .. code-block:: @article{kiyohara2023towards, title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18207}, year={2023} } diff --git a/docs/documentation/news.rst b/docs/documentation/news.rst index 97950e5..a66b3b0 100644 --- a/docs/documentation/news.rst +++ b/docs/documentation/news.rst @@ -6,8 +6,8 @@ Follow us on `Google Group (scope-rl@googlegroups.com) `_ (`slides `_, `日本語スライド `_), +and (2) `Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation `_ (`slides `_, `日本語スライド `_) are now available at arXiv! 
**2023.7.30** Released :class:`v0.2.1` of SCOPE-RL! This release upgrades the version of d3rlpy from `1.1.1` to `2.0.4`. diff --git a/docs/documentation/quickstart.rst b/docs/documentation/quickstart.rst index d8e79ef..049c230 100644 --- a/docs/documentation/quickstart.rst +++ b/docs/documentation/quickstart.rst @@ -2,7 +2,19 @@ Quickstart ========== We show an example workflow of synthetic dataset collection, offline Reinforcement Learning (RL), to Off-Policy Evaluation (OPE). -The workflow mainly consists of the following three steps: +The workflow mainly consists of the following three steps (and a validation step): + +.. card:: + :width: 75% + :margin: auto + :img-top: ../_static/images/scope_workflow.png + :text-align: center + + Workflow of offline RL and OPE streamlined by SCOPE-RL + +.. raw:: html + +
* **Synthetic Dataset Generation and Data Preprocessing**: The initial step is to collect logged data using a behavior policy. In a synthetic setup, we first train a behavior policy through online interaction and then generate dataset(s) with the behavior policy. In a practical situation, we should use the preprocessed logged data obtained from real-world applications. diff --git a/docs/documentation/sharpe_ratio.rst b/docs/documentation/sharpe_ratio.rst index 4a8db43..259bc2a 100644 --- a/docs/documentation/sharpe_ratio.rst +++ b/docs/documentation/sharpe_ratio.rst @@ -10,9 +10,8 @@ Note that for the basic problem formulation of Off-Policy Evaluation and Selecti .. seealso:: The **SharpeRatio@k** metric is the main contribution of our paper **"Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation."** - Our paper is currently under submission, and the arXiv version of the paper will come soon.. + Our paper is currently under submission, and the arXiv version of the paper is available `here `_. - .. A preprint is available at `arXiv <>`_. Background ~~~~~~~~~~ @@ -281,7 +280,7 @@ The above figure reports the benchmark results of OPE estimators with SharpeRati 1. Future research in OPE should include the assessment of estimators based on SharpeRatio@k: The findings from the previous section suggest that SharpeRatio@k provides more actionable insights compared to traditional accuracy metrics. - The benchmark results using SharpeRatio@k, as shown in Figure~\ref{fig:sharpe_ratio_benchmark}, often significantly differ from those obtained with conventional accuracy metrics. + The benchmark results using SharpeRatio@k (particularly as shown in the figures of Result 2), often significantly differ from those obtained with conventional accuracy metrics. This highlights the importance of integrating SharpeRatio@k into future research to more effectively evaluate the efficiency of OPE estimators. 2. A new estimator that explicitly optimizes the risk-return tradeoff: @@ -312,14 +311,13 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. | **Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation** - | (a preprint is coming soon..) .. code-block:: @article{kiyohara2023towards, title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18207}, year={2023} } diff --git a/docs/index.rst b/docs/index.rst index 408c124..1a6b365 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -299,14 +299,13 @@ If you use our pipeline in your work, please cite our paper below. | Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. | **SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation** - | (a preprint is coming soon..) .. code-block:: @article{kiyohara2023scope, title={SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year={2023} } @@ -342,6 +341,7 @@ Welcome! 
Quickstart Documentation Usage + API Sub-packages FAQs News diff --git a/images/ope_features.png b/images/ope_features.png new file mode 100644 index 0000000..7ac5653 Binary files /dev/null and b/images/ope_features.png differ diff --git a/images/ope_features_ja.png b/images/ope_features_ja.png new file mode 100644 index 0000000..c8ce06f Binary files /dev/null and b/images/ope_features_ja.png differ diff --git a/images/scope_workflow.png b/images/scope_workflow.png new file mode 100644 index 0000000..3f8d489 Binary files /dev/null and b/images/scope_workflow.png differ diff --git a/recgym/README.md b/recgym/README.md index 27fd2a2..fb97f07 100644 --- a/recgym/README.md +++ b/recgym/README.md @@ -230,14 +230,13 @@ If you use our software in your work, please cite our paper: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ``` diff --git a/recgym/README_ja.md b/recgym/README_ja.md index 1db32de..4101a8d 100644 --- a/recgym/README_ja.md +++ b/recgym/README_ja.md @@ -228,14 +228,13 @@ class CustomizedUserModel(BaseUserModel): Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ``` diff --git a/rtbgym/README.md b/rtbgym/README.md index a0ecd8b..00ded26 100644 --- a/rtbgym/README.md +++ b/rtbgym/README.md @@ -363,14 +363,13 @@ If you use our software in your work, please cite our paper: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ``` diff --git a/rtbgym/README_ja.md b/rtbgym/README_ja.md index f94b45b..ea7ddb3 100644 --- a/rtbgym/README_ja.md +++ b/rtbgym/README_ja.md @@ -359,16 +359,16 @@ custom_env = CustomizedRTBEnv( ## 引用 ソフトウェアを使用する場合は,以下の論文の引用をお願いします. + Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
-[link]() (a preprint coming soon..) Bibtex: ``` @article{kiyohara2023scope, author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nataka, Kazuhide and Saito, Yuta}, title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation}, - journal={arXiv preprint arXiv:23xx.xxxxx}, + journal={arXiv preprint arXiv:2311.18206}, year = {2023}, } ```