
Commit

Merge pull request #23 from hakuhodo-technologies/paper
Paper
aiueola authored Dec 1, 2023
2 parents 5d9864c + 406a756 commit b1c9e9f
Showing 22 changed files with 97 additions and 39 deletions.
29 changes: 24 additions & 5 deletions README.md
@@ -9,7 +9,8 @@
[![GitHub last commit](https://img.shields.io/github/last-commit/hakuhodo-technologies/scope-rl)](https://github.com/hakuhodo-technologies/scope-rl/graphs/commit-activity)
[![Documentation Status](https://readthedocs.org/projects/scope-rl/badge/?version=latest)](https://scope-rl.readthedocs.io/en/latest/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/arXiv-23xx.xxxxx-b31b1b.svg)](https://arxiv.org/abs/23xx.xxxxx)
[![arXiv](https://img.shields.io/badge/arXiv-2311.18206-b31b1b.svg)](https://arxiv.org/abs/2311.18206)
[![arXiv](https://img.shields.io/badge/arXiv-2311.18207-b31b1b.svg)](https://arxiv.org/abs/2311.18207)

<details>
<summary><strong>Table of Contents </strong>(click to expand)</summary>
@@ -36,6 +37,8 @@

**Stable versions are available at [PyPI](https://pypi.org/project/scope-rl/)**

**Slides are available [here](https://speakerdeck.com/harukakiyohara_/scope-rl)**

**A Japanese version is available [here](README_ja.md)**

## Overview
@@ -56,6 +59,13 @@ This software is inspired by [Open Bandit Pipeline](https://github.com/st-tech/z

### Implementations

<div align="center"><img src="https://raw.githubusercontent.com/hakuhodo-technologies/scope-rl/main/images/scope_workflow.png" width="100%"/></div>
<figcaption>
<p align="center">
End-to-end workflow of offline RL and OPE with SCOPE-RL
</p>
</figcaption>

*SCOPE-RL* mainly consists of the following three modules.
- [**dataset module**](./scope_rl/dataset): This module provides tools to generate synthetic data from any environment with an [OpenAI Gym](http://gym.openai.com/)- or [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to preprocess the logged data.
- [**policy module**](./scope_rl/policy/): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection.
@@ -130,6 +140,15 @@ This software is inspired by [Open Bandit Pipeline](https://github.com/st-tech/z

Note that in addition to the above OPE and OPS methods, researchers can easily implement and compare their own estimators through a generic abstract class implemented in SCOPE-RL. Moreover, practitioners can apply the above methods to their real-world data to evaluate and choose counterfactual policies for their own practical situations.
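
As a rough, self-contained illustration of what such an estimator computes, the snippet below implements trajectory-wise importance sampling in plain NumPy. This is only a sketch for intuition: the function name and signature are made up for this example, and it does not use SCOPE-RL's actual abstract estimator class or API.

```python
import numpy as np

def tis_policy_value(reward, behavior_pscore, evaluation_pscore, gamma=0.99):
    """Trajectory-wise importance sampling (illustrative sketch, not the SCOPE-RL API).

    Each argument is an array of shape (n_trajectories, horizon): per-step rewards
    and the action-choice probabilities of the behavior / evaluation policies.
    """
    # trajectory-wise importance weight: product of per-step probability ratios
    traj_weight = np.prod(evaluation_pscore / behavior_pscore, axis=1)
    # discounted return of each logged trajectory
    discount = gamma ** np.arange(reward.shape[1])
    traj_return = (reward * discount).sum(axis=1)
    # importance-weighted average over logged trajectories
    return float(np.mean(traj_weight * traj_return))

# toy usage with random logged data
rng = np.random.default_rng(0)
reward = rng.random((100, 10))
behavior_pscore = rng.uniform(0.4, 0.9, size=(100, 10))
evaluation_pscore = rng.uniform(0.4, 0.9, size=(100, 10))
print(tis_policy_value(reward, behavior_pscore, evaluation_pscore))
```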

The distinctive features of the OPE/OPS modules of SCOPE-RL are summarized as follows.

<div align="center"><img src="https://raw.githubusercontent.com/hakuhodo-technologies/scope-rl/main/images/ope_features.png" width="100%"/></div>
<figcaption>
<p align="center">
Four distinctive features of the OPE/OPS implementation of SCOPE-RL
</p>
</figcaption>

As examples of customized experiments imitating practical setups, we also provide [RTBGym](./rtbgym) and [RecGym](./recgym), RL environments for real-time bidding (RTB) and recommender systems.

## Installation
@@ -426,14 +445,14 @@ If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)
[[arXiv](https://arxiv.org/abs/2311.18206)] [[slides](https://speakerdeck.com/harukakiyohara_/scope-rl)]

Bibtex:
```
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year={2023},
}
```
@@ -442,14 +461,14 @@ If you use our proposed metric "SharpeRatio@k" in your work, please cite our pap

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)
[[arXiv](https://arxiv.org/abs/2311.18207)] [[slides](https://speakerdeck.com/harukakiyohara_/towards-risk-return-assessment-of-ope)]

Bibtex:
```
@article{kiyohara2023towards,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18207},
year={2023},
}
```
30 changes: 25 additions & 5 deletions README_ja.md
@@ -9,7 +9,8 @@
[![GitHub last commit](https://img.shields.io/github/last-commit/hakuhodo-technologies/scope-rl)](https://github.com/hakuhodo-technologies/scope-rl/graphs/commit-activity)
[![Documentation Status](https://readthedocs.org/projects/scope-rl/badge/?version=latest)](https://scope-rl.readthedocs.io/en/latest/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/arXiv-23xx.xxxxx-b31b1b.svg)](https://arxiv.org/abs/23xx.xxxxx)
[![arXiv](https://img.shields.io/badge/arXiv-2311.18206-b31b1b.svg)](https://arxiv.org/abs/2311.18206)
[![arXiv](https://img.shields.io/badge/arXiv-2311.18207-b31b1b.svg)](https://arxiv.org/abs/2311.18207)

<details>
<summary><strong>Table of Contents</strong> (click to expand)</summary>
Expand All @@ -36,6 +37,9 @@

**Stable versions are available at [PyPI](https://pypi.org/project/scope-rl/)**

**Slides (in Japanese) are available [here](https://speakerdeck.com/aiueola/scope-rl-ja)**

## Overview
SCOPE-RL is an open-source Python software for implementing the end-to-end procedure from data collection to offline policy learning, off-policy performance evaluation, and policy selection. The software includes a series of modules for synthetic data generation, data preprocessing, off-policy evaluation (OPE) estimators, and off-policy selection (OPS) methods.

@@ -55,6 +59,13 @@ SCOPE-RL は,データ収集からオフ方策学習,オフ方策性能評

### Implementations

<div align="center"><img src="https://raw.githubusercontent.com/hakuhodo-technologies/scope-rl/main/images/scope_workflow.png" width="100%"/></div>
<figcaption>
<p align="center">
End-to-end workflow of offline RL and OPE with SCOPE-RL
</p>
</figcaption>

*SCOPE-RL* mainly consists of the following three modules.

- [**dataset module**](./_gym/dataset): This module provides tools to generate synthetic data from any environment with an [OpenAI Gym](http://gym.openai.com/)- or [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to preprocess the logged data.
@@ -130,6 +141,15 @@ SCOPE-RL は,データ収集からオフ方策学習,オフ方策性能評

From a research standpoint, the advantage of SCOPE-RL is that, by using its abstract classes, researchers can easily implement and compare their own estimators in addition to the off-policy evaluation and off-policy selection methods already implemented. From a practical standpoint, a variety of off-policy estimators can be applied to real-world data to evaluate and select policies suited to one's own situation.

Moreover, going beyond the features of existing packages, SCOPE-RL enables a more practice-oriented implementation of off-policy evaluation, as shown in the figure below.

<div align="center"><img src="https://raw.githubusercontent.com/hakuhodo-technologies/scope-rl/main/images/ope_features_ja.png" width="100%"/></div>
<figcaption>
<p align="center">
Four distinctive features emphasized in the OPE/OPS modules of SCOPE-RL
</p>
</figcaption>

SCOPE-RL also provides sub-packages: [BasicGym](./basicgym), a simple synthetic environment, as well as [RTBGym](./rtbgym) and [RecGym](./recgym), RL environments for real-time bidding (RTB) and recommender systems that simulate practical settings.


@@ -434,14 +454,14 @@ ops.visualize_conditional_value_at_risk_for_validation(

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)
[[arXiv](https://arxiv.org/abs/2311.18206)] [[Japanese slides](https://speakerdeck.com/aiueola/scope-rl-ja)]

Bibtex:
```
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year={2023},
}
```
@@ -450,14 +470,14 @@ Bibtex:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)
[[arXiv](https://arxiv.org/abs/2311.18207)] [[Japanese slides](https://speakerdeck.com/aiueola/towards-risk-return-assessment-of-ope-ja)]

Bibtex:
```
@article{kiyohara2023towards,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18207},
year={2023},
}
```
3 changes: 1 addition & 2 deletions basicgym/README.md
@@ -245,14 +245,13 @@ If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)

Bibtex:
```
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year = {2023},
}
```
3 changes: 1 addition & 2 deletions basicgym/README_ja.md
@@ -246,14 +246,13 @@ class CustomizedRewardFunction(BaseRewardFunction):

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)

Bibtex:
```
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year = {2023},
}
```
Binary file added docs/_static/images/ope_features.png
Binary file modified docs/_static/images/scope_workflow.png
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -81,7 +81,7 @@
"icon_links": [
{
"name": "Speaker Deck",
"url": "https://speakerdeck.com/aiueola/ofrl-designing-an-offline-reinforcement-learning-and-policy-evaluation-platform-from-practical-perspectives",
"url": "https://speakerdeck.com/harukakiyohara_/scope-rl",
"icon": "fa-brands fa-speaker-deck",
"type": "fontawesome",
},
12 changes: 12 additions & 0 deletions docs/documentation/distinctive_features.rst
@@ -106,6 +106,18 @@ advanced ones :cite:`kallus2020double, uehara2020minimax, liu2018breaking, yang2
Moreover, we provide the meta-class to handle OPE/OPS experiments and the abstract base implementation of OPE estimators.
This allows researchers to quickly test their own algorithms with this platform and also helps practitioners empirically learn the property of various OPE methods.
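
For intuition, the general OPS protocol that this meta-class streamlines, ranking candidate policies by their estimated values and inspecting the top-k selection, can be sketched in a few lines of standalone NumPy. The variable names and toy numbers below are hypothetical and do not reflect the SCOPE-RL API.

.. code-block:: python

    import numpy as np

    # hypothetical true and estimated values of 10 candidate policies
    true_value = np.array([8.2, 7.9, 7.5, 7.1, 6.8, 6.0, 5.5, 5.1, 4.2, 3.0])
    estimated_value = np.array([7.0, 8.5, 6.9, 7.3, 5.9, 6.2, 5.0, 5.4, 4.5, 3.1])

    k = 3
    # OPS step: rank candidate policies by their estimated values
    topk = np.argsort(estimated_value)[::-1][:k]

    # two conventional views of how good the selection is
    best_at_k = true_value[topk].max()                     # best true value among the selected policies
    regret_at_1 = true_value.max() - true_value[topk[0]]   # value lost by deploying the top-ranked policy
    print(f"best@{k} = {best_at_k:.2f}, regret@1 = {regret_at_1:.2f}")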

.. card::
:width: 75%
:margin: auto
:img-top: ../_static/images/ope_features.png
:text-align: center

Four key features of OPE/OPS modules of SCOPE-RL

.. raw:: html

<div class="white-space-20px"></div>

.. _feature_variety_ope:

Variety of OPE estimators and evaluation protocol of OPE
4 changes: 4 additions & 0 deletions docs/documentation/examples/index.rst
@@ -6,6 +6,10 @@ Example Codes
SCOPE-RL
----------

.. seealso::

Please also refer to :doc:`SCOPE-RL package reference </documentation/scope_rl_api>` for APIs.

.. _basic_ope_example:

Basic and High-Confidence Off-Policy Evaluation (OPE):
5 changes: 2 additions & 3 deletions docs/documentation/index.rst
@@ -215,14 +215,13 @@ If you use our pipeline in your work, please cite our paper below.

| Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
| **SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**
| (a preprint is coming soon..)
.. code-block::
@article{kiyohara2023scope,
title={SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year={2023}
}
@@ -239,7 +238,7 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your
@article{kiyohara2023towards,
title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18207},
year={2023}
}
5 changes: 2 additions & 3 deletions docs/documentation/installation.rst
@@ -40,7 +40,7 @@ If you use our pipeline in your work, please cite our paper below.
@article{kiyohara2023scope,
title={SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18206},
year={2023}
}
@@ -50,14 +50,13 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your

| Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
| **Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**
| (a preprint is coming soon..)
.. code-block::
@article{kiyohara2023towards,
title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18207},
year={2023}
}
4 changes: 2 additions & 2 deletions docs/documentation/news.rst
@@ -6,8 +6,8 @@ Follow us on `Google Group ([email protected]) <https://groups.google.co
2023
~~~~~~~~~~

**2023.11.xx** Preprints of our papers: (1) [SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation]() ([slides](), [日本語スライド]()),
and (2) [Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation]() ([slides](), [日本語スライド]()) are now available at arXiv!
**2023.12.01** Preprints of our twin papers: (1) `SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation <https://arxiv.org/abs/2311.18206>`_ (`slides <https://speakerdeck.com/harukakiyohara_/scope-rl>`_, `日本語スライド <https://speakerdeck.com/aiueola/scope-rl-ja>`_),
and (2) `Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation <https://arxiv.org/abs/2311.18207>`_ (`slides <https://speakerdeck.com/harukakiyohara_/towards-risk-return-assessment-of-ope>`_, `日本語スライド <https://speakerdeck.com/aiueola/towards-risk-return-assessment-of-ope-ja>`_) are now available at arXiv!

**2023.7.30** Released :class:`v0.2.1` of SCOPE-RL! This release upgrades the version of d3rlpy from `1.1.1` to `2.0.4`.

14 changes: 13 additions & 1 deletion docs/documentation/quickstart.rst
@@ -2,7 +2,19 @@ Quickstart
==========

We show an example workflow of synthetic dataset collection, offline Reinforcement Learning (RL), to Off-Policy Evaluation (OPE).
The workflow mainly consists of the following three steps:
The workflow mainly consists of the following three steps (and a validation step):

.. card::
:width: 75%
:margin: auto
:img-top: ../_static/images/scope_workflow.png
:text-align: center

Workflow of offline RL and OPE streamlined by SCOPE-RL

.. raw:: html

<div class="white-space-20px"></div>

* **Synthetic Dataset Generation and Data Preprocessing**:
The initial step is to collect logged data using a behavior policy. In a synthetic setup, we first train a behavior policy through online interaction and then generate dataset(s) with the behavior policy. In a practical situation, we should use the preprocessed logged data obtained from real-world applications.
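
  As a standalone sketch of this first step, the snippet below logs transitions from a toy Gymnasium environment under a uniform-random behavior policy. It is only an illustration of the data-collection loop; SCOPE-RL's dataset module wraps this procedure behind its own classes, whose names and signatures are not shown here.

.. code-block:: python

    import gymnasium as gym
    import numpy as np

    # toy stand-ins for the environment and behavior policy (illustrative only)
    env = gym.make("CartPole-v1")
    rng = np.random.default_rng(0)

    logged_data = []  # (state, action, reward, done) transitions
    for _ in range(10):  # collect 10 logged trajectories
        state, _ = env.reset(seed=int(rng.integers(0, 1_000_000)))
        done = False
        while not done:
            action = env.action_space.sample()  # behavior policy: uniform random
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            logged_data.append((state, action, reward, done))
            state = next_state

    print(f"collected {len(logged_data)} transitions")
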
8 changes: 3 additions & 5 deletions docs/documentation/sharpe_ratio.rst
@@ -10,9 +10,8 @@ Note that for the basic problem formulation of Off-Policy Evaluation and Selecti
.. seealso::

The **SharpeRatio@k** metric is the main contribution of our paper **"Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation."**
Our paper is currently under submission, and the arXiv version of the paper will come soon..
Our paper is currently under submission, and the arXiv version of the paper is available `here <https://arxiv.org/abs/2311.18207>`_.

.. A preprint is available at `arXiv <>`_.
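
As a rough orientation before the formal treatment below, SharpeRatio@k can be sketched as follows. This is a standalone NumPy illustration based on our reading of the metric, the return of the top-k policies selected by an estimator (relative to the behavior policy) divided by the standard deviation of their values; the precise definition in the paper and library takes precedence over this sketch.

.. code-block:: python

    import numpy as np

    def sharpe_ratio_at_k(true_value, estimated_value, behavior_policy_value, k=3):
        """Risk-return metric over the top-k policies chosen by an estimator (sketch only)."""
        topk = np.argsort(estimated_value)[::-1][:k]  # policies the estimator would deploy
        best_at_k = true_value[topk].max()            # return: best true value among the top-k
        std_at_k = true_value[topk].std()             # risk: dispersion of true values among the top-k
        return (best_at_k - behavior_policy_value) / (std_at_k + 1e-9)

    # toy example with hypothetical candidate policies
    true_value = np.array([8.2, 7.9, 7.5, 7.1, 6.8, 6.0, 5.5])
    estimated_value = np.array([7.0, 8.5, 6.9, 7.3, 5.9, 6.2, 5.0])
    print(sharpe_ratio_at_k(true_value, estimated_value, behavior_policy_value=6.5))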

Background
~~~~~~~~~~
@@ -281,7 +280,7 @@ The above figure reports the benchmark results of OPE estimators with SharpeRati
1. Future research in OPE should include the assessment of estimators based on SharpeRatio@k:

The findings from the previous section suggest that SharpeRatio@k provides more actionable insights compared to traditional accuracy metrics.
The benchmark results using SharpeRatio@k, as shown in Figure~\ref{fig:sharpe_ratio_benchmark}, often significantly differ from those obtained with conventional accuracy metrics.
The benchmark results using SharpeRatio@k (particularly as shown in the figures of Result 2), often significantly differ from those obtained with conventional accuracy metrics.
This highlights the importance of integrating SharpeRatio@k into future research to more effectively evaluate the efficiency of OPE estimators.

2. A new estimator that explicitly optimizes the risk-return tradeoff:
@@ -312,14 +311,13 @@ If you use the proposed metric (SharpeRatio@k) or refer to our findings in your

| Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
| **Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**
| (a preprint is coming soon..)
.. code-block::
@article{kiyohara2023towards,
title={Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
author={Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
journal={arXiv preprint arXiv:23xx.xxxxx},
journal={arXiv preprint arXiv:2311.18207},
year={2023}
}