
Commit

Merge pull request #21 from hakuhodo-technologies/quickstart_ja
Add quickstart_ja and README_ja
aiueola authored Nov 20, 2023
2 parents d80e746 + 06d57f3 commit 5d9864c
Showing 66 changed files with 59,768 additions and 4,846 deletions.
18 changes: 10 additions & 8 deletions README.md
@@ -36,6 +36,8 @@

**Stable versions are available at [PyPI](https://pypi.org/project/scope-rl/)**

**A Japanese version of this README is available [here](README_ja.md)**

## Overview

*SCOPE-RL* is an open-source Python Software for implementing the end-to-end procedure regarding **offline Reinforcement Learning (offline RL)**, from data collection to offline policy learning, off-policy performance evaluation, and policy selection. Our software includes a series of modules to implement synthetic dataset generation, dataset preprocessing, estimators for Off-Policy Evaluation (OPE), and Off-Policy Selection (OPS) methods.
@@ -55,9 +57,9 @@ This software is inspired by [Open Bandit Pipeline](https://github.com/st-tech/z
### Implementations

*SCOPE-RL* mainly consists of the following three modules.
- [**dataset module**](./_gym/dataset): This module provides tools to generate synthetic data from any environment on top of [OpenAI Gym](http://gym.openai.com/) and [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to pre-process the logged data.
- [**policy module**](./_gym/policy): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection.
- [**ope module**](./_gym/ope): This module provides a generic abstract class to implement OPE estimators. It also provides some tools useful for performing OPS.
- [**dataset module**](./scope_rl/dataset): This module provides tools to generate synthetic data from any environment on top of [OpenAI Gym](http://gym.openai.com/) and [Gymnasium](https://gymnasium.farama.org/)-like interface. It also provides tools to pre-process the logged data.
- [**policy module**](./scope_rl/policy/): This module provides a wrapper class for [d3rlpy](https://github.com/takuseno/d3rlpy) to enable flexible data collection.
- [**ope module**](./scope_rl/ope): This module provides a generic abstract class to implement OPE estimators. It also provides some tools useful for performing OPS.
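For orientation, here is a minimal import sketch of how the three modules above fit together; the class names are taken from the project's quickstart notebooks and should be treated as assumptions rather than a guaranteed API.

```Python
# Sketch only: SyntheticDataset, EpsilonGreedyHead, CreateOPEInput, and
# OffPolicyEvaluation are names used in the quickstart notebooks and are
# assumptions here, not a verified public API.
from scope_rl.dataset import SyntheticDataset                  # logged-data generation / preprocessing
from scope_rl.policy import EpsilonGreedyHead                  # d3rlpy wrapper head for flexible data collection
from scope_rl.ope import CreateOPEInput, OffPolicyEvaluation   # OPE estimators and OPS tooling

# Typical end-to-end flow implied by the module list:
#   1. dataset module: roll out a behavior policy in a Gym/Gymnasium-style
#      environment and log trajectories (SyntheticDataset).
#   2. policy module: wrap a d3rlpy algorithm (e.g. with an epsilon-greedy head)
#      so the same policy object serves data collection and offline learning.
#   3. ope module: convert the logged data into OPE inputs (CreateOPEInput)
#      and compare candidate policies with OffPolicyEvaluation.
```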

<details>
<summary><strong>Behavior Policy </strong>(click to expand)</summary>
@@ -423,14 +425,14 @@ For more examples, please refer to [quickstart/rtb/rtb_synthetic_discrete_advanc
If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning, Off-Policy Evaluation, and Policy Selection**<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)

Bibtex:
```
@article{kiyohara2023towards,
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning, Off-Policy Evaluation, and Policy Selection},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
year={2023},
}
```
@@ -439,14 +441,14 @@
If you use our proposed metric "SharpeRatio@k" in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation in Reinforcement Learning**<br>
**Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)

Bibtex:
```
@article{kiyohara2023towards,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation in Reinforcement Learning},
title = {Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
year={2023},
}
```
586 changes: 586 additions & 0 deletions README_ja.md

Large diffs are not rendered by default.

32 changes: 17 additions & 15 deletions basicgym/README.md
@@ -15,11 +15,13 @@

</details>

**A Japanese version of this README is available [here](README_ja.md)**

## Overview

*BasicGym* is an open-source simulation platform for synthetic simulation, which is written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. We design SyntheticGym as a configurative environment so that researchers and practitioners can customize the environmental modules including `StateTransitionFunction` and `RewardFunction`.
*BasicGym* is an open-source simulation platform for synthetic simulation, which is written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface. We design BasicGym as a configurative environment so that researchers and practitioners can customize the environmental modules including `StateTransitionFunction` and `RewardFunction`.

Note that SyntheticGym is publicized under [scope-rl](../) repository, which facilitates the implementation of the offline reinforcement learning procedure.
Note that BasicGym is publicized under [scope-rl](../) repository, which facilitates the implementation of the offline reinforcement learning procedure.

### Basic Setting

@@ -33,21 +35,21 @@ We formulate the following (Partially Observable) Markov Decision Process ((PO)M

### Implementation

SyntheticGym provides a standardized environment in both discrete and continuous action settings.
BasicGym provides a standardized environment in both discrete and continuous action settings.
- `"BasicEnv-continuous-v0"`: Standard continuous environment.
- `"BasicEnv-discrete-v0"`: Standard discrete environment.

SyntheticGym consists of the following environment.
BasicGym consists of the following environment.
- [BasicEnv](./envs/basic.py#L18): The basic configurative environment.

SyntheticGym is configurative about the following modules.
BasicGym is configurative about the following modules.
- [StateTransitionFunction](./envs/simulator/function.py#L14): Class to define the state transition function.
- [RewardFunction](./envs/simulator/function.py#L101): Class to define the reward function.

Note that users can customize the above modules by following the [abstract class](./envs/simulator/base.py).
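As a quick orientation for the two registered IDs listed above, the following sketch simply instantiates both environments and inspects their action spaces; it uses only standard Gym calls, and the exact space shapes depend on the chosen configuration.

```Python
import gym
import basicgym  # importing basicgym registers the "BasicEnv-*" IDs with gym

# The two IDs expose the same standardized environment in continuous vs.
# discrete action settings; only the action space differs.
continuous_env = gym.make("BasicEnv-continuous-v0")
discrete_env = gym.make("BasicEnv-discrete-v0")

print(continuous_env.action_space)  # a continuous (Box-like) space
print(discrete_env.action_space)    # a discrete space
```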

## Installation
SyntheticGym can be installed as a part of [scope-rl](../) using Python's package manager `pip`.
BasicGym can be installed as a part of [scope-rl](../) using Python's package manager `pip`.
```
pip install scope-rl
```
@@ -64,12 +66,12 @@ python setup.py install
We provide an example usage of the standard and customized environment. \
The online/offline RL and Off-Policy Evaluation examples are provided in [SCOPE-RL's README](../README.md).

### Standard SyntheticEnv
### Standard BasicEnv

Our standard SyntheticEnv is available from `gym.make()`, following the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.
Our standard BasicEnv is available from `gym.make()`, following the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.

```Python
# import SyntheticGym and gym
# import BasicGym and gym
import basicgym
import gym
```
@@ -134,9 +136,9 @@ plt.show()
</p>
</figcaption>

Note that while we use [SCOPE-RL](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, SyntheticGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.
Note that while we use [SCOPE-RL](../README.md) and [d3rlpy](https://github.com/takuseno/d3rlpy) here, BasicGym is compatible with any other libraries working on the [OpenAI Gym](https://gym.openai.com) and [Gymnasium](https://gymnasium.farama.org/)-like interface.
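For instance, a rollout that relies on nothing but the Gym interface could look like the following; this is a sketch rather than the exact quickstart code (most of that cell is collapsed in the diff above), and it assumes the newer Gym/Gymnasium reset/step signatures.

```Python
import gym
import basicgym  # registers "BasicEnv-continuous-v0"

env = gym.make("BasicEnv-continuous-v0")

# Newer Gym / Gymnasium API: reset() returns (obs, info) and step() returns a
# 5-tuple; older Gym versions return obs alone and a 4-tuple instead.
obs, info = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # uniform-random behavior policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward:.3f}")
```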

### Customized SyntheticEnv
### Customized BasicEnv

Next, we describe how to customize the environment by instantiating the environment.
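Because that walkthrough is collapsed in this diff, the sketch below is purely illustrative: `QuadraticRewardFunction`, its `sample_reward` method, and the `reward_function` constructor keyword are hypothetical stand-ins, and the real signatures are the ones defined by the abstract classes in `./envs/simulator/base.py` and by `BasicEnv` itself.

```Python
from dataclasses import dataclass

import numpy as np
from basicgym import BasicEnv  # assumed to be importable from the package root


@dataclass
class QuadraticRewardFunction:
    """Hypothetical custom reward; in practice, subclass the abstract
    reward class defined in ./envs/simulator/base.py."""

    scale: float = 1.0

    def sample_reward(self, state: np.ndarray, action: np.ndarray) -> float:
        # hypothetical signature: reward grows with the state and is
        # penalized for large actions
        return float(self.scale * (state.sum() - 0.1 * np.square(action).sum()))


# `reward_function` is a hypothetical keyword; check BasicEnv's constructor
# for the actual argument name.
env = BasicEnv(reward_function=QuadraticRewardFunction(scale=0.5))
```

The same pattern applies to `StateTransitionFunction`: implement the corresponding abstract interface and pass the instance to `BasicEnv` when instantiating it.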

@@ -242,22 +244,22 @@ More examples are available at [quickstart/basic/basic_synthetic_customize_env.i
If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning, Off-Policy Evaluation, and Policy Selection**<br>
**SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation**<br>
[link]() (a preprint coming soon..)

Bibtex:
```
@article{kiyohara2023towards,
@article{kiyohara2023scope,
author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning, Off-Policy Evaluation, and Policy Selection},
title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
journal={arXiv preprint arXiv:23xx.xxxxx},
year = {2023},
}
```

## Contribution

Any contributions to SyntheticGym are more than welcome!
Any contributions to BasicGym are more than welcome!
Please refer to [CONTRIBUTING.md](../CONTRIBUTING.md) for general guidelines on how to contribute the project.

## License