A Japanese version of this document is available here.
BasicGym is an open-source synthetic simulation platform written in Python. The simulator is particularly intended for reinforcement learning algorithms and follows the OpenAI Gym and Gymnasium-like interface. We design BasicGym as a configurative environment so that researchers and practitioners can customize the environmental modules, including `StateTransitionFunction` and `RewardFunction`.

Note that BasicGym is published under the scope-rl repository, which facilitates the implementation of the offline reinforcement learning procedure.
We formulate the following (Partially Observable) Markov Decision Process ((PO)MDP):
- `state`: state observation, which may be noisy in POMDPs.
- `action`: action chosen by the RL agent.
- `reward`: observed immediate reward.
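Concretely (a standard (PO)MDP sketch; the notation below is illustrative and not tied to the BasicGym API), the environment is described by a tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{T}, P_r \rangle$, where at each timestep $t$ the agent observes a (possibly noisy) state $s_t$, chooses an action $a_t$, and receives a reward $r_t$:

$$
s_{t+1} \sim \mathcal{T}(\cdot \mid s_t, a_t), \qquad r_t \sim P_r(\cdot \mid s_t, a_t),
$$

and the goal of the RL agent is to maximize the expected cumulative reward $\mathbb{E}\big[\sum_{t} \gamma^{t} r_t\big]$ with discount factor $\gamma \in (0, 1]$.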
BasicGym provides a standardized environment in both discrete and continuous action settings.
- `"BasicEnv-continuous-v0"`: Standard continuous environment.
- `"BasicEnv-discrete-v0"`: Standard discrete environment.
BasicGym consists of the following environment:
- BasicEnv: The basic configurative environment.
BasicGym is configurative about the following modules:
- StateTransitionFunction: Class to define the state transition function.
- RewardFunction: Class to define the reward function.
Note that users can customize the above modules by following the abstract classes.
BasicGym can be installed as a part of scope-rl using Python's package manager pip.

```bash
pip install scope-rl
```
You can also install it from source.

```bash
git clone https://github.com/hakuhodo-technologies/scope-rl
cd scope-rl
python setup.py install
```
We provide example usage of the standard and customized environments.
The online/offline RL and Off-Policy Evaluation examples are provided in SCOPE-RL's README.
Our standard BasicEnv is available from `gym.make()`, following the OpenAI Gym and Gymnasium-like interface.

```python
# import BasicGym and gym
import basicgym
import gym

# (1) standard environment
env = gym.make('BasicEnv-continuous-v0')
```
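The discrete-action counterpart is created in the same way (a short sketch; the printed spaces are only illustrative):

```python
# (1') standard discrete environment
env_discrete = gym.make('BasicEnv-discrete-v0')

# inspect the spaces via the standard Gym API
print(env_discrete.action_space)       # e.g., Discrete(n_actions)
print(env_discrete.observation_space)  # e.g., Box of shape (state_dim,)
```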
The basic interaction is performed with only a few lines of code, as follows (here, `agent` is the policy defined in the next snippet).

```python
obs, info = env.reset()
done = False

while not done:
    action = agent.sample_action_online(obs)
    obs, reward, done, truncated, info = env.step(action)
```
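If you simply want to smoke-test the environment without defining a policy, sampling random actions through the standard Gym/Gymnasium `action_space.sample()` API also works (a minimal sketch, not BasicGym-specific code):

```python
# drive the environment with uniformly sampled actions (standard Gym API)
obs, info = env.reset()
done = False

while not done:
    action = env.action_space.sample()
    obs, reward, done, truncated, info = env.step(action)
```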
Let's visualize the case with the uniform random policy.
```python
# import from other libraries
from scope_rl.policy import OnlineHead
from d3rlpy.algos import RandomPolicy as ContinuousRandomPolicy
from d3rlpy.preprocessing import MinMaxActionScaler
import matplotlib.pyplot as plt

# define a random agent
agent = OnlineHead(
    ContinuousRandomPolicy(
        action_scaler=MinMaxActionScaler(
            minimum=0.1,  # minimum value that policy can take
            maximum=10,   # maximum value that policy can take
        )
    ),
    name="random",
)
agent.build_with_env(env)
```
```python
# (2) basic interaction
obs, info = env.reset()
done = False
# logs
reward_list = []

while not done:
    action = agent.sample_action_online(obs)
    obs, reward, done, truncated, info = env.step(action)
    # logs
    reward_list.append(reward)

# visualize the result
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(reward_list[:-1], label='reward', color='tab:orange')
ax1.set_xlabel('timestep')
ax1.set_ylabel('reward')
ax1.legend(loc='upper left')
plt.show()
```
Figure: Reward observed during a single episode.
Note that while we use SCOPE-RL and d3rlpy here, BasicGym is compatible with any other library that works with the OpenAI Gym and Gymnasium-like interface.
Next, we describe how to customize the environment by instantiating it with your own configurations.
List of environmental configurations: (click to expand)
- `step_per_episode`: Number of timesteps in an episode.
- `state_dim`: Dimension of the state.
- `action_type`: Type of the action space ("continuous" or "discrete").
- `n_actions`: Number of actions in the discrete action case.
- `action_dim`: Dimension of the action (context).
- `action_context`: Feature vectors that characterize each action. Applicable only when `action_type` is "discrete".
- `reward_type`: Reward type ("continuous" or "binary").
- `reward_std`: Noise level of the reward. Applicable only when `reward_type` is "continuous".
- `obs_std`: Noise level of the state observation.
- `StateTransitionFunction`: State transition function.
- `RewardFunction`: Expected immediate reward function.
- `random_state`: Random state.
```python
from basicgym import BasicEnv

env = BasicEnv(
    state_dim=10,
    action_type="continuous",  # "discrete"
    action_dim=5,
    reward_type="continuous",  # "binary"
    reward_std=0.3,
    obs_std=0.3,
    step_per_episode=10,
    random_state=12345,
)
```
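For reference, a discrete-action configuration might look like the following sketch; the `action_context` value here (one one-hot feature vector per action) is only an illustrative assumption, not a required format:

```python
import numpy as np
from basicgym import BasicEnv

# a hypothetical discrete-action configuration (see the configuration list above)
env_discrete = BasicEnv(
    state_dim=10,
    action_type="discrete",
    n_actions=5,
    action_context=np.eye(5),  # illustrative: one feature vector per action
    reward_type="binary",
    obs_std=0.3,
    step_per_episode=10,
    random_state=12345,
)
```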
Specifically, users can define their own `StateTransitionFunction` and `RewardFunction` as follows.
```python
# import basicgym modules
from basicgym import BaseStateTransitionFunction
# import other necessary stuff
from dataclasses import dataclass
from typing import Optional
import numpy as np
from sklearn.utils import check_random_state

@dataclass
class CustomizedStateTransitionFunction(BaseStateTransitionFunction):
    state_dim: int
    action_dim: int
    random_state: Optional[int] = None

    def __post_init__(self):
        self.random_ = check_random_state(self.random_state)
        self.state_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, self.state_dim))
        self.action_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, self.action_dim))

    def step(
        self,
        state: np.ndarray,
        action: np.ndarray,
    ) -> np.ndarray:
        state = self.state_coef @ state / self.state_dim + self.action_coef @ action / self.action_dim
        state = state / np.linalg.norm(state, ord=2)
        return state
```
```python
# import basicgym modules
from basicgym import BaseRewardFunction
# import other necessary stuff
from dataclasses import dataclass
from typing import Optional
import numpy as np
from sklearn.utils import check_random_state

@dataclass
class CustomizedRewardFunction(BaseRewardFunction):
    state_dim: int
    action_dim: int
    reward_type: str = "continuous"  # "binary"
    reward_std: float = 0.0
    random_state: Optional[int] = None

    def __post_init__(self):
        self.random_ = check_random_state(self.random_state)
        self.state_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, ))
        self.action_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.action_dim, ))

    def mean_reward_function(
        self,
        state: np.ndarray,
        action: np.ndarray,
    ) -> float:
        reward = self.state_coef.T @ state / self.state_dim + self.action_coef.T @ action / self.action_dim
        return reward
```
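These customized classes can then be passed to `BasicEnv` through the `StateTransitionFunction` and `RewardFunction` configurations listed above (a minimal sketch; the other arguments simply mirror the earlier instantiation example):

```python
from basicgym import BasicEnv

# plug the customized modules into the environment
env = BasicEnv(
    state_dim=10,
    action_type="continuous",
    action_dim=5,
    StateTransitionFunction=CustomizedStateTransitionFunction,
    RewardFunction=CustomizedRewardFunction,
    step_per_episode=10,
    random_state=12345,
)
```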
More examples are available at quickstart/basic/basic_synthetic_customize_env.ipynb.
If you use our software in your work, please cite our paper:
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
Bibtex:
```
@article{kiyohara2023scope,
  author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
  title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
  journal = {arXiv preprint arXiv:2311.18206},
  year = {2023},
}
```
Any contributions to BasicGym are more than welcome! Please refer to CONTRIBUTING.md for general guidelines on how to contribute to the project.
This project is licensed under the Apache 2.0 license - see the LICENSE file for details.
- Haruka Kiyohara (Main Contributor; Cornell University)
- Ren Kishimoto (Tokyo Institute of Technology)
- Kosuke Kawakami (HAKUHODO Technologies Inc.)
- Ken Kobayashi (Tokyo Institute of Technology)
- Kazuhide Nakata (Tokyo Institute of Technology)
- Yuta Saito (Cornell University)
For any questions about the paper and software, feel free to contact: [email protected]
Papers (click to expand)
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
- Takuma Seno and Michita Imai. d3rlpy: An Offline Deep Reinforcement Learning Library. arXiv preprint arXiv:2111.03788, 2021.
Projects (click to expand)
This project is inspired by the following package.
- Open Bandit Pipeline -- a pipeline implementation of OPE in contextual bandits: [github] [documentation] [paper]