BasicGym: A basic, configurative reinforcement learning environment


BasicGym is an open-source platform for synthetic simulation, written in Python. The simulator is intended for reinforcement learning algorithms and follows an OpenAI Gym and Gymnasium-like interface. We design BasicGym as a configurative environment so that researchers and practitioners can customize the environmental modules, including StateTransitionFunction and RewardFunction.

Note that BasicGym is released as part of the scope-rl repository, which facilitates the implementation of offline reinforcement learning procedures.

Basic Setting

We formulate the following (Partially Observable) Markov Decision Process ((PO)MDP):

  • state:
    • State observation, which may be noisy in POMDPs.
  • action:
    • Action chosen by the RL agent.
  • reward:
    • Observed immediate rewards.
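The difference between the MDP and POMDP cases can be sketched as follows. This is a minimal illustration, not BasicGym's implementation: the helper name `observe` and the choice of additive Gaussian noise are assumptions, and `obs_std` only mirrors the configuration of the same name listed later in this README.

```python
import numpy as np


def observe(state: np.ndarray, obs_std: float, rng: np.random.Generator) -> np.ndarray:
    """Return the state plus zero-mean Gaussian noise (hypothetical helper).

    With obs_std=0.0 the observation equals the state (MDP case);
    with obs_std>0.0 the agent only sees a noisy version (POMDP case).
    """
    noise = rng.normal(loc=0.0, scale=obs_std, size=state.shape)
    return state + noise


rng = np.random.default_rng(0)
state = np.array([0.6, -0.8])

obs_mdp = observe(state, obs_std=0.0, rng=rng)    # fully observed state
obs_pomdp = observe(state, obs_std=0.3, rng=rng)  # noisy observation
```

The reward and action are unaffected by this noise; only what the agent observes changes.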


BasicGym provides a standardized environment in both discrete and continuous action settings.

  • "BasicEnv-continuous-v0": Standard continuous environment.
  • "BasicEnv-discrete-v0": Standard discrete environment.

BasicGym consists of the following environment.

  • BasicEnv: The basic configurative environment.

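BasicGym is configurative about the following modules.

  • StateTransitionFunction: State transition function.
  • RewardFunction: Expected immediate reward function.

Note that users can customize the above modules by following the corresponding abstract classes.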


BasicGym can be installed as a part of scope-rl using Python's package manager pip.

pip install scope-rl

You can also install it from the source.

git clone
cd scope-rl
python setup.py install
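After installing, a quick way to confirm that the packages are importable is sketched below. The helper `is_installed` is hypothetical; the module names `scope_rl` and `basicgym` are taken from the import statements used in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages imported in the examples of this README.
for name in ("scope_rl", "basicgym"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```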


We provide example usages of the standard and customized environments.
The online/offline RL and Off-Policy Evaluation examples are provided in SCOPE-RL's README.

Standard BasicEnv

Our standard BasicEnv is available from gym.make(), following the OpenAI Gym and Gymnasium-like interface.

# import BasicGym and gym
import basicgym
import gym

# (1) standard environment
env = gym.make('BasicEnv-continuous-v0')

The basic interaction takes only a few lines of code.

obs, info = env.reset()
done = False
while not done:
    action = agent.sample_action_online(obs)
    obs, reward, done, truncated, info = env.step(action)
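The loop above can be wrapped in a small helper that also stops when the environment reports truncation. This is a sketch under stated assumptions: `StubEnv` is a hypothetical stand-in that only mimics the five-value `step()` interface, and `rollout` is an illustrative helper, neither being part of BasicGym.

```python
from typing import Any, Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in for a Gymnasium-style environment."""

    def __init__(self, step_per_episode: int = 3):
        self.step_per_episode = step_per_episode
        self.t = 0

    def reset(self) -> Tuple[float, Dict]:
        self.t = 0
        return 0.0, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict]:
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)  # toy reward: penalize large actions
        done = self.t >= self.step_per_episode
        truncated = False
        return obs, reward, done, truncated, {}


def rollout(env: Any, policy: Callable[[Any], Any]) -> List[float]:
    """Run one episode and return the list of observed rewards."""
    obs, info = env.reset()
    done = truncated = False
    rewards = []
    while not (done or truncated):
        action = policy(obs)
        obs, reward, done, truncated, info = env.step(action)
        rewards.append(reward)
    return rewards


rewards = rollout(StubEnv(step_per_episode=3), policy=lambda obs: 1.0)
print(rewards)  # [-1.0, -1.0, -1.0]
```

With a real environment, `policy` could be any callable that maps an observation to an action, for example a thin wrapper around an agent's action-sampling method.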

Let's visualize the case with the uniform random policy.

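# import from other libraries
from scope_rl.policy import OnlineHead
from d3rlpy.algos import RandomPolicy as ContinuousRandomPolicy
from d3rlpy.preprocessing import MinMaxActionScaler

# define a random agent
agent = OnlineHead(
    ContinuousRandomPolicy(
        action_scaler=MinMaxActionScaler(
            minimum=0.1,  # minimum value that policy can take
            maximum=10,  # maximum value that policy can take
        )
    ),
    name="random",
)
agent.build_with_env(env)
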
# (2) basic interaction
obs, info = env.reset()
done = False
# logs
reward_list = []

while not done:
    action = agent.sample_action_online(obs)
    obs, reward, done, truncated, info = env.step(action)
    # logs
    reward_list.append(reward)

# visualize the result
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(reward_list[:-1], label='reward', color='tab:orange')
ax1.set_xlabel('timestep')
ax1.set_ylabel('reward')
ax1.legend(loc='upper left')
plt.show()

Reward Observed during a Single Episode

Note that while we use SCOPE-RL and d3rlpy here, BasicGym is compatible with any other library that works with the OpenAI Gym and Gymnasium-like interface.

Customized BasicEnv

Next, we describe how to customize the environment by instantiating BasicEnv directly.

List of environmental configurations:
  • step_per_episode: Number of timesteps in an episode.
  • state_dim: Dimension of the state.
  • action_type: Type of the action space, either "continuous" or "discrete".
  • n_actions: Number of actions in the discrete action case.
  • action_dim: Dimension of the action (context).
  • action_context: Feature vectors that characterize each action. Applicable only when action_type is "discrete".
  • reward_type: Type of the reward, either "continuous" or "binary".
  • reward_std: Noise level of the reward. Applicable only when reward_type is "continuous".
  • obs_std: Noise level of the state observation.
  • StateTransitionFunction: State transition function.
  • RewardFunction: Expected immediate reward function.
  • random_state: Random state.

from basicgym import BasicEnv
env = BasicEnv(
    action_type="continuous",  # "discrete"
    reward_type="continuous",  # "binary"
)
Specifically, users can define their own StateTransitionFunction and RewardFunction as follows.

Example of Custom State Transition Function

# import basicgym modules
from basicgym import BaseStateTransitionFunction
# import other necessary stuffs
from dataclasses import dataclass
from typing import Optional
import numpy as np
from sklearn.utils import check_random_state

@dataclass
class CustomizedStateTransitionFunction(BaseStateTransitionFunction):
    state_dim: int
    action_dim: int
    random_state: Optional[int] = None

    def __post_init__(self):
        # draw fixed random coefficients for the linear transition
        self.random_ = check_random_state(self.random_state)
        self.state_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, self.state_dim))
        self.action_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, self.action_dim))

    def step(
        self,
        state: np.ndarray,
        action: np.ndarray,
    ) -> np.ndarray:
        # linear transition, then normalize the next state to unit L2 norm
        state = self.state_coef @ state / self.state_dim + self.action_coef @ action / self.action_dim
        state = state / np.linalg.norm(state, ord=2)
        return state
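The transition logic can be tried out without installing BasicGym. The sketch below is a standalone variant under stated assumptions: the class name `LinearTransition` is hypothetical, it does not inherit from `BaseStateTransitionFunction`, and it uses `np.random.RandomState` in place of `check_random_state`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class LinearTransition:
    """Standalone sketch of a linear, norm-preserving state transition."""

    state_dim: int
    action_dim: int
    random_state: Optional[int] = None

    def __post_init__(self):
        # Fixed random coefficients, reproducible through random_state.
        rng = np.random.RandomState(self.random_state)
        self.state_coef = rng.normal(size=(self.state_dim, self.state_dim))
        self.action_coef = rng.normal(size=(self.state_dim, self.action_dim))

    def step(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        next_state = (
            self.state_coef @ state / self.state_dim
            + self.action_coef @ action / self.action_dim
        )
        # Normalize so that the next state has unit L2 norm.
        return next_state / np.linalg.norm(next_state, ord=2)


transition = LinearTransition(state_dim=4, action_dim=2, random_state=12345)
state = np.ones(4) / 2.0
action = np.array([0.5, -0.5])
next_state = transition.step(state, action)
```

Because of the normalization step, every state produced by `step` lies on the unit sphere, which keeps the state bounded over long episodes.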

Example of Custom Reward Function

# import basicgym modules
from basicgym import BaseRewardFunction
# import other necessary stuffs
from dataclasses import dataclass
from typing import Optional
import numpy as np
from sklearn.utils import check_random_state

@dataclass
class CustomizedRewardFunction(BaseRewardFunction):
    state_dim: int
    action_dim: int
    reward_type: str = "continuous"  # "binary"
    reward_std: float = 0.0
    random_state: Optional[int] = None

    def __post_init__(self):
        # draw fixed random coefficients for the linear reward model
        self.random_ = check_random_state(self.random_state)
        self.state_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.state_dim, ))
        self.action_coef = self.random_.normal(loc=0.0, scale=1.0, size=(self.action_dim, ))

    def mean_reward_function(
        self,
        state: np.ndarray,
        action: np.ndarray,
    ) -> float:
        # expected immediate reward as a linear function of state and action
        reward = self.state_coef.T @ state / self.state_dim + self.action_coef.T @ action / self.action_dim
        return reward
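The class above only defines the expected reward. One plausible way to turn it into an observed reward for each `reward_type` is sketched below. The helper `sample_reward`, the Gaussian noise, and the logistic link for the binary case are all assumptions for illustration; BasicGym's actual sampling may differ.

```python
import numpy as np


def sample_reward(
    mean_reward: float,
    reward_type: str,
    reward_std: float,
    rng: np.random.Generator,
) -> float:
    """Sample an observed reward around a mean reward (hypothetical helper)."""
    if reward_type == "continuous":
        # Assumption: additive zero-mean Gaussian noise with scale reward_std.
        return float(rng.normal(loc=mean_reward, scale=reward_std))
    if reward_type == "binary":
        # Assumption: squash the mean into (0, 1) and draw a Bernoulli outcome.
        prob = 1.0 / (1.0 + np.exp(-mean_reward))
        return float(rng.binomial(n=1, p=prob))
    raise ValueError(f"unknown reward_type: {reward_type}")


rng = np.random.default_rng(0)
noiseless = sample_reward(0.2, "continuous", reward_std=0.0, rng=rng)
binary = sample_reward(0.2, "binary", reward_std=0.0, rng=rng)
```

This also shows why `reward_std` applies only to the continuous case: in the binary case the randomness comes from the Bernoulli draw itself.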

More examples are available at quickstart/basic/basic_synthetic_customize_env.ipynb.


If you use our software in your work, please cite our paper:

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation


@article{kiyohara2023scope,
  author = {Kiyohara, Haruka and Kishimoto, Ren and Kawakami, Kosuke and Kobayashi, Ken and Nakata, Kazuhide and Saito, Yuta},
  title = {SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation},
  journal = {arXiv preprint arXiv:2311.18206},
  year = {2023},
}


Any contributions to BasicGym are more than welcome! Please refer to the contribution guidelines for how to contribute to the project.


This project is licensed under the Apache 2.0 License; see the LICENSE file for details.

Project Team

  • Haruka Kiyohara (Main Contributor; Cornell University)
  • Ren Kishimoto (Tokyo Institute of Technology)
  • Kosuke Kawakami (HAKUHODO Technologies Inc.)
  • Ken Kobayashi (Tokyo Institute of Technology)
  • Kazuhide Nakata (Tokyo Institute of Technology)
  • Yuta Saito (Cornell University)


For any questions about the paper and software, feel free to contact: [email protected]


Papers
  1. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.

  2. Takuma Seno and Michita Imai. d3rlpy: An Offline Deep Reinforcement Learning Library. arXiv preprint arXiv:2111.03788, 2021.

Projects

This project is inspired by the following package.