This project involves training a Proximal Policy Optimization (PPO) model to solve the LunarLander-v2 environment from Gymnasium (the maintained successor to OpenAI Gym). The goal is to optimize the model's hyperparameters to achieve stable and efficient landings.
Here is an overview of the files in this repository:
- `Lunar_lander_py.ipynb`: Jupyter notebook containing the main code for setting up the environment, training the PPO model, and evaluating its performance.
- `best_hyperparameters.json`: JSON file containing the best hyperparameters found using Optuna.
- `system_info.txt`: Text file with system information recorded during the experiment, including the Python version, library versions, and hardware details.
- `policy.pth`: The final trained model's policy parameters.
- `policy.optimizer.pth`: The optimizer state for the trained model.
- `pytorch_variables.pth`: Additional PyTorch variables used during training.
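The three `.pth` files are the components that Stable-Baselines3 bundles into its model archive. If you only want to inspect the raw policy weights, a minimal sketch (assuming `policy.pth` is the policy state dict saved by Stable-Baselines3) looks like this:

```python
import torch

# Load the saved policy parameters as a plain dict of tensors.
# map_location="cpu" keeps this working on machines without a GPU.
state_dict = torch.load("policy.pth", map_location="cpu")

# Print each parameter's name and shape for a quick view of the network layout.
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```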
To run this project, you will need:
- Python 3.10 or higher
- `gymnasium` library
- `stable-baselines3` library
- `optuna` library
- PyTorch (`torch`) library
To set up the project:

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

  Ensure `requirements.txt` contains all necessary libraries:

  ```
  gymnasium
  stable-baselines3
  optuna
  torch
  ```
To train the model using the provided notebook:
- Open the Jupyter notebook `Lunar_lander_py.ipynb`.
- Follow the steps to train the PPO model. The notebook is structured to guide you through:
  - Setting up the Lunar Lander environment.
  - Using Optuna to find the best hyperparameters.
  - Training the final model using the best-found hyperparameters.
  - Saving and evaluating the trained model.
If you want to use the pre-trained model:
- Load the model using the provided weights (`policy.pth`, `policy.optimizer.pth`, and `pytorch_variables.pth`):

  ```python
  from stable_baselines3 import PPO

  # PPO.load reads the saved model archive (ppo_lunarlander_v2_best.zip),
  # which bundles the policy, optimizer, and PyTorch variable files listed above.
  model = PPO.load("ppo_lunarlander_v2_best")
  ```

- Evaluate the model:

  ```python
  import gymnasium as gym
  from stable_baselines3.common.evaluation import evaluate_policy
  from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

  # Build a vectorized environment and normalize observations and rewards.
  env = DummyVecEnv([lambda: gym.make("LunarLander-v2")])
  env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)

  mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
  print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
  ```
The hyperparameter optimization with Optuna follows these steps (a minimal sketch appears after this list):

- Define the objective function: create a function that trains the model with suggested hyperparameters and returns the mean reward.
- Optimize with Optuna: run multiple trials, each with different hyperparameter values, to find the optimal set.
- Save the best hyperparameters: write the best hyperparameters found into a JSON file.
- Train the final model: use the best hyperparameters to train the final model and save the trained model's parameters.
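For orientation, here is a minimal sketch of how such an Optuna study could be wired up. The search space, per-trial timestep budget, trial count, and file name below are illustrative assumptions rather than the exact values used in the notebook.

```python
import json

import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # Suggested hyperparameters; these ranges are illustrative assumptions.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "n_steps": trial.suggest_categorical("n_steps", [512, 1024, 2048]),
        "batch_size": trial.suggest_categorical("batch_size", [64, 128, 256]),
        "n_epochs": trial.suggest_int("n_epochs", 3, 10),
        "gamma": trial.suggest_float("gamma", 0.98, 0.9999),
    }

    env = gym.make("LunarLander-v2")
    model = PPO("MlpPolicy", env, verbose=0, **params)
    model.learn(total_timesteps=100_000)  # short budget per trial to keep the search cheap

    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
    env.close()
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

# Persist the best hyperparameters so the final training run can reuse them.
with open("best_hyperparameters.json", "w") as f:
    json.dump(study.best_params, f, indent=2)
```

Each trial trains only a short-budget model, so its reward serves as a ranking signal; the final model is then retrained for longer with the best parameters.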
Here’s a simplified workflow outline:
- Set up the environment:

  ```python
  import gymnasium as gym

  env = gym.make("LunarLander-v2")
  ```

- Train the model with chosen hyperparameters:

  ```python
  from stable_baselines3 import PPO

  model = PPO(
      "MlpPolicy",
      env,
      learning_rate=0.00025,
      n_steps=1024,
      batch_size=128,
      n_epochs=4,
      gamma=0.999,
      verbose=1,
  )
  model.learn(total_timesteps=1000000)
  model.save("ppo_lunarlander_v2_best")
  ```

- Evaluate the model:

  ```python
  from stable_baselines3.common.evaluation import evaluate_policy

  mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
  print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
  ```
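To train with the tuned values rather than the hardcoded ones above, the saved JSON can be loaded and unpacked into the PPO constructor. This is a sketch under the assumption that the keys in `best_hyperparameters.json` match PPO constructor arguments such as `learning_rate` and `n_steps`.

```python
import json

import gymnasium as gym
from stable_baselines3 import PPO

# Load the hyperparameters found by the Optuna search.
with open("best_hyperparameters.json") as f:
    best_params = json.load(f)

env = gym.make("LunarLander-v2")

# Unpack the tuned values directly into the PPO constructor.
model = PPO("MlpPolicy", env, verbose=1, **best_params)
model.learn(total_timesteps=1000000)
model.save("ppo_lunarlander_v2_best")
```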
This project demonstrates the process of optimizing a PPO model for the Lunar Lander environment. By following the steps outlined, you can reproduce the results, tweak the hyperparameters further, and improve the model's performance.
For any questions or contributions, feel free to open an issue or submit a pull request. Happy coding!