mariaegarciab/Lunar_Lander

This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.

Project Overview

This project trains a Proximal Policy Optimization (PPO) model to solve the Lunar Lander environment from Gymnasium (the maintained successor to OpenAI Gym). The goal is to optimize the model's hyperparameters to achieve stable and efficient landings.


Repository Structure

Here is an overview of the files in this repository:

  1. Lunar_lander_py.ipynb: Jupyter notebook containing the main code for setting up the environment, training the PPO model, and evaluating its performance.
  2. best_hyperparameters.json: JSON file containing the best hyperparameters found using Optuna.
  3. system_info.txt: Text file with system information used during the experiment, including Python version, library versions, and hardware details.
  4. policy.pth: The final trained model's policy parameters.
  5. policy.optimizer.pth: The optimizer state for the trained model.
  6. pytorch_variables.pth: Additional PyTorch variables used during training.

Getting Started

Prerequisites

To run this project, you will need:

  • Python 3.10 or higher
  • gymnasium library (with the Box2D extra for the Lunar Lander environment)
  • stable-baselines3 library
  • optuna library
  • torch (PyTorch) library

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd <repository_directory>
  2. Install the required packages:

    pip install -r requirements.txt

    Ensure requirements.txt contains all necessary libraries (the Box2D extra is needed for the Lunar Lander environment):

    gymnasium[box2d]
    stable-baselines3
    optuna
    torch
    

Running the Project

Training the Model

To train the model using the provided notebook:

  1. Open the Jupyter notebook Lunar_lander_py.ipynb.

  2. Follow the steps to train the PPO model. The notebook is structured to guide you through:

    • Setting up the Lunar Lander environment.
    • Using Optuna to find the best hyperparameters.
    • Training the final model using the best-found hyperparameters.
    • Saving and evaluating the trained model.

Using the Trained Model

If you want to use the pre-trained model:

  1. Load the model. PPO.load expects the archive saved with model.save; policy.pth, policy.optimizer.pth, and pytorch_variables.pth are the components that stable-baselines3 bundles inside such an archive:

    from stable_baselines3 import PPO
    
    # Load the saved stable-baselines3 model archive
    model = PPO.load("ppo_lunarlander_v2_best")
  2. Evaluate the model:

    import gymnasium as gym
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
    
    env = DummyVecEnv([lambda: gym.make("LunarLander-v2")])
    # Report raw (unnormalized) rewards during evaluation; if the normalization
    # statistics from training were saved, load them with VecNormalize.load instead.
    env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.0)
    
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

Detailed Steps for Hyperparameter Optimization

  1. Define the Objective Function:

    • Create a function to train the model with suggested hyperparameters and return the mean reward.
  2. Optimize with Optuna:

    • Use Optuna to run multiple trials, each with different hyperparameter values, to find the optimal set (a sketch of steps 1-3 appears after this list).
  3. Save Best Hyperparameters:

    • Save the best hyperparameters found into a JSON file.
  4. Train Final Model:

    • Use the best hyperparameters to train the final model and save the trained model's parameters (a sketch appears after the example workflow below).

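Here is a minimal sketch of steps 1-3. The search space below (learning rate, gamma, batch size), the per-trial training budget, and the number of trials are illustrative assumptions; the actual search that produced best_hyperparameters.json may have tuned different parameters or ranges.

    import json
    
    import gymnasium as gym
    import optuna
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    
    def objective(trial):
        # Step 1: suggest a candidate set of hyperparameters for this trial
        # (illustrative search space, not necessarily the one used here).
        learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
        gamma = trial.suggest_float("gamma", 0.99, 0.9999)
        batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])
    
        env = gym.make("LunarLander-v2")
        model = PPO("MlpPolicy", env, learning_rate=learning_rate,
                    gamma=gamma, batch_size=batch_size, verbose=0)
        # Keep the per-trial budget small so many trials stay affordable.
        model.learn(total_timesteps=100_000)
    
        # Return the mean reward so Optuna can maximize it.
        mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
        env.close()
        return mean_reward
    
    # Step 2: run multiple trials and keep the best hyperparameter set.
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    
    # Step 3: save the best hyperparameters to a JSON file.
    with open("best_hyperparameters.json", "w") as f:
        json.dump(study.best_params, f, indent=2)
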
Example Workflow

Here’s a simplified workflow outline:

  1. Set up environment:

    import gymnasium as gym
    env = gym.make("LunarLander-v2")
  2. Train model with chosen hyperparameters:

    from stable_baselines3 import PPO
    
    model = PPO(
        "MlpPolicy",
        env,
        learning_rate=0.00025,
        n_steps=1024,
        batch_size=128,
        n_epochs=4,
        gamma=0.999,
        verbose=1
    )
    model.learn(total_timesteps=1000000)
    model.save("ppo_lunarlander_v2_best")
  3. Evaluate the model:

    from stable_baselines3.common.evaluation import evaluate_policy
    
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

Conclusion

This project demonstrates the process of optimizing a PPO model for the Lunar Lander environment. By following the steps outlined, you can reproduce the results, tweak the hyperparameters further, and improve the model's performance.

For any questions or contributions, feel free to open an issue or submit a pull request. Happy coding!
