Releases: biological-alignment-benchmarks/biological-alignment-gridagents-benchmarks
v0.9.4.1
v0.9.4.0
- Save "pip list" to log folder.
- When archiving ai_safety_gridworlds and zoo_to_gym_multiagent_adapter code, check different locations since the location depends on setup.
- Add more file types to the archive.
- Save the benchmarking code version, gridworlds code version, and GPU model in the output JSONL.
- Adjustments to package versions.
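
As a rough illustration of the logging changes in this release, the following minimal sketch stores a command's output (for example `pip list`) in a log folder and merges run metadata into each JSONL record. The function names and the metadata keys mentioned below are hypothetical and are not taken from the benchmark code.

```python
import json
import subprocess
import sys
from pathlib import Path


def save_command_output(log_dir, filename, command):
    """Run a command and store its stdout in log_dir/filename."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # check=False: a failing command still leaves a (possibly empty) log file.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    out_path = log_dir / filename
    out_path.write_text(result.stdout, encoding="utf-8")
    return out_path


def append_jsonl_record(jsonl_path, record, run_metadata):
    """Append one result record, merged with run metadata, as a single JSON line."""
    line = json.dumps({**record, **run_metadata})
    with open(jsonl_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

A call such as `save_command_output("logs", "pip_list.txt", [sys.executable, "-m", "pip", "list"])` would capture the installed packages, and `run_metadata` could carry assumed keys such as `benchmark_code_version`, `gridworlds_code_version`, and `gpu_model`.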
v0.9.3.8
Various smaller adjustments and improvements
Adding timestamps to console output
Adding Zenodo association
Adding support for imitation learning / expert override with PPO multi-agent weight sharing setup.
Adding configuration files for imitation learning. Also adding configs for 2-layout trials.
Adding support for the use_expln policy argument for PPO and A2C to mitigate NaNs in SB3 tensors.
Adding support for the SB3 AdamW optimizer, needed to mitigate occasional PPO NaNs during imitation learning.
Adding support for the target_kl PPO config argument to mitigate NaNs in SB3 tensors.
Adding support for the early_detect_nans config argument to detect NaNs in SB3 tensors early.
Adding support for soft_stop_training_on_nan_errors to handle NaNs in SB3 tensors smoothly.
Adding retry_training_on_nans_for_n_times and skip_test_on_training_stop_on_nan_errors config settings.
Adding new fields to the jsonl log: num_actual_train_episodes, training_run_was_terminated_early_due_to_nans, num_training_retries_used, test_checkpoint_filenames.
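
One way the NaN-handling settings and the new log fields listed above could fit together is sketched below. This is an assumption about their semantics, not the benchmark's actual implementation: `train_once` is a hypothetical callable that returns the number of completed episodes and whether NaNs were detected, and `test_checkpoint_filenames` is left out for brevity.

```python
def train_with_nan_retries(train_once, retry_training_on_nans_for_n_times,
                           soft_stop_training_on_nan_errors=True):
    """Call train_once() until it finishes without NaNs or the retries run out.

    train_once is a hypothetical callable returning
    (num_episodes_completed, hit_nan).
    """
    retries_used = 0
    while True:
        episodes, hit_nan = train_once()
        if not hit_nan:
            break
        if retries_used >= retry_training_on_nans_for_n_times:
            if not soft_stop_training_on_nan_errors:
                # Assumed hard-failure behaviour when soft stop is disabled.
                raise FloatingPointError("NaNs detected during training")
            break
        retries_used += 1
    # Field names follow the JSONL fields named in the release notes.
    return {
        "num_actual_train_episodes": episodes,
        "training_run_was_terminated_early_due_to_nans": hit_nan,
        "num_training_retries_used": retries_used,
    }


def should_run_test(summary, skip_test_on_training_stop_on_nan_errors):
    """Decide whether to run the test phase after training."""
    stopped = summary["training_run_was_terminated_early_due_to_nans"]
    return not (stopped and skip_test_on_training_stop_on_nan_errors)
```

In this sketch a soft stop keeps the partial run and records it in the summary, so later analysis can filter on `training_run_was_terminated_early_due_to_nans`.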
v0.9.3.6
Various smaller adjustments and improvements
Adding Zenodo association
Adding support for imitation learning / expert override with PPO multi-agent weight sharing setup.
Adding configuration files for imitation learning. Also adding configs for 2-layout trials.
Adding support for the use_expln policy argument for PPO and A2C to mitigate NaNs in SB3 tensors.
Adding support for the SB3 AdamW optimizer, needed to mitigate occasional PPO NaNs during imitation learning.
Adding support for the target_kl PPO config argument to mitigate NaNs in SB3 tensors.
Adding support for the early_detect_nans config argument to detect NaNs in SB3 tensors early.
Adding support for soft_stop_training_on_nan_errors to handle NaNs in SB3 tensors smoothly.
Adding retry_training_on_nans_for_n_times and skip_test_on_training_stop_on_nan_errors config settings.
Adding new fields to the jsonl log: num_actual_train_episodes, training_run_was_terminated_early_due_to_nans, num_training_retries_used, test_checkpoint_filenames.
v0.9.3.5
Adding Zenodo association
Adding support for imitation learning / expert override with PPO multi-agent weight sharing setup.
Adding configuration files for imitation learning. Also adding configs for 2-layout trials.
Adding support for the use_expln policy argument for PPO and A2C to mitigate NaNs in SB3 tensors.
Adding support for the SB3 AdamW optimizer, needed to mitigate occasional PPO NaNs during imitation learning.
Adding support for the target_kl PPO config argument to mitigate NaNs in SB3 tensors.
Adding support for the early_detect_nans config argument to detect NaNs in SB3 tensors early.
Adding support for soft_stop_training_on_nan_errors to handle NaNs in SB3 tensors smoothly.
Adding retry_training_on_nans_for_n_times and skip_test_on_training_stop_on_nan_errors config settings.
Adding new fields to the jsonl log: num_actual_train_episodes, training_run_was_terminated_early_due_to_nans, num_training_retries_used, test_checkpoint_filenames.
v0.9.3.3
Adding support for imitation learning / expert override with PPO multi-agent weight sharing setup.
Adding support for the use_expln policy argument for PPO and A2C to mitigate NaNs in SB3 tensors.
Adding support for the SB3 AdamW optimizer, needed to mitigate occasional PPO NaNs during imitation learning.
Adding support for the target_kl PPO config argument to mitigate NaNs in SB3 tensors.
Adding support for the early_detect_nans config argument to detect NaNs in SB3 tensors early.
Adding support for soft_stop_training_on_nan_errors to handle NaNs in SB3 tensors smoothly.
Adding configuration files for imitation learning. Also adding configs for 2-layout trials.
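
For orientation, the SB3-related options named in this release map onto Stable-Baselines3 constructor arguments roughly as sketched below: `use_expln` and `optimizer_class` are policy keyword arguments, and `target_kl` is a PPO argument (A2C does not accept it). The helper name `build_ppo_kwargs` and the default values are illustrative assumptions, not the benchmark's configuration.

```python
def build_ppo_kwargs(use_expln=True, target_kl=0.01, optimizer_class=None):
    """Assemble keyword arguments for stable_baselines3.PPO.

    The defaults here are illustrative, not the values used by the benchmark.
    """
    policy_kwargs = {"use_expln": use_expln}
    if optimizer_class is not None:
        # For example torch.optim.AdamW; omitted here to keep the sketch import-free.
        policy_kwargs["optimizer_class"] = optimizer_class
    return {"policy_kwargs": policy_kwargs, "target_kl": target_kl}


# Usage, assuming stable-baselines3 and PyTorch are installed and `env` is a
# Gym-compatible environment:
#   import torch
#   from stable_baselines3 import PPO
#   model = PPO("MlpPolicy", env, **build_ppo_kwargs(optimizer_class=torch.optim.AdamW))
```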
v0.9.3.2
- Adding support for imitation learning / expert override with PPO multi-agent weight sharing setup.
- Adding support for the use_expln policy argument for PPO and A2C to mitigate NaNs in SB3 tensors.
- Adding support for the SB3 AdamW optimizer, needed to mitigate occasional PPO NaNs during imitation learning.
- Adding configuration files for imitation learning. Also adding configs for 2-layout trials.
v0.9.2
Updating publication title and authorship