Hi BenchMARL team,
First of all, thank you for this amazing and highly modular library!
I am currently working on a custom MARL environment where agents need to process spatial topology as well as retain temporal memory. To achieve this, I am combining a GNN and a GRU using `SequenceModelConfig`, alongside the MASAC (off-policy) algorithm.
My model configuration looks like this:

```python
model_config = SequenceModelConfig(
    model_configs=[gnn_config, gru_config],
    intermediate_sizes=[256],
)
```
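For context, the two sub-configs are built roughly like this (I am loading the defaults from YAML; the `GnnConfig`/`GruConfig` class names are my understanding of the current `benchmarl.models` package, so please correct me if they have changed):

```python
from benchmarl.models import GnnConfig, GruConfig, SequenceModelConfig

# NOTE: loading defaults from YAML; any tweaks to the GNN topology or
# GRU hidden size are done on top of these. Exact field names should be
# checked against the current model dataclasses.
gnn_config = GnnConfig.get_from_yaml()  # spatial message passing over agents
gru_config = GruConfig.get_from_yaml()  # temporal memory

model_config = SequenceModelConfig(
    model_configs=[gnn_config, gru_config],
    intermediate_sizes=[256],  # GNN output size, fed into the GRU
)
```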
Since MASAC is an off-policy algorithm, I have a few theoretical and implementation questions about how it handles the replay buffer and optimization:
- **Replay Buffer Sampling & BPTT:** When using an RNN (GRU/LSTM) in an off-policy setting like MASAC, does the BenchMARL replay buffer automatically sample intact trajectories/sequences instead of independent random transitions? If so, how is the sequence (BPTT) length defined and controlled?
- **Hidden State Management during Training:** During the off-policy optimizer loop (`_optimizer_loop`), does the framework use the stale hidden states stored in the replay buffer, or does it perform a "burn-in" pass to recompute the hidden states with the most up-to-date network weights?
- **Out-of-the-box Compatibility:** Is the combination `SequenceModelConfig(GNN, GRU)` + MASAC fully supported out of the box? Are there any specific `ExperimentConfig` parameters I need to tweak (e.g., batching rules for off-policy RNNs) to ensure the recurrent gradients backpropagate correctly?
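To make the first question concrete, here is a library-agnostic sketch (pure Python, no BenchMARL APIs) of what I mean by sampling intact fixed-length sequences rather than independent transitions:

```python
import random

def sample_sequences(episodes, seq_len, batch_size, rng=random):
    """Sample `batch_size` contiguous windows of length `seq_len`,
    each taken from a single episode (never crossing episode boundaries)."""
    eligible = [ep for ep in episodes if len(ep) >= seq_len]
    batch = []
    for _ in range(batch_size):
        ep = rng.choice(eligible)
        start = rng.randrange(len(ep) - seq_len + 1)
        batch.append(ep[start:start + seq_len])  # intact trajectory slice
    return batch

# Toy "episodes" of transitions, represented as consecutive integers
episodes = [list(range(10)), list(range(100, 108))]
batch = sample_sequences(episodes, seq_len=4, batch_size=3)
assert all(len(seq) == 4 for seq in batch)  # fixed BPTT window
```

My question is whether BenchMARL does something equivalent internally when a recurrent model is present, and where that window length is configured.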
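And for the second question, this is the "burn-in" behavior I am asking about, sketched with a toy recurrence standing in for a GRU cell (again, illustrative pseudocode in my own words, not BenchMARL internals):

```python
def gru_like_step(h, x, w=0.9):
    # toy scalar recurrence standing in for a GRU cell
    return w * h + (1 - w) * x

def recompute_hidden_with_burn_in(sequence, burn_in, h0=0.0):
    """Replay the first `burn_in` inputs through the *current* recurrence
    (no gradients needed) to refresh the stale stored hidden state, then
    return the state used to start BPTT on the remainder of the window."""
    h = h0
    for x in sequence[:burn_in]:
        h = gru_like_step(h, x)
    return h, sequence[burn_in:]

seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h_start, train_part = recompute_hidden_with_burn_in(seq, burn_in=2)
assert train_part == [3.0, 4.0, 5.0, 6.0]  # BPTT runs only on this suffix
```

The alternative, which I suspect is simpler but more biased, would be to initialize from the hidden state that was stored in the buffer at collection time.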
Any guidance or pointers to relevant parts of the codebase would be highly appreciated. Thanks again for your time and for maintaining this great project!