In this version, we separate the computation of the action from the computation of the critic.
- First, the workspace is computed by executing the environment and the policy
  - the policy is represented by two agents: the `prob_agent`, which computes action probabilities, and the `action_agent`, which computes the action
- Second, the `critic_agent` is run on that workspace
This illustrates the modularity of the library, which makes it possible to implement very complex policies by combining elementary agents (see the other examples).
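
The two-phase flow described above can be sketched in plain Python. This is only an illustration of the idea, not the library's API: the workspace is assumed to be a simple dictionary of per-time-step lists, and `env_agent`, `run`, the toy dynamics, and the linear "critic" are all hypothetical stand-ins. Only the names `prob_agent`, `action_agent`, and `critic_agent` come from the text.

```python
import math
import random


def env_agent(ws, t):
    """Hypothetical toy environment: writes the observation for step t."""
    if t == 0:
        obs = [0.0, 1.0]
    else:
        prev_obs = ws["obs"][t - 1]
        prev_action = ws["action"][t - 1]
        obs = [prev_obs[0] + prev_action, prev_obs[1] - 0.1]
    ws["obs"].append(obs)


def prob_agent(ws, t):
    """Reads the observation at step t and writes action probabilities."""
    obs = ws["obs"][t]
    logits = [0.5 * obs[0], 0.5 * obs[1]]  # stand-in for a neural network
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
    total = sum(exps)
    ws["action_probs"].append([e / total for e in exps])


def action_agent(ws, t, rng):
    """Reads the probabilities at step t and samples an action from them."""
    probs = ws["action_probs"][t]
    ws["action"].append(rng.choices(range(len(probs)), weights=probs)[0])


def critic_agent(ws):
    """Runs once over the whole workspace and writes one value per step."""
    ws["critic"] = [0.1 * sum(obs) for obs in ws["obs"]]  # toy value estimate


def run(n_steps=5, seed=0):
    rng = random.Random(seed)
    ws = {"obs": [], "action_probs": [], "action": []}
    # Phase 1: build the workspace by executing the environment and the policy.
    for t in range(n_steps):
        env_agent(ws, t)
        prob_agent(ws, t)
        action_agent(ws, t, rng)
    # Phase 2: compute the critic on the completed workspace.
    critic_agent(ws)
    return ws


workspace = run()
```

Because each agent only reads and writes named entries of the workspace, any of them could be swapped for another implementation without touching the others, which is the kind of composition the paragraph above refers to.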