Question About Stage 2 Training #13

Thank you for sharing this great work!
In the paper, the hyper-policy is trained with a two-stage reward, and it is stated that “_the hyper-policy is continually updated._” Based on this description, I understood that Stage 2 training continues from the parameters learned in Stage 1, i.e., as a form of fine-tuning.

However, in the released code, it appears that:

- Stage 1 and Stage 2 training are executed separately
- In the Stage 2 training code, I could not find where the hyper-policy checkpoint from Stage 1 is loaded or resumed
- As a result, Stage 2 seems to start again from the same initial parameters as Stage 1

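Is it possible that I missed the checkpoint load/resume code? If not, is Stage 2 intended to start from the initial parameters rather than from the Stage 1 result?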
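To make the question concrete, here is a minimal, framework-agnostic sketch of the resume step I expected to find at the start of Stage 2. It is not taken from the released code: every name in it (`init_hyper_policy`, `load_or_init`, `train`, `stage1_hyper_policy.pkl`) is hypothetical, the "training" is a placeholder update, and `pickle` stands in for whatever checkpoint format the repository uses (in PyTorch this would typically be `torch.save` and `load_state_dict`).

```python
import os
import pickle
import random
import tempfile


def init_hyper_policy(seed=0):
    # Hypothetical stand-in for the hyper-policy's initial parameters.
    rng = random.Random(seed)
    return {"w": [rng.uniform(-1.0, 1.0) for _ in range(4)]}


def save_checkpoint(params, path):
    # Persist the parameters so that a later stage can resume from them.
    with open(path, "wb") as f:
        pickle.dump(params, f)


def load_or_init(path, seed=0):
    # Return (params, resumed): load from `path` if it exists,
    # otherwise fall back to a fresh initialization.
    if path is not None and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True
    return init_hyper_policy(seed), False


def train(params, reward_scale, steps=3):
    # Placeholder update standing in for one training stage.
    return {"w": [w + 0.1 * reward_scale * steps for w in params["w"]]}


# Stage 1: start from the initial parameters, train, save.
ckpt_path = os.path.join(tempfile.mkdtemp(), "stage1_hyper_policy.pkl")
stage1_params = train(load_or_init(None)[0], reward_scale=1.0)
save_checkpoint(stage1_params, ckpt_path)

# Stage 2 (what I expected): resume from the Stage 1 checkpoint.
stage2_init, resumed = load_or_init(ckpt_path)
stage2_params = train(stage2_init, reward_scale=0.5)
```

In this sketch, Stage 2 starts from `stage1_params` only because `load_or_init` is given the Stage 1 checkpoint path. Without that call, both stages would start from `init_hyper_policy()`, which is the behavior I think I am seeing in the released Stage 2 script.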
