easyr1 verl val_generations_to_log (#121)

Zeyi-Lin · web-flow · commit d5511d2794e6 · 2025-03-19T02:56:19.000+08:00
* update

* update verl
diff --git a/en/guide_cloud/integration/integration-easyr1.md b/en/guide_cloud/integration/integration-easyr1.md
@@ -62,6 +62,19 @@ In the `EasyR1` directory, execute the following command to train the Qwen2.5-VL
 bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh
 ```
 
+## 4. Record Generated Text During Each Evaluation Round
+
+To log the generated text to SwanLab at each evaluation round (`val`), add `val_generations_to_log=1` to the launch command:
+
+```bash {6}
+python3 -m verl.trainer.main \
+    config=examples/grpo_example.yaml \
+    worker.actor.model.model_path=${MODEL_PATH} \
+    trainer.logger=['console','swanlab'] \
+    trainer.n_gpus_per_node=4 \
+    val_generations_to_log=1
+```
+
 ## Final Remarks
 
 EasyR1 is a new open-source reinforcement learning framework for multimodal large models, created by [hiyouga](https://github.com/hiyouga), the author of [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory). We thank [hiyouga](https://github.com/hiyouga) for his contributions to the global open-source ecosystem, and SwanLab will continue to accompany AI developers.
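The value passed to `val_generations_to_log` is an integer. Assuming it is the number of validation samples logged per evaluation round, and that 0 disables logging, the selection step can be sketched as below. `select_generations_to_log` and the fixed seed are hypothetical illustrations, not EasyR1's actual code.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, float]  # (input prompt, generated output, score)


def select_generations_to_log(
    inputs: Sequence[str],
    outputs: Sequence[str],
    scores: Sequence[float],
    n: int,
    seed: int = 42,
) -> List[Sample]:
    """Hypothetical helper: pick n (input, output, score) triples from one val round.

    Sorting by input before a seeded shuffle makes the choice independent of
    batch order, so the same prompts are picked every round and their
    generations can be compared across training steps.
    """
    if n <= 0:
        return []  # assumption: 0 means "do not log generations"
    samples = sorted(zip(inputs, outputs, scores), key=lambda s: s[0])
    random.Random(seed).shuffle(samples)
    return samples[:n]
```

Sorting first matters because validation batches may arrive in a different order each round; without it, the seeded shuffle alone would not pick the same prompts.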
diff --git a/en/guide_cloud/integration/integration-verl.md b/en/guide_cloud/integration/integration-verl.md
@@ -120,4 +120,18 @@ swanlab watch
 
 For more details, refer to [SwanLab Offline Dashboard Mode](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html).
 
-To set the port number on the server, refer to [Offline Dashboard Port Number](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).
+To set the port number on the server, refer to [Offline Dashboard Port Number](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).
+
+
+## Record Generated Text During Each Evaluation Round
+
+To log the generated text to SwanLab at each evaluation round (`val`), add `val_generations_to_log_to_wandb=1` to the launch command (despite `wandb` in its name, this option also applies to SwanLab):
+
+```bash {5}
+PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
+ data.train_files=$HOME/data/gsm8k/train.parquet \
+ data.val_files=$HOME/data/gsm8k/test.parquet \
+ trainer.logger=['console','swanlab'] \
+ val_generations_to_log_to_wandb=1 \
+ ...
+```
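If you launch training from Python rather than a shell script, the same overrides can be passed as an argument list. The sketch below is hypothetical (`build_verl_argv` is not part of veRL). It copies only the overrides shown in the command above; the options elided as `...` there are deliberately left out, so the command is incomplete until you add them.

```python
import os
from pathlib import Path
from typing import List


def build_verl_argv(data_dir: Path, generations_to_log: int) -> List[str]:
    """Hypothetical helper: argv for verl.trainer.main_ppo with SwanLab logging.

    Only the overrides from the documented command are included; the options
    elided as "..." there still have to be appended before launching.
    """
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        f"data.train_files={data_dir / 'train.parquet'}",
        f"data.val_files={data_dir / 'test.parquet'}",
        # No shell is involved, so the quotes and brackets reach Hydra verbatim.
        "trainer.logger=['console','swanlab']",
        f"val_generations_to_log_to_wandb={generations_to_log}",
    ]


argv = build_verl_argv(Path.home() / "data" / "gsm8k", generations_to_log=1)
# Equivalent of the PYTHONUNBUFFERED=1 prefix in the shell command.
env = {**os.environ, "PYTHONUNBUFFERED": "1"}
# To launch (requires veRL installed): subprocess.run(argv, env=env, check=True)
```

Passing a list to `subprocess.run` without `shell=True` skips shell quote removal and glob expansion, which avoids surprises with the bracketed `trainer.logger` value.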
diff --git a/zh/guide_cloud/integration/integration-easyr1.md b/zh/guide_cloud/integration/integration-easyr1.md
@@ -38,7 +38,7 @@ bash examples/run_qwen2_5_7b_math_swanlab.sh
 
 Let's take a closer look. Because EasyR1 is a clean fork of the original veRL project, it inherits [veRL's integration with SwanLab](/guide_cloud/integration/integration-verl.md). So let's look at the `run_qwen2_5_7b_math_swanlab.sh` file:
 
-```sh
+```sh {10}
 set -x
 
 export VLLM_ATTENTION_BACKEND=XFORMERS
@@ -48,7 +48,7 @@ MODEL_PATH=Qwen/Qwen2.5-7B-Instruct  # replace it with your local file path
 python3 -m verl.trainer.main \
     config=examples/grpo_example.yaml \
     worker.actor.model.model_path=${MODEL_PATH} \
-    trainer.logger=['console','swanlab'] \  # [!code ++]
+    trainer.logger=['console','swanlab'] \
     trainer.n_gpus_per_node=4
 ```
 
@@ -62,6 +62,21 @@ python3 -m verl.trainer.main \
 bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh
 ```
 
+## 4. Record Generated Text During Each Evaluation Round
+
+To log the generated text to SwanLab at each evaluation round (`val`), add `val_generations_to_log=1` to the launch command:
+
+```bash {6}
+python3 -m verl.trainer.main \
+    config=examples/grpo_example.yaml \
+    worker.actor.model.model_path=${MODEL_PATH} \
+    trainer.logger=['console','swanlab'] \
+    trainer.n_gpus_per_node=4 \
+    val_generations_to_log=1
+```
+
+
+
 ## Final Remarks
 
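 EasyR1 is a new open-source reinforcement learning framework for multimodal large models, created by [hiyouga](https://github.com/hiyouga), the author of [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory). We thank [hiyouga](https://github.com/hiyouga) for his contributions to the global open-source ecosystem, and SwanLab will continue to accompany AI developers.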
diff --git a/zh/guide_cloud/integration/integration-verl.md b/zh/guide_cloud/integration/integration-verl.md
@@ -123,3 +123,16 @@ swanlab watch
 For more details, refer to [SwanLab Offline Dashboard Mode](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html).
 
 To set the port number on the server, refer to [Offline Dashboard Port Number](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).
+
+## Record Generated Text During Each Evaluation Round
+
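+To log the generated text to SwanLab at each evaluation round (`val`), add `val_generations_to_log_to_wandb=1` to the launch command (despite `wandb` in its name, this option also applies to SwanLab):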
+
+```bash {5}
+PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
+ data.train_files=$HOME/data/gsm8k/train.parquet \
+ data.val_files=$HOME/data/gsm8k/test.parquet \
+ trainer.logger=['console','swanlab'] \
+ val_generations_to_log_to_wandb=1 \
+ ...
+```