feature(xjy): Fixed the accumulate_steps and game_segment/weighted_total_loss bugs, refined the prompts, compute_llm_prior, and SFT loss, and added cProfile functionality. #441
base: dev-multitask-balance-clean-rft
Conversation
…_llm_prior, and SFT loss
llm_sft_loss = torch.tensor(0.0, device=self._cfg.device)
if self.llm_policy_cfg.enable_llm and self.llm_policy_cfg.enable_rft:
    with self._profile_block(name="train_llm_rft"):
        llm_rft_loss = self.compute_rft_loss(
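The `_profile_block` helper itself is not shown in this hunk. Below is a minimal sketch of what a cProfile-backed context manager of that shape could look like; the standalone name `profile_block` and the `enabled`/`top_n` parameters are illustrative assumptions, not taken from the PR.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager

@contextmanager
def profile_block(name, enabled=True, top_n=20):
    # Profile the wrapped block with cProfile and print the hottest call sites.
    # `enabled` and `top_n` are illustrative parameters, not taken from the PR.
    if not enabled:
        yield
        return
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        stream = io.StringIO()
        pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top_n)
        print(f"[profile:{name}]\n{stream.getvalue()}")

# Usage mirrors the call in the diff above:
# with profile_block(name="train_llm_rft"):
#     llm_rft_loss = compute_rft_loss(...)
```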
My understanding is that this target_value should correspond one-to-one with the observations, with no misalignment, right? For example, the first value in target_value represents the outcome for the first state in obs.
sequence_log_probs = token_log_probs.sum(dim=-1) / (mask.sum(dim=-1) + 1e-8)

if self.llm_policy_cfg.rft_reward == 'value':
    rewards_tensor = torch.tensor(batch_values, device=self._cfg.device, dtype=torch.float32)
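Put together, a REINFORCE-style objective over these pieces might look like the sketch below. The function name, the explicit masking in the numerator, and the normalization of the values into an advantage are assumptions for illustration, not code from this diff.

```python
import torch

def reinforce_style_rft_loss(token_log_probs, mask, batch_values, device="cpu"):
    # token_log_probs: (B, T) log-probs of the sampled response tokens.
    # mask:            (B, T) with 1 for response tokens and 0 for padding.
    # batch_values:    per-sample scalars used as the reward signal,
    #                  as when rft_reward == 'value' in the hunk above.

    # Length-normalized sequence log-prob, matching the line above
    # (multiplying by mask assumes padded positions may carry non-zero values).
    sequence_log_probs = (token_log_probs * mask).sum(dim=-1) / (mask.sum(dim=-1) + 1e-8)

    advantage = torch.tensor(batch_values, device=device, dtype=torch.float32)
    # Optional normalization into an advantage; an assumption, not shown in the hunk.
    advantage = (advantage - advantage.mean()) / (advantage.std() + 1e-8)

    # REINFORCE: increase the likelihood of sequences with positive advantage.
    return -(advantage * sequence_log_probs).mean()
```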
How about renaming rewards_tensor to advantage_tensor?
ok
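Applied to the hunk above, the agreed rename is a one-line change. The snippet below just restates that line under the new name; the sample values and the `device` variable stand in for the method's local state, and whether the values are further converted into a baseline-subtracted advantage elsewhere is not visible in this diff.

```python
import torch

batch_values = [0.3, -0.1, 0.7]  # hypothetical per-sample search values
device = "cpu"                   # stand-in for self._cfg.device

# Rename suggested in the review: the values act as the advantage signal
# when rft_reward == 'value'.
advantage_tensor = torch.tensor(batch_values, device=device, dtype=torch.float32)
```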
…lect to cprofile.
…ed the REINFORCE-series loss computation.
No description provided.