
Fix preprocessor reference logprob worker wiring#131

Open
taivu1998 wants to merge 1 commit into ServiceNow:main from taivu1998:tdv/fix-reference-logprobs

Conversation


@taivu1998 taivu1998 commented Apr 2, 2026

Summary

  • assign real reference TrainableLLM clients to preprocessor workers at process startup
  • add an early guard for KL-enabled preprocessing when no reference LLM URLs are available
  • remove the dead per-chunk reference-selection path and reuse the existing tokenizer inside workers
  • update launcher validation so final_kl_coef also requires reference-model capacity
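The launcher-side validation in the last bullet can be sketched roughly as follows. This is a simplified illustration, not the actual `pipelinerl/launch.py` code; the config field names (`kl_coef`, `final_kl_coef`, `world.preprocessor_fraction`) come from this PR description, but the function name and dict-based config shape are assumptions.

```python
def validate_config(cfg: dict) -> None:
    """Reject configs that enable KL without reference-model capacity.

    KL regularization needs reference logprobs whether the coefficient
    is set from the start (kl_coef) or only reached via a schedule
    (final_kl_coef), so both must imply preprocessor capacity.
    """
    kl_enabled = bool(cfg.get("kl_coef")) or bool(cfg.get("final_kl_coef"))
    if kl_enabled and cfg.get("world", {}).get("preprocessor_fraction", 0) <= 0:
        raise ValueError(
            "kl_coef/final_kl_coef requires world.preprocessor_fraction > 0 "
            "so reference logprobs can be computed"
        )
```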

Closes #62.

Problem

The preprocessor constructed reference LLM clients from me.llm_urls, but each worker process was still launched with llm=None. As a result, preprocess_dataset(...) always fell back to copying rollout logprobs into ref_logprobs.

That meant KL-enabled runs were no longer using the frozen reference policy that run_ref_llm(...) launches. Instead, the trainer compared the new policy against rollout-time actor logprobs, which changes both the loss semantics and the reported KL metrics.
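The failure mode described above can be illustrated with a minimal sketch. This is not the real `preprocess_dataset` signature, just a hypothetical reduction of the fallback behavior: with `llm=None`, the worker silently reuses the rollout actor's logprobs as the "reference" logprobs.

```python
def preprocess_dataset(samples, llm=None):
    """Attach reference logprobs to each sample (simplified sketch)."""
    for sample in samples:
        if llm is None:
            # Bug path: no reference client was wired into the worker,
            # so rollout-time actor logprobs are copied into ref_logprobs
            # and the KL term compares the policy against itself.
            sample["ref_logprobs"] = list(sample["logprobs"])
        else:
            # Intended path: query the frozen reference policy.
            sample["ref_logprobs"] = llm.get_logprobs(sample["tokens"])
    return samples
```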

What changed

  • add small helpers in pipelinerl/preprocess.py to detect when reference logprobs are required, validate setup, and assign reference clients to workers
  • pass one real reference TrainableLLM into each preprocessor worker when it is spawned instead of hard-coding None
  • set llm.tokenizer = tokenizer inside the worker so the reference client can reuse the already-loaded tokenizer
  • log the discovered reference URLs and each worker's reference source for easier runtime validation
  • remove the unused next_llm_index code from the main preprocessing loop
  • extend pipelinerl/launch.py config validation to treat either kl_coef or final_kl_coef as requiring world.preprocessor_fraction > 0
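The worker-wiring part of the change can be sketched like this. The helper name, the round-robin URL assignment, and the stand-in `TrainableLLM` class are assumptions for illustration; only the behaviors (real clients instead of `None`, an early guard when no reference URLs exist, tokenizer reuse) come from the PR description.

```python
from dataclasses import dataclass


@dataclass
class TrainableLLM:
    """Stand-in for pipelinerl's reference LLM client."""
    url: str
    tokenizer: object = None


def assign_reference_llms(ref_urls, n_workers, tokenizer):
    """Build one reference client per worker instead of passing llm=None."""
    if not ref_urls:
        # Early guard: fail at startup rather than silently copying
        # rollout logprobs into ref_logprobs later.
        raise ValueError("KL is enabled but no reference LLM URLs are available")
    llms = []
    for i in range(n_workers):
        llm = TrainableLLM(url=ref_urls[i % len(ref_urls)])
        llm.tokenizer = tokenizer  # reuse the already-loaded tokenizer
        llms.append(llm)
    return llms
```

Each worker then receives its client at spawn time, which removes the need for the old per-chunk reference-selection path in the main loop.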

Validation

  • python3.11 -m py_compile pipelinerl/preprocess.py pipelinerl/launch.py
  • python3.11 -m compileall pipelinerl
  • git diff --check

Notes

  • only pipelinerl/preprocess.py and pipelinerl/launch.py are included in this PR
  • analysis/plan markdown files in the working tree were intentionally left out of the commit
  • I did not run a full distributed end-to-end training job here because that requires the full runtime stack and live reference/inference services

@taivu1998
Author

Hi @rizar, could you help review this PR? Thank you so much!



Development

Successfully merging this pull request may close these issues.

fix reference log probs calculations
