Fix preprocessor reference logprob worker wiring by taivu1998 · Pull Request #131 · ServiceNow/PipelineRL

taivu1998 · 2026-04-02T19:03:22Z

Summary

assign real reference TrainableLLM clients to preprocessor workers at process startup
add an early guard for KL-enabled preprocessing when no reference LLM URLs are available
remove the dead per-chunk reference-selection path and reuse the existing tokenizer inside workers
update launcher validation so final_kl_coef also requires reference-model capacity

Closes #62.

Problem

The preprocessor constructed reference LLM clients from me.llm_urls, but each worker process was still launched with llm=None. As a result, preprocess_dataset(...) always fell back to copying rollout logprobs into ref_logprobs.

That meant KL-enabled runs were no longer using the frozen reference policy that run_ref_llm(...) launches. Instead, the trainer compared the new policy against rollout-time actor logprobs, which changes both the loss semantics and the reported KL metrics.

What changed

add small helpers in pipelinerl/preprocess.py to detect when reference logprobs are required, validate setup, and assign reference clients to workers
pass one real reference TrainableLLM into each preprocessor worker when it is spawned instead of hard-coding None
set llm.tokenizer = tokenizer inside the worker so the reference client can reuse the already-loaded tokenizer
log the discovered reference URLs and each worker's reference source for easier runtime validation
remove the unused next_llm_index code from the main preprocessing loop
extend pipelinerl/launch.py config validation to treat either kl_coef or final_kl_coef as requiring world.preprocessor_fraction > 0

Validation

python3.11 -m py_compile pipelinerl/preprocess.py pipelinerl/launch.py
python3.11 -m compileall pipelinerl
git diff --check

Notes

only pipelinerl/preprocess.py and pipelinerl/launch.py are included in this PR
analysis/plan markdown files in the working tree were intentionally left out of the commit
I did not run a full distributed end-to-end training job here because that requires the full runtime stack and live reference/inference services

taivu1998 · 2026-04-04T06:57:02Z

Hi @rizar, could you help review this PR? Thank you so much!

Fix reference logprob worker wiring

0cfef33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix preprocessor reference logprob worker wiring#131

Fix preprocessor reference logprob worker wiring#131
taivu1998 wants to merge 1 commit intoServiceNow:mainfrom
taivu1998:tdv/fix-reference-logprobs

taivu1998 commented Apr 2, 2026 •

edited

Loading

Uh oh!

taivu1998 commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

taivu1998 commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

What changed

Validation

Notes

Uh oh!

taivu1998 commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

taivu1998 commented Apr 2, 2026 •

edited

Loading