v2.5.1
English Version
New Features:
- Support for reward modeling (RM) for LLMs and MLLMs, and for PPO for LLMs.
New Models:
- molmo series
- mplug-owl3 1b/2b
- llama3.1-nemotron-70b-instruct
- deepseek-janus
What's Changed
- support reward modeling and ppo by @hjh0119 in #2093
- fix rescale_image by @tastelikefeet in #2223
- fix deploy timeout by @Jintao-Huang in #2230
- Fix qwen2 vl batch size by @Jintao-Huang in #2239
- Fix ovis1.6 infer by @Jintao-Huang in #2242
- fix publish by @Jintao-Huang in #2245
- fix qwen2vl video args by @Jintao-Huang in #2251
- Update FAQ by @slin000111 in #2252
- Support molmo series vlm by @mi804 in #2260
- fix sft system by @Jintao-Huang in #2262
- support mplug3 1b/2b by @Jintao-Huang in #2271
- Fix deploy openai by @Jintao-Huang in #2278
- fix vllm ignore suffix by @Jintao-Huang in #2287
- fix lora_target_modules in PPO by @hjh0119 in #2274
- fix quant blocks by @Jintao-Huang in #2292
- Support Llama3.1-nemotron-70b-inst-hf by @DaozeZhang in #2299
- fix ppo citest by @hjh0119 in #2302
- support deepseek-janus by @Jintao-Huang in #2300
- update molmo by @Jintao-Huang in #2305
Full Changelog: v2.5.0...v2.5.1