
ms-swift3 Suggestion Box #2217

Open
28 tasks done
Jintao-Huang opened this issue Oct 10, 2024 · 31 comments

@Jintao-Huang
Collaborator

Jintao-Huang commented Oct 10, 2024

  • De-emphasize the concept of model_type, supporting automatic detection of model_type from just <model_id_or_path> (config.json).
  • Have the template module and dataset module embrace the messages dataset format.
    • Remove the concept of generation-template. Use the use_generate_template parameter to control fetching the template needed for the base model, in order to support CPT for all multimodal models.
    • Make the preprocessor module smarter. Introduce AutoPreprocessor.
  • Support customization of important training functionality through a plugin design, e.g. loss_type, loss_scale, trainer, optimizer, callback, metric.
  • Improve code readability with a hierarchical design, allowing users with different needs to use and extend ms-swift from code, the command line, and the web UI.
  • Refactor the documentation and examples.
  • Provide a unified inference and deployment interface, using a class-based design to support vllm/lmdeploy/pt/client.
    • pt supports batching
    • pt supports multi-GPU/DeepSpeed
    • Optimize the multi-LoRA inference experience.
  • Optimize the encode/post_encode training mechanism for multimodal models.
  • Improve training robustness during large-scale pre-training.
  • Streamline the integration process for continued fine-tuning, inference, quantization, and deployment of models fully fine-tuned with other training frameworks.
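To illustrate the first roadmap item, here is a minimal sketch of what inferring model_type from a checkpoint's config.json could look like. The mapping table and function name are hypothetical; ms-swift's real registry is far larger and maintained inside the library.

```python
import json
from pathlib import Path

# Hypothetical mapping from Hugging Face `architectures` entries to
# model_type names; this is an illustrative subset, not ms-swift's table.
ARCH_TO_MODEL_TYPE = {
    "LlamaForCausalLM": "llama",
    "Qwen2ForCausalLM": "qwen2",
    "InternLM2ForCausalLM": "internlm2",
}

def detect_model_type(model_dir: str) -> str:
    """Infer model_type from a checkpoint directory's config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    for arch in config.get("architectures", []):
        if arch in ARCH_TO_MODEL_TYPE:
            return ARCH_TO_MODEL_TYPE[arch]
    raise ValueError(f"cannot infer model_type from {config.get('architectures')}")
```

This is why merged checkpoints need a config.json next to the weights: without it, no architecture list is available to detect from.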
@bonre

bonre commented Oct 10, 2024

Thank you very much for your team's hard work!
Regarding the third point, are there any plans to add an observation feature similar to channel loss, i.e. tracking the loss trend separately for each downstream task's dataset? I see that version 2.5 already supports PT for MLLMs, and I think this feature is quite important for post pre-training of MLLMs. Hope you'll consider it :>
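The channel-loss idea described above can be sketched in a few lines: tag each sample with the dataset ("channel") it came from and average the per-sample losses per channel. This is an illustrative sketch, not ms-swift's implementation.

```python
from collections import defaultdict

def channel_loss(per_sample_losses, channels):
    """Average per-sample losses grouped by the dataset ('channel') each
    sample came from, so every downstream task's loss trend can be
    tracked separately during mixed-dataset training."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, channel in zip(per_sample_losses, channels):
        sums[channel] += loss
        counts[channel] += 1
    return {ch: sums[ch] / counts[ch] for ch in sums}
```

In a trainer this would run once per logging step, with the resulting per-channel averages written to the metrics backend alongside the overall loss.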

@EdisonLeeeee

Hello, thank you very much for your open-source work! Are there any plans to support RAG in the future?

@Jintao-Huang
Collaborator Author

Hello, thank you very much for your open-source work! Are there any plans to support RAG in the future?

Yes, but it probably won't make it into 3.0; it will likely be added around 3.1/3.2.

@Betty-J

Betty-J commented Oct 10, 2024

Hello, could you add interfaces for custom evaluation metrics?

@Jintao-Huang
Collaborator Author

Hello, could you add interfaces for custom evaluation metrics?

Yes, this is an important feature.

@liujiachang

Will 3.0 include a complete demo adapted for multi-card NPUs?

@firefighter-eric

Hello, will the training pipeline support TP, PP, and so on?

@Jintao-Huang
Collaborator Author

Will 3.0 include a complete demo adapted for multi-card NPUs?

That depends on whether we can borrow the hardware 😊

@Jintao-Huang Jintao-Huang pinned this issue Oct 14, 2024
@Aunali321
Contributor

Allow changing dataset column names from HuggingFace/ModelScope to a swift-supported format. Currently you have to download the dataset from Hugging Face, change the column names, and re-upload it to use it with swift.
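As a stopgap, the rename can be done locally instead of re-uploading. The target column names below (query/response) are an assumption about swift's expected schema; adjust them to whatever dataset format you register.

```python
import json

# Assumed swift-side schema; the actual expected column names depend on
# the dataset format you register with swift.
COLUMN_MAP = {"instruction": "query", "output": "response"}

def rename_columns(rows, column_map):
    """Rename the columns of a list-of-dicts dataset, replacing the
    download / edit / re-upload round trip with a local rewrite."""
    return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]

rows = [{"instruction": "What is 2 + 2?", "output": "4"}]
renamed = rename_columns(rows, COLUMN_MAP)

# Write a local JSONL file that swift's dataset option can then point at.
with open("train_swift.jsonl", "w", encoding="utf-8") as f:
    for row in renamed:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```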

@Aunali321
Contributor

Also, swift's dataset preparation is very strict despite using --check_dataset_strategy none. It logs cryptic errors that do not explain what went wrong.

For example, it will not accept a dataset that has a user message as the last message.
Another example: it does not accept a dataset with repeating roles, such as Assistant -> Assistant -> User.
It also complained about KeyError: 'conversations' for a dataset that did not have a conversations column at all.

In a large dataset it is impossible to check and fix every row. There should be an option to continue despite these issues.
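Until such an option exists, one workaround is to drop the offending rows before training. A minimal sketch covering the two constraints described above (it assumes a messages-style row format and is not part of swift):

```python
def is_valid_conversation(messages):
    """Check the constraints the errors above describe: the conversation
    must be non-empty, end with an assistant turn, and never repeat the
    same role twice in a row."""
    if not messages or messages[-1]["role"] != "assistant":
        return False
    return all(prev["role"] != cur["role"]
               for prev, cur in zip(messages, messages[1:]))

def drop_invalid_rows(rows):
    """Keep only the rows swift would accept instead of aborting."""
    return [r for r in rows if is_valid_conversation(r["messages"])]
```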

@Jintao-Huang
Collaborator Author

Also, swift's dataset preparation is very strict despite using --check_dataset_strategy none. It logs cryptic errors that do not explain what went wrong.

For example, it will not accept a dataset that has a user message as the last message. Another example: it does not accept a dataset with repeating roles, such as Assistant -> Assistant -> User. It also complained about KeyError: 'conversations' for a dataset that did not have a conversations column at all.

In a large dataset it is impossible to check and fix every row. There should be an option to continue despite these issues.

Great suggestion, thank you!

@Jintao-Huang
Collaborator Author

Hello, will the training pipeline support TP, PP, and so on?

Megatron-backed optimizations will come after the ms-swift 3.0 refactor, in about 1-2 months.

@TengboWang

Thank you very much for your team's hard work!
Regarding the third point, are there any plans to add an observation feature similar to channel loss, i.e. tracking the loss trend separately for each downstream task's dataset?

@Jintao-Huang
Collaborator Author

Thank you very much for your team's hard work! Regarding the third point, are there any plans to add an observation feature similar to channel loss, i.e. tracking the loss trend separately for each downstream task's dataset?

OK, we will add this; it is a very common request.
Thanks to you both.

@satheeshKOLA532

Please include code for end-to-end fine-tuning / pre-training of audio language models in your existing pipeline. If possible, please also integrate the moshi audio language model.
E.g.: Llama 3.1 Omni.

@Jintao-Huang
Collaborator Author

channel loss: related issue: #2220

@liujiachang

Could you build an official NPU version of the swift Docker image?

@verigle

verigle commented Nov 4, 2024

Can multimodal models support distributing GPU memory evenly across multiple GPUs?

@Jintao-Huang
Collaborator Author

Can multimodal models support distributing GPU memory evenly across multiple GPUs?

With DeepSpeed ZeRO-2/ZeRO-3, the distribution is even.

@Charimanhua

After fine-tuning GOT-OCR2.0, using the saved fine-tuned model reports a model_type error. How can this be resolved?

@Jintao-Huang
Collaborator Author

After fine-tuning GOT-OCR2.0, using the saved fine-tuned model reports a model_type error. How can this be resolved?

You need to merge the LoRA first; only then will there be a config.json file.

@Charimanhua

After fine-tuning GOT-OCR2.0, using the saved fine-tuned model reports a model_type error. How can this be resolved?

You need to merge the LoRA first; only then will there be a config.json file.

Solved, thanks!
cd into the fine-tuned model directory and run:
swift merge-lora --ckpt_dir xxx

@Betty-J

Betty-J commented Nov 11, 2024

Hello, when fine-tuning a multimodal LLM with multi-turn dialogues, the .jsonl file saved after running infer only contains the result of the final turn in response, while history holds the label history. Could infer later support saving the full results of all turns?

@Ash-one

Ash-one commented Nov 29, 2024

Looking forward to support for training and fine-tuning audio LLMs (such as CosyVoice) soon.

@shiningliang

Hello, will the training pipeline support TP, PP, and so on?

Megatron-backed optimizations will come after the ms-swift 3.0 refactor, in about 1-2 months.

Is there any hope of support this year? 😂

@Jintao-Huang
Collaborator Author

#2030

@Nevermore2099

The default version installed via pip is 2.6.1, pinning the version also fails to find 3.0, and the source master branch is 2.6.1 as well. How can I install ms-swift 3.0?

@Fanxhion

How can I download the 3.0 version of the swift package? The version downloaded now is 2.6.1.

@Jintao-Huang
Collaborator Author

pip install git+https://github.com/modelscope/ms-swift.git

@Jintao-Huang Jintao-Huang changed the title ms-swift==3.0 Suggestion Box ms-swift3 Suggestion Box Dec 23, 2024
@Xu-Chen

Xu-Chen commented Dec 23, 2024

Hello, will the training pipeline support TP, PP, and so on?

Megatron-backed optimizations will come after the ms-swift 3.0 refactor, in about 1-2 months.

Is there any hope of support this year? 😂

@Jintao-Huang Is there still a chance this year?

@firefighter-eric

firefighter-eric commented Dec 26, 2024

Could you add the total batch size to the wandb logs?
total_batch_size = batch_size_per_device * gradient_accumulation_steps * nproc_per_node * nnodes
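For reference, the formula above as a small helper; the wandb call mentioned in the docstring is just one way the result might be logged, not swift's actual mechanism.

```python
def total_batch_size(batch_size_per_device, gradient_accumulation_steps,
                     nproc_per_node, nnodes):
    """Effective global batch size, per the formula above. The result could
    be reported once at startup, e.g. via
    wandb.config.update({"total_batch_size": ...})."""
    return (batch_size_per_device * gradient_accumulation_steps
            * nproc_per_node * nnodes)
```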
