Problem description

While fine-tuning the HumanVLM model, I ran into the following error:
FileNotFoundError: can't find *_optim_states.pt files in directory '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
According to the error, the optimizer state files appear to be missing, yet the model download directory contains no such files (e.g. *_optim_states.pt), so fine-tuning cannot start. My steps and a detailed description of the problem follow.
Steps to reproduce

1. Clone the HumanVLM project and install its dependencies.
2. Prepare the pretrained model and data:
- Model download path: OpenFace-CQUPT/Human_LLaVA
3. Run the fine-tuning command:
xtuner train HumanVLM/human_llama3_8b_instruct_siglip_so400m_large_p14_384_lora_e1_gpu8_finetune.py
4. The error above occurs.
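To confirm the directory really holds no optimizer shards, I checked with a small helper (my own diagnostic sketch, not part of xtuner; the temporary directory below stands in for the models--OpenFace-CQUPT--Human_LLaVA cache path):

```python
import tempfile
from pathlib import Path

def find_optim_states(ckpt_dir):
    """Glob for the DeepSpeed optimizer-state shards the loader expects."""
    return sorted(Path(ckpt_dir).glob("*_optim_states.pt"))

# Demo on a throwaway directory standing in for the HF cache directory:
# the model weights exist, but no *_optim_states.pt shards do.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "model.safetensors").touch()
    print(find_optim_states(d))  # -> [] : the same situation the error reports
```

Running the same glob against the real cache directory likewise returns an empty list.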
Additional information:
- Below are my changes in HumanVLM/HumanVLM/human_llama3_8b_instruct_siglip_so400m_large_p14_384_lora_e1_gpu8_finetune.py:
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
llm_name_or_path = 'meta-llama/Meta-Llama-3-8B-Instruct'
visual_encoder_name_or_path = 'google/siglip-so400m-patch14-384'
# Specify the pretrained pth
#pretrained_pth = './work_dirs/human_llama3_8b_instruct_siglip_so400m_large_p14_384_e1_gpu8_pretrain/iter_54000.pth' # noqa: E501
pretrained_pth = '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
# Data
#data_root = '/home/ubuntu/public-Datasets/HumanSFT/'
data_root = '/home/chou/deep/'
data_path = data_root + 'processed_from_converted_data_for_finetuning'
#data_path = data_root + 'ft_hfformat_base_attr_keypoint_0616_clean'
# data_path = data_root + 'ft_json_base_attr_keypoint_0616'
#image_folder = data_root + 'data'
image_folder = data_root + 'pt_images/train2014'
prompt_template = PROMPT_TEMPLATE.llama3_chat
max_length = int(4096 - 728)
- The full log is as follows:
01/12 22:37:29 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.85s/it]
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
Processing zero checkpoint '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
Traceback (most recent call last):
File "/home/chou/deep/HumanVLM/xtuner/xtuner/tools/train.py", line 364, in <module>
main()
File "/home/chou/deep/HumanVLM/xtuner/xtuner/tools/train.py", line 353, in main
runner = Runner.from_cfg(cfg)
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
runner = cls(
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 429, in __init__
self.model = self.build_model(model)
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/runner/runner.py", line 836, in build_model
model = MODELS.build(model)
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/home/chou/miniconda3/envs/humancaption/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/home/chou/deep/HumanVLM/xtuner/xtuner/model/llava.py", line 109, in __init__
pretrained_state_dict = guess_load_checkpoint(pretrained_pth)
File "/home/chou/deep/HumanVLM/xtuner/xtuner/model/utils.py", line 313, in guess_load_checkpoint
state_dict = get_state_dict_from_zero_checkpoint(
File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 617, in get_state_dict_from_zero_checkpoint
return _get_state_dict_from_zero_checkpoint(ds_checkpoint_dir,
File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 228, in _get_state_dict_from_zero_checkpoint
optim_files = get_optim_files(ds_checkpoint_dir)
File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 103, in get_optim_files
return get_checkpoint_files(checkpoint_dir, '*_optim_states.pt')
File "/home/chou/deep/HumanVLM/xtuner/xtuner/utils/zero_to_any_dtype.py", line 96, in get_checkpoint_files
raise FileNotFoundError(
FileNotFoundError: can't find *_optim_states.pt files in directory '/home/chou/.cache/huggingface/hub/models--OpenFace-CQUPT--Human_LLaVA'
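For context, the failing frames suggest that guess_load_checkpoint treats a directory path as a DeepSpeed ZeRO checkpoint and therefore requires *_optim_states.pt shards inside it. A simplified sketch of that branch (my reading of the traceback, not the actual xtuner source):

```python
import glob
import os

def get_checkpoint_files(checkpoint_dir, pattern):
    # Mirrors the failing frame: glob for shards, raise if none are found.
    files = sorted(glob.glob(os.path.join(checkpoint_dir, pattern)))
    if len(files) == 0:
        raise FileNotFoundError(
            f"can't find {pattern} files in directory '{checkpoint_dir}'")
    return files

def guess_load_checkpoint_sketch(pretrained_pth):
    # Simplified branch: a plain .pth file would go to torch.load, while a
    # directory is assumed to be a DeepSpeed ZeRO checkpoint that must
    # contain optimizer shards named *_optim_states.pt.
    if os.path.isfile(pretrained_pth):
        return "torch.load path"
    return get_checkpoint_files(pretrained_pth, "*_optim_states.pt")
```

Since my pretrained_pth points at the Hugging Face cache directory rather than a single .pth file, the loader takes the directory branch and fails exactly as shown above.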