[Feature] Support LLaVA (InternLM#196)
* v1
* add load_image
* update cfg image url
* del fig
* update
* temp
* update convert
* update chat_mm
* add exclude_frozen_parameters for deepspeed
* update chat
* update xtuner help msg
* fix bugs
* revert bf16 deepspeed
* fix bugs
* add visual_select_layer for chat
* improve pth_to_hf
* rename projecter_pth to pretrained_pth
* temp
* update requirements
* add cfgs
* update
* fix pre-commit
* optim chat
* optim chat
* Delete xtuner/model/unused.py
* move dispatch to a deeper folder
* add projector
* update
* del model/projector
* fix bugs
* add docs
* update
* update
* update
* update
* enhance resume for map_fn
* update import
* add llava_internlm_chat_7b_clip_vit_large_p14
* update dispatch
* update dispatch
* add link
* update max_length
* update max_length
* update hyp
* align
* move yi flash attn
* fix pre-commit
* update deepspeed requirements
* add mmbench script
* install openpyxl
* add entry_point for mmbench
* save args
* update mmbench
* update max_length
* add llama2 qlora
* update mmbench
* fix mmbench bugs
* use osp instead of os.path
* refactor pth_to_hf
* update chat and mmbench to support --llava
* align to chat
* update entry_point
* add vicuna template
* add vicuna_7b_v15
* fix pre-commit
* add vicuna_7b_v1.5 qlora
* skip_special_tokens for decode text
* remove do_sample
* add warmup
* fix pre-commit
* Update dataset_prepare.md
* Update dataset_prepare.md
* Add KEEP_STSTEM for template
* remove
* fix vicuna template
* clean cfgs
* add cfgs
* fix pre-commit
* add --language for mmbench
* fix bugs
* fix pretrain bug
* support visual_encoder lora
* fix bugs
* add paramwise_cfg
* remove print_peft_model_trainable_parameters
* fix bugs
* add paramwise_cfg for DeepSpeedOptimWrapper
* fix engine deepspeed paramwise_cfg bug
* fix encode_fn bug
* fix
* fix pad_image_to_square bugs
* Add space for system to avoid mismatch of 'USER' token
* revert to adding bos_token at each conv
* revert for paramwise_cfg
* better cfgs?
* fix import bug
* fix import bug
* pretrain align
* update prepare_inputs_labels_for_multimodal
* 1792
* support length_grouped_samplers
* 1792
* remove KEEP_SYSTEM
* remove system in cfg
* update 336 cfg
* add torch_dtype for mmbench and chat
* group 50
* quant for pretrain
* update cfgs
* refactor cfgs
* add length for concat dataset
* update requirements
* fix typo
* add template for internlm pretrain
* no zh
* remove 20b cfgs
* fix pre-commit
* revert invalid input
* rename
* Update README.md
* Update README_zh-CN.md
* fix pre-commit
* remove llava_zh from docs
* qlora 512
* rename llava map_fn
* update cfgs
* update model urls
* add docs link
* add llava docs
* update docs
* update urls
* add citation
* fix README
* move
* update
* vicuna pretrain with prompt
* rename
* add results
* fix pre-commit
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* Update README.md
* Update README_zh-CN.md
* Update README_zh.md
* Update README_zh.md
* Update README.md
* Update README_zh.md
* Update README.md
* Update README.md
* fix typo
* fix
* Update README.md
* Update README_zh-CN.md
* rename
* auto cn_string
* fix pre-commit
* rename
* remove language
* add VLMEvalKit
* rename VLLM to VLM
* add the download links of MMBench
* update
* update readme
* update
* update
* update merge
* fix cfg bug
* Update README.md
* Update README_zh.md
* update
* fix
* update requirements
* Update runtime.txt
* Update runtime.txt
* Update runtime.txt
* Update README.md
* Update README.md
* Update README_zh.md
* fix pre-commit
* fix
* update mmbench prompt
* fix bugs
* fix bugs
* update docs
* update
* update
* Update README.md
Showing 57 changed files with 4,014 additions and 272 deletions.
# LLaVA Full Pipeline

## Data Preparation

Please refer to the [docs](../../../docs/en/user_guides/dataset_prepare.md#llava-dataset).

## Training

The training of LLaVA consists of two steps: alignment module (*i.e.*, MLP projector) pretraining and instruction-following fine-tuning.

Note: this guide takes 8-GPU training of LLaVA-InternLM as an example. If GPU resources or memory are insufficient, you can reduce the batch size appropriately to lower memory consumption (see the sketch below). The pretrained projector is saved by default to `./work_dirs/llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain/epoch_1.pth` and re-loaded during fine-tuning.
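One way to do this without editing the config is a command-line override, assuming your xtuner build passes MMEngine-style `--cfg-options` through (an assumption, not verified here); a minimal sketch:

```bash
# Sketch: lower the per-device batch size at launch time.
# Assumes `--cfg-options` is exposed and that the config keeps its
# batch size under train_dataloader.batch_size, as typical xtuner configs do.
NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain \
  --deepspeed deepspeed_zero2 \
  --cfg-options train_dataloader.batch_size=8
```

Editing the batch size directly in the config file works just as well.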
1. Alignment module pretraining (saved by default in `./work_dirs/`)

   ```bash
   NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
   ```

2. Instruction-following fine-tuning (saved by default in `./work_dirs/`)

   ```bash
   NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
   ```

## Model Convert (and Merge)

After training, we will obtain a set of weights (*i.e.*, `epoch_1.pth`) that are not in the universal HuggingFace format, so we first need to convert them.

```bash
xtuner convert pth_to_hf $FINETUNE_CFG $PTH_PATH $SAVE_PATH
# e.g., xtuner convert pth_to_hf llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune ./epoch_1.pth ./epoch_1_hf
```

At this point, we have obtained the relevant model (the LLM or the corresponding LoRA adapter).

Afterwards, if you want to merge the LoRA into the LLM or CLIP-ViT, use the following commands:

```bash
(LLM)  xtuner convert merge $LLM  $LLM_ADAPTER  $SAVE_PATH
(CLIP) xtuner convert merge $CLIP $CLIP_ADAPTER $SAVE_PATH --is-clip
```

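For instance, merging the LLM LoRA from our running example might look like the following; the adapter path `./epoch_1_hf/llm_adapter` is purely illustrative, so substitute whatever directory your convert step actually produced:

```bash
# Illustrative merge of a LoRA adapter into the base LLM.
# The adapter path below is hypothetical; point it at your convert output.
xtuner convert merge internlm/internlm-chat-7b \
  ./epoch_1_hf/llm_adapter \
  ./llava_internlm_7b_merged
```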
## Chat

You can download the released LLaVA-InternLM-7B model from 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b) or 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b), and perform image-text question answering with the following command:

```bash
xtuner chat internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-7b \
  --prompt-template internlm_chat \
  --image $IMAGE_PATH
```

Here, `--llava` can also be the converted weights from the step above (in our example, `./epoch_1_hf`), as shown below.
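For example, to chat with your own converted weights instead of the released ones:

```bash
xtuner chat internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava ./epoch_1_hf \
  --prompt-template internlm_chat \
  --image $IMAGE_PATH
```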
## Evaluation

XTuner's LLaVA models can be evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).

For convenience, XTuner also integrates the [MMBench](https://mmbench.opencompass.org.cn/home) evaluation.

Users can download the MMBench datasets with

```bash
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/CCBench.tsv
```

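Equivalently, as a single loop:

```bash
# Download all five evaluation TSVs in one pass.
for name in MMBench_DEV_EN MMBench_TEST_EN MMBench_DEV_CN MMBench_TEST_CN CCBench; do
  wget "https://opencompass.openxlab.space/utils/VLMEval/${name}.tsv"
done
```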
After that, the evaluations can be run with

```bash
xtuner mmbench internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-7b \
  --prompt-template internlm_chat \
  --data-path $DATA_PATH \
  --work-dir $RESULT_PATH
```

Here, `$DATA_PATH` refers to one of the datasets downloaded above, such as `MMBench_DEV_EN.tsv`.
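For example, a concrete run on the English dev split (the work directory name is arbitrary):

```bash
xtuner mmbench internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-7b \
  --prompt-template internlm_chat \
  --data-path ./MMBench_DEV_EN.tsv \
  --work-dir ./mmbench_results
```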
After the evaluation completes, the results of a dev set are printed directly, while for a test set you need to submit `mmbench_result.xlsx` to the official MMBench evaluation service to obtain the final scores.

| Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev | MME | MMVet | SEEDBench_IMG | Configs | Pretrained Projector Checkpoints | Fine-tuned LLaVA Checkpoints |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| LLaVA-v1.5-7B (XTuner) | 67.7 | 69.2 | 61.0 | 59.7 | 27.6 | 1702 | 66.4 | 32.3 | [Pretrain](./vicuna_7b_v15_clip_vit_large_p14_336/pretrain/llava_vicuna_7b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./vicuna_7b_v15_clip_vit_large_p14_336/finetune/llava_vicuna_7b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-7b-xtuner-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-7b-xtuner-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-7b-xtuner) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-7b-xtuner) |
| LLaVA-v1.5-13B (XTuner) | 68.9 | 69.5 | 64.7 | 63.1 | 32.2 | 1771 | 68.1 | 35.5 | [Pretrain](./vicuna_13b_v15_clip_vit_large_p14_336/pretrain/llava_vicuna_13b_v15_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./vicuna_13b_v15_clip_vit_large_p14_336/finetune/llava_vicuna_13b_v15_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-13b-xtuner-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-13b-xtuner-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-v1.5-13b-xtuner) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-v1.5-13b-xtuner) |
| LLaVA-InternLM-7B (XTuner) | 69.0 | 68.5 | 66.7 | 63.8 | 35.8 | 1671 | 65.8 | 33.8 | [Pretrain](./internlm_chat_7b_clip_vit_large_p14_336/pretrain/llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain.py) / [Fine-tune](./internlm_chat_7b_clip_vit_large_p14_336/finetune/llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune.py) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b-pretrain) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b-pretrain) | 🤗 [HuggingFace](https://huggingface.co/xtuner/llava-internlm-7b) / 🤖 [ModelScope](https://modelscope.cn/models/xtuner/llava-internlm-7b) |