Support finetuning LLaVA 1.6 #432
Comments
@choyakawa Hi! Thank you for your interest. The training script for LLaVA 1.6 (NeXT) has not been released yet. We will try to follow up once it is released.
Hi @choyakawa, @hhaAndroid is working on it. Please subscribe to #460!
Failed on
zero2 works, but replicating LLaVA 1.6 with a 34B model is challenging without zero3.
@LZHgrla Do you have any idea why zero3 fails? I cannot figure out why the image features from CLIP have shape torch.Size([0]) here.
@choyakawa The LLaVA 1.6 features are still WIP. If you have any advanced attempts (such as applying a 34B LLM), you are welcome to provide detailed configs and executable commands. We will run some tests after development to improve robustness.
I am not using quantization; the failure above was with bf16. I have also tried open_clip instead of the OpenAI ViT-L, and that did not work either.
How should I resolve this error? RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0. Target sizes: [4096, 32, 1]. Tensor sizes: [0, 1, 1]
model = self.train_loop.run() # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 270, in run
self.runner.call_hook('before_train')
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
getattr(hook, fn_name)(self, **kwargs)
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 221, in before_train
self._generate_samples(runner, max_new_tokens=50)
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 207, in _generate_samples
self._eval_images(runner, model, device, max_new_tokens,
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/anyshape_evaluate_chat_hook.py", line 53, in _eval_images
image_features = model.preprocess_for_pixel_values({
File "/export/App/training_platform/PinoModel/xtuner/xtuner/model/anyshape_llava.py", line 109, in preprocess_for_pixel_values
self.image_newline[:, None, None].expand(
RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0. Target sizes: [4096, 32, 1]. Tensor sizes: [0, 1, 1]
[2024-04-24 13:52:01,685] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2444983) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
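A note on reading the error above, offered as a sketch rather than a confirmed diagnosis: `expand` can only stretch dimensions of size 1, and every other dimension must already match the target. A tensor of shape `[0, 1, 1]` therefore cannot be expanded to `[4096, 32, 1]`, because its first dimension is 0, not 1 or 4096. Under DeepSpeed ZeRO-3, parameters are partitioned across ranks and appear locally as empty placeholders until they are gathered, which would be consistent with `self.image_newline` having size 0 when the hook runs; the thread does not confirm that this is the cause. The helper below is hypothetical (`can_expand` is not part of PyTorch or xtuner) and only mirrors the shape rule in plain Python:

```python
def can_expand(src_shape, target_shape):
    """Check whether a tensor of src_shape could be expanded to target_shape.

    Mirrors the shape rule used by expand for explicit sizes: trailing
    dimensions are aligned, and each existing dimension must either be 1
    (stretchable) or equal to the requested size.
    Returns (ok, reason).
    """
    if len(target_shape) < len(src_shape):
        return False, "target has fewer dimensions than the source"

    # New leading dimensions in the target are always allowed.
    offset = len(target_shape) - len(src_shape)
    for i, src in enumerate(src_shape):
        tgt = target_shape[offset + i]
        if src != 1 and src != tgt:
            return False, (
                f"existing size {src} cannot expand to {tgt} "
                f"at non-singleton dimension {offset + i}"
            )
    return True, "ok"


# Shape seen in the traceback: the parameter is empty on this rank.
print(can_expand((0, 1, 1), (4096, 32, 1)))
# Shape the code presumably expects: a 4096-element vector with two added axes.
print(can_expand((4096, 1, 1), (4096, 32, 1)))
```

If the empty shape does come from ZeRO-3 partitioning, the parameter would need to be gathered before it is used outside the wrapped forward pass (DeepSpeed provides the `deepspeed.zero.GatheredParameters` context manager for this); whether that is the right fix for this hook is an assumption to verify against the xtuner code.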