When running DPO on Qwen2-VL with mixed data (some samples with images, some without), is this supported? #6518

Closed
1 task done
miyapeng opened this issue Jan 3, 2025 · 3 comments
Labels
solved This problem has been already solved

Comments

miyapeng commented Jan 3, 2025

Reminder

  • I have read the README and searched the existing issues.

System Info

Traceback (most recent call last):
File "/home/myp/anaconda3/envs/llama-factory/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/home/myp/anaconda3/envs/llama-factory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
File "/home/myp/anaconda3/envs/llama-factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3458, in _map_single
batch = apply_function_on_filtered_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/home/myp/anaconda3/envs/llama-factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3320, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/home/myp/LLaMA-Factory/src/llamafactory/data/processors/pairwise.py", line 85, in preprocess_pairwise_dataset
chosen_input_ids, chosen_labels, rejected_input_ids, rejected_labels = _encode_pairwise_example(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/myp/LLaMA-Factory/src/llamafactory/data/processors/pairwise.py", line 46, in _encode_pairwise_example
chosen_messages = template.mm_plugin.process_messages(prompt + [response[0]], images, videos, processor)
File "/home/myp/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 595, in process_messages
mm_inputs = self._get_mm_inputs(images, videos, processor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/myp/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 174, in _get_mm_inputs
images = self._regularize_images(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/myp/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 114, in _regularize_images
image = Image.open(image)
^^^^^^^^^^^^^^^^^
File "/home/myp/anaconda3/envs/llama-factory/lib/python3.11/site-packages/PIL/Image.py", line 3431, in open
fp = builtins.open(filename, "rb")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/home/myp/LLaMA-Factory'
"""
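The last frames of the traceback show PIL calling `builtins.open(filename, "rb")` on a path that is a directory. A minimal sketch of how that can happen, assuming (this is not confirmed by the thread) that an empty `image` value is joined onto a base directory before being opened. `resolve_image_path` is a hypothetical helper written for illustration, not a LLaMA-Factory function:

```python
import tempfile
from pathlib import Path


def resolve_image_path(base_dir: str, image: str) -> Path:
    """Hypothetical helper: join a dataset-relative image path onto a base directory."""
    return Path(base_dir) / image


def try_open(path: Path) -> str:
    """Open the path in binary mode, as the traceback shows PIL doing, and report the outcome."""
    try:
        with open(path, "rb"):
            return "ok"
    except IsADirectoryError:
        return "IsADirectoryError"
    except FileNotFoundError:
        return "FileNotFoundError"


with tempfile.TemporaryDirectory() as base_dir:
    # An empty image string leaves only the base directory itself.
    empty_result = try_open(resolve_image_path(base_dir, ""))
    # A non-empty but missing file fails differently.
    missing_result = try_open(resolve_image_path(base_dir, "llava/missing.jpg"))

print(empty_result, missing_result)
```

On Linux the empty string case fails with `IsADirectoryError`, which matches the error class in the report; other platforms may raise a different exception for the same situation.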

Reproduction

Model: qwen2-vl
Training method: dpo
Data format: mixed data, some samples with images and some without, as follows
{
"prompt":
"chosen":
"rejected":
"image": "llava/......"
},
{
"prompt":
"chosen":
"rejected":
"image": "" (empty value here)
}
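One possible workaround is to normalize the records before training so that text-only samples carry no image path at all instead of an empty string. The sketch below is an assumption, not the maintainers' method: the `images` list key and the placeholder values (`p1`, `llava/example.jpg`, and so on) are illustrative, and the exact column names the loader expects should be checked against the repository's examples.

```python
from typing import Any


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy where an empty or missing "image" becomes an empty "images" list."""
    cleaned = {key: value for key, value in record.items() if key != "image"}
    image = record.get("image")
    # Keep only non-empty string paths; everything else means "no image".
    cleaned["images"] = [image] if isinstance(image, str) and image.strip() else []
    return cleaned


# Placeholder records mirroring the two shapes described in the report.
records = [
    {"prompt": "p1", "chosen": "c1", "rejected": "r1", "image": "llava/example.jpg"},
    {"prompt": "p2", "chosen": "c2", "rejected": "r2", "image": ""},
]
normalized = [normalize_record(record) for record in records]

for record in normalized:
    print(record["prompt"], record["images"])
```

With this shape, a text-only sample has an empty list, so no path is ever handed to the image loader for it.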

Expected behavior

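I would like to train on a mix of samples with and without images. I am not sure whether this feature is simply not implemented; if I need to modify the code myself, which part should I change?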

Others

No response

github-actions bot added the "pending" label (This problem is yet to be addressed) Jan 3, 2025
hiyouga (Owner) commented Jan 3, 2025

This is supported.

hiyouga closed this as completed Jan 3, 2025
hiyouga added the "solved" label (This problem has been already solved) and removed the "pending" label (This problem is yet to be addressed) Jan 3, 2025
miyapeng (Author) commented Jan 3, 2025

@hiyouga
Where in the code can I see this? Or is there a required format for organizing the data?

hiyouga (Owner) commented Jan 3, 2025

See the examples.
