
Conversation

leisuzz (Contributor) commented Dec 18, 2025

What does this PR do?

  1. I got the error:

     ```python
     raise ValueError(f"Expected image_latents to be a list, got {type(image_latents)}.")
     ```

     (1) `cond_model_input_list` goes into `_prepare_image_ids` as a list of `[[1, cond_model_input[0], cond_model_input[1], cond_model_input[2]], ...]`.

     (2) Because `_prepare_image_ids` in the pipeline runs `torch.cat(image_latent_ids, dim=0)`, the shapes no longer match in the training step at `model_input_ids = torch.cat([model_input_ids, cond_model_input_ids], dim=1)`: `cond_model_input_ids.shape[0]` is 1, while `model_input_ids.shape[0]` is the batch size. The `cond_model_input_ids.view` call reshapes the ids to meet that requirement (see the shape sketch after this list).

     With this change, the script also works when the batch size is larger than 1.

  2. When I only changed `cond_model_input` to a list, the training loss was abnormal (starting at ~1.7, which is far too high). So I also fixed the model prediction to match the pipeline, and the loss became reasonable (starting at ~0.4).
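A minimal shape sketch of the mismatch described in (1)–(2) (PyTorch; the token counts and the id width of 4 are illustrative assumptions, not the script's real values):

```python
import torch

batch, img_tokens, cond_tokens = 2, 16, 16

# per-sample image ids kept batch-first, as the training step expects
model_input_ids = torch.zeros(batch, img_tokens, 4)

# _prepare_image_ids concatenates the per-image ids along dim=0, so the batch
# dimension collapses and the result arrives with a leading dim of 1
cond_model_input_ids = torch.zeros(1, batch * cond_tokens, 4)

# torch.cat(..., dim=1) would fail here (dim 0 is 2 vs 1); the .view call in
# the fix restores the batch dimension first
cond_model_input_ids = cond_model_input_ids.view(batch, -1, model_input_ids.shape[-1])
model_input_ids = torch.cat([model_input_ids, cond_model_input_ids], dim=1)
print(model_input_ids.shape)  # torch.Size([2, 32, 4])
```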

With the code:

```python
orig_inp_shape = packed_noisy_model_input.shape
orig_inp_ids_shape = model_input_ids.shape
model_pred = model_pred[:, : orig_inp_shape[1], :]
model_input_ids = model_input_ids[:, : orig_inp_ids_shape[1], :]
```
The training loss is:
Steps:   0%|          | 0/3500 [00:00<?, ?it/s]
Steps:   0%|          | 1/3500 [00:08<8:28:06,  8.71s/it]
Steps:   0%|          | 1/3500 [00:13<8:28:06,  8.71s/it, loss=0.328, lr=1e-5]
Steps:   0%|          | 2/3500 [00:21<10:38:52, 10.96s/it, loss=0.328, lr=1e-5]
Steps:   0%|          | 2/3500 [00:26<10:38:52, 10.96s/it, loss=0.835, lr=1e-5]
Steps:   0%|          | 3/3500 [00:34<11:30:01, 11.84s/it, loss=0.835, lr=1e-5]
Steps:   0%|          | 3/3500 [00:39<11:30:01, 11.84s/it, loss=0.254, lr=1e-5]
Steps:   0%|          | 4/3500 [00:46<11:52:09, 12.22s/it, loss=0.254, lr=1e-5]
Steps:   0%|          | 4/3500 [00:52<11:52:09, 12.22s/it, loss=0.405, lr=1e-5]
Steps:   0%|          | 5/3500 [00:59<12:05:53, 12.46s/it, loss=0.405, lr=1e-5]
Steps:   0%|          | 5/3500 [01:05<12:05:53, 12.46s/it, loss=1.03, lr=1e-5] 
Steps:   0%|          | 6/3500 [01:12<12:15:03, 12.62s/it, loss=1.03, lr=1e-5]
Steps:   0%|          | 6/3500 [01:18<12:15:03, 12.62s/it, loss=0.574, lr=1e-5]
Steps:   0%|          | 7/3500 [01:25<12:20:52, 12.73s/it, loss=0.574, lr=1e-5]
Steps:   0%|          | 7/3500 [01:31<12:20:52, 12.73s/it, loss=0.29, lr=1e-5] 
Steps:   0%|          | 8/3500 [01:38<12:24:32, 12.79s/it, loss=0.29, lr=1e-5]
Steps:   0%|          | 8/3500 [01:44<12:24:32, 12.79s/it, loss=0.393, lr=1e-5]
Steps:   0%|          | 9/3500 [01:51<12:27:38, 12.85s/it, loss=0.393, lr=1e-5]
Steps:   0%|          | 9/3500 [01:57<12:27:38, 12.85s/it, loss=0.336, lr=1e-5]

With the original code:

```python
model_pred = model_pred[:, : packed_noisy_model_input.size(1) :]
model_pred = Flux2Pipeline._unpack_latents_with_ids(model_pred, model_input_ids)
```

The training loss is:

Steps:   0%|          | 1/5000 [00:46<64:57:32, 46.78s/it]
Steps:   0%|          | 1/5000 [00:46<64:57:32, 46.78s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 2/5000 [01:15<50:29:04, 36.36s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 2/5000 [01:15<50:29:04, 36.36s/it, loss=2.08, lr=1e-5]
Steps:   0%|          | 3/5000 [01:47<47:31:01, 34.23s/it, loss=2.08, lr=1e-5]
Steps:   0%|          | 3/5000 [01:47<47:31:01, 34.23s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 4/5000 [02:18<45:54:39, 33.08s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 4/5000 [02:18<45:54:39, 33.08s/it, loss=1.99, lr=1e-5]
Steps:   0%|          | 5/5000 [02:47<43:39:23, 31.46s/it, loss=1.99, lr=1e-5]
Steps:   0%|          | 5/5000 [02:47<43:39:23, 31.46s/it, loss=2.02, lr=1e-5]
Steps:   0%|          | 6/5000 [03:16<42:28:13, 30.62s/it, loss=2.02, lr=1e-5]
Steps:   0%|          | 6/5000 [03:16<42:28:13, 30.62s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 7/5000 [03:42<40:32:24, 29.23s/it, loss=2.01, lr=1e-5]
Steps:   0%|          | 7/5000 [03:42<40:32:24, 29.23s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 8/5000 [04:12<40:37:29, 29.30s/it, loss=1.83, lr=1e-5]
Steps:   0%|          | 8/5000 [04:12<40:37:29, 29.30s/it, loss=1.92, lr=1e-5]


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Co-authored-by: @tcaimm

leisuzz changed the title from "Bugfix for dreambooth flux2 img2img2" to "Bugfix for flux2 img2img2 prediction" on Dec 18, 2025
leisuzz (Contributor, Author) commented Dec 18, 2025

@sayakpaul Please take a look at this PR. Thank you for your help!

sayakpaul (Member) commented:

Do you have a reproducer?

leisuzz (Contributor, Author) commented Dec 18, 2025

@sayakpaul I've updated the result in the description, thanks :)

leisuzz (Contributor, Author) commented Jan 5, 2026

@linoytsaban Please take a look at this PR. Thank you for your help!

tcaimm (Contributor) commented Jan 11, 2026

I noticed the anomaly in the loss a while ago. The main issue is that the img2img training logic needs to concatenate the condition along the token dimension and then prune the condition tokens away before computing the loss. Flux2 does this on both `model_input` and `model_input_ids`, so the original sizes need to be recorded before the concatenation, similar to the operation in train_dreambooth_lora_flux_kontext.py. The img2img training script appears to be a direct copy and modification of the txt2img one, with the output-pruning step forgotten. The modifications after line 1727 are as follows:

```python
# concatenate the model inputs with the cond inputs
orig_inp_shape = packed_noisy_model_input.shape
orig_inp_ids_shape = model_input_ids.shape
packed_noisy_model_input = torch.cat([packed_noisy_model_input, packed_cond_model_input], dim=1)
model_input_ids = torch.cat([model_input_ids, cond_model_input_ids], dim=1)

# handle guidance
guidance = torch.full([1], args.guidance_scale, device=accelerator.device)
guidance = guidance.expand(model_input.shape[0])

# Predict the noise residual
model_pred = transformer(
...

# prune the condition tokens so the prediction matches the original input length
model_pred = model_pred[:, : orig_inp_shape[1], :]
model_input_ids = model_input_ids[:, : orig_inp_ids_shape[1], :]
model_pred = Flux2Pipeline._unpack_latents_with_ids(model_pred, model_input_ids)
```
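A self-contained sketch of this concat-then-prune pattern (dummy tensors; the transformer call is replaced by an identity stand-in, and all shapes are assumptions for illustration):

```python
import torch

batch, noisy_tokens, cond_tokens, channels = 2, 16, 16, 8

packed_noisy_model_input = torch.randn(batch, noisy_tokens, channels)
packed_cond_model_input = torch.randn(batch, cond_tokens, channels)

# record the original token length before concatenating along dim=1
orig_inp_len = packed_noisy_model_input.shape[1]

packed = torch.cat([packed_noisy_model_input, packed_cond_model_input], dim=1)
model_pred = packed  # stand-in for the transformer output (same token length)

# prune the condition tokens so the loss covers only the noisy tokens
model_pred = model_pred[:, :orig_inp_len, :]
assert model_pred.shape == packed_noisy_model_input.shape
```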

This is the core problem with this training script; please fix it as soon as possible.
@sayakpaul, @linoytsaban Thank you for your help.

sayakpaul (Member) commented:

@tcaimm thanks for pointing that out. Since you have already characterized the bug and proposed a solution, would you like to open a PR? That way, your contribution will stay within the library :-)

leisuzz (Contributor, Author) commented Jan 12, 2026

@sayakpaul Perhaps I can add @tcaimm as a co-author on this PR. After I change the line:

```python
model_input_ids = model_input_ids[:, :noisy_len:]
```

the lines:

```python
cond_model_input_list = [cond_model_input[i].unsqueeze(0) for i in range(cond_model_input.shape[0])]
cond_model_input_ids = Flux2Pipeline._prepare_image_ids(cond_model_input_list).to(
    device=cond_model_input.device
)
cond_model_input_ids = cond_model_input_ids.view(
    cond_model_input.shape[0], -1, model_input_ids.shape[-1]
)
```

are still needed.
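For illustration, a tiny sketch of what the list conversion above produces (the latent shape here is hypothetical, chosen only for the demo):

```python
import torch

cond_model_input = torch.randn(2, 8, 4, 4)  # illustrative (B, C, H, W) latent
cond_model_input_list = [cond_model_input[i].unsqueeze(0) for i in range(cond_model_input.shape[0])]
print(len(cond_model_input_list), cond_model_input_list[0].shape)  # 2 torch.Size([1, 8, 4, 4])
```

This hands the pipeline helper the per-sample list it expects instead of a single batched tensor.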

sayakpaul (Member) commented:

Sure, that works.

leisuzz (Contributor, Author) commented Jan 12, 2026

@sayakpaul @tcaimm Please take a look

tcaimm (Contributor) commented Jan 12, 2026

> @sayakpaul @tcaimm Please take a look

Thanks for the update! I’ve taken a look at the changes, and they look great to me.

Since we collaborated on this, would you mind adding me as a co-author in the final squash/merge commit? This helps GitHub track the contribution correctly. You can add this line to the bottom of the commit message:

Co-authored-by: tcaimm [email protected]

Looking forward to seeing this merged!

leisuzz (Contributor, Author) commented Jan 12, 2026

> Since we collaborated on this, would you mind adding me as a co-author in the final squash/merge commit?

Done!

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

linoytsaban (Collaborator) left a comment:

Thanks a lot @leisuzz!

sayakpaul merged commit 29a930a into huggingface:main on Jan 12, 2026 (24 of 26 checks passed)
sayakpaul (Member) commented:

Thanks for the awesome contributions!
