
shape error on inference using ckpt #1

Open
rkdckddnjs9 opened this issue Nov 21, 2024 · 9 comments

Comments


rkdckddnjs9 commented Nov 21, 2024

Thank you for your excellent work!

I am trying to use the provided checkpoint for inference, but I encountered a size mismatch error during execution.
The error log indicates the following:
RuntimeError: Given normalized_shape=[320], but got input of size [12, 350, 1280]
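The error itself is PyTorch's generic LayerNorm shape check: `normalized_shape` must match the trailing dimension of the input. A minimal sketch reproducing the same failure mode, with the shapes copied from the log above:

```python
import torch
import torch.nn as nn

# LayerNorm(320) requires the input's last dimension to be 320.
norm = nn.LayerNorm(320)
x = torch.randn(12, 350, 1280)  # trailing dim is 1280, as in the log
try:
    norm(x)
except RuntimeError as e:
    print(e)  # same "Given normalized_shape=[320], ..." message
```

So a 1280-channel feature tensor reaching a block built for 320 channels points at a configuration mismatch rather than the input data.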

I have ensured that my environment and setup match the requirements mentioned in the repository. However, I wanted to confirm whether there might be any differences in the provided checkpoint or configuration settings that could cause this issue.

Could you please help me identify the cause of this error or guide me on how to resolve it?
I suspect that the issue might be related to the config.json file of the UNet model.

Thank you for your time and assistance!

Len-Li (Collaborator) commented Nov 21, 2024

Hi,

Thanks for reaching out.

Which version of Stable Diffusion are you using? It seems the UNet is not loading correctly.

rkdckddnjs9 (Author) commented:

I used stabilityai/stable-diffusion-2-1. (screenshot attached)

Len-Li (Collaborator) commented Nov 21, 2024

Where did you encounter this issue: in ControlNet or in the diffusion UNet? At which line of code?

rkdckddnjs9 (Author) commented:

The error occurs in the for loop at line 478 of unet_2d_condition_multiview.py.

I think that the block_out_channels value in ckp/train/syntheocc/unet/config.json might be incorrect.

Additionally, in the BasicMultiviewTransformerBlock class in models/block.py, the parameters dim, num_attention_heads, and attention_head_dim were not present in ckp/train/syntheocc/unet/config.json. To address this, I manually assigned the values 320, 5, and 64, respectively, based on the computed values in the code. Could this modification be causing the issue?

The error (RuntimeError: Given normalized_shape=[320], but got input of size [12, 350, 1280]) occurs when block_out_channels in ckp/train/syntheocc/unet/config.json is changed from [320, 640, 1280, 1280] to [320, 320, 1280, 1280]. With the original value [320, 640, 1280, 1280], a similar error still occurs at the same location, but with different shapes:
RuntimeError: Given normalized_shape=[320], expected input with shape [*, 320], but got input of size [12, 1400, 640]
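For reference, a sketch of what the relevant fields of a stock SD 2.1-style unet/config.json typically look like in diffusers (an assumption based on the standard SD 2.x layout; the exact values in this repo's checkpoint may differ):

```json
{
  "block_out_channels": [320, 640, 1280, 1280],
  "layers_per_block": 2,
  "cross_attention_dim": 1024
}
```

The second error (640 channels arriving at a 320-wide norm) is consistent with the checkpoint and the instantiated model disagreeing about which block a given layer belongs to.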

Could you confirm if this is related to the configuration or if there’s an issue with how these parameters are being processed?

Lastly, the issue might be related to my environment setup. If you are using Docker, would it be possible to provide the Docker image?
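The manually assigned values quoted above are at least internally consistent: in SD-style UNets the transformer block width equals the number of attention heads times the per-head dimension. A quick arithmetic check, using only the values quoted in this thread (assumptions, not read from the repo's actual config):

```python
# Values quoted in this thread, not read from ckp/train/syntheocc/unet/config.json:
block_out_channels = [320, 640, 1280, 1280]
attention_head_dim = 64  # per-head width assumed in the comment above

# dim = num_attention_heads * attention_head_dim must hold for each block.
num_heads = [c // attention_head_dim for c in block_out_channels]
print(num_heads)  # first block: 320 = 5 * 64, matching dim=320, heads=5
```

This matches the dim=320, num_attention_heads=5, attention_head_dim=64 assignment for the first block, so the hard-coded values themselves are unlikely to be the bug; the question is why they were missing from config.json at all.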

Len-Li (Collaborator) commented Nov 21, 2024

I am not sure yet. I will re-run my code tomorrow to check it out.

May I ask which version of diffusers you are using? Can you successfully load the original SD 2.1 for inference?

I am not using Docker, so it may be difficult for me to share a Docker image.

Basically, I think this problem can be attributed to some unknown misconfiguration. I will find the reason this week.

rkdckddnjs9 (Author) commented:

Thank you for checking.
I am using diffusers 0.26.0.
When loading Stable Diffusion for inference, some layers report size mismatches, but the loading process completes without any issues.

Len-Li (Collaborator) commented Nov 22, 2024

Hi,

The size mismatches are unusual. Perhaps the model configuration file or the diffusers source code has been incorrectly modified?

What is the quality of the generated image? Is it a normal image or a random one?

rkdckddnjs9 (Author) commented Nov 22, 2024

I made two modifications:

In infer.py, I added low_cpu_mem_usage=False and ignore_mismatched_sizes=True to UNet2DConditionModelMultiview.from_pretrained.
In models/blocks.py, I added dim=320, num_attention_heads=5, and attention_head_dim=64 to the init method of BasicMultiviewTransformerBlock.

For modification 1, the checkpoint could not be loaded without this change.
For modification 2, the parameters (dim, num_attention_heads, and attention_head_dim) were not present in the config.json, so the model could not be loaded without manually specifying these values.

Despite these adjustments, I encountered an error in the for loop at line 478 of unet_2d_condition_multiview.py, which prevented the generation of any images.
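Worth noting about modification 1: with ignore_mismatched_sizes=True, from_pretrained skips loading any tensor whose checkpoint shape disagrees with the instantiated model, leaving that layer randomly initialized. That would explain why loading "completes without any issues" yet inference crashes. A rough sketch of this behavior in plain PyTorch, using hypothetical shapes rather than the actual checkpoint:

```python
import torch
import torch.nn as nn

# Sketch of what ignore_mismatched_sizes=True effectively does: drop
# checkpoint tensors whose shapes disagree with the model, so those
# layers keep their random init instead of the trained weights.
model = nn.Linear(320, 320)
ckpt = {"weight": torch.randn(640, 320), "bias": torch.randn(640)}  # wrong shapes

reference = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in reference and v.shape == reference[k].shape}
model.load_state_dict(filtered, strict=False)  # loads nothing in this sketch
print(sorted(set(ckpt) - set(filtered)))  # tensors silently left untrained
```

If many layers are being skipped this way, the real fix is to make the config match the checkpoint so no sizes mismatch, rather than suppressing the load errors.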

Len-Li (Collaborator) commented Nov 22, 2024

Hi,

I've re-run my code and I do not encounter the size mismatch error. The dimension of 320 is configured in unet/config.json, so I don't think it needs to be tuned.
This problem is weird. I am still checking.

(screenshot attached)
