
shape error on inference using ckpt #1

Open
rkdckddnjs9 opened this issue Nov 21, 2024 · 9 comments

Comments


rkdckddnjs9 commented Nov 21, 2024

Thank you for your excellent work!

I am trying to use the provided checkpoint for inference, but I encountered a size mismatch error during execution.
The error log indicates the following:
RuntimeError: Given normalized_shape=[320], but got input of size [12, 350, 1280]
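The error itself is PyTorch's generic LayerNorm shape check: `normalized_shape` must match the trailing dimension of the input. A minimal sketch reproducing the same failure mode, with the shapes copied from the log above:

```python
import torch
import torch.nn as nn

# LayerNorm(320) requires the input's last dimension to be 320.
norm = nn.LayerNorm(320)
x = torch.randn(12, 350, 1280)  # trailing dim is 1280, as in the log
try:
    norm(x)
except RuntimeError as e:
    print(e)  # same "Given normalized_shape=[320], ..." message
```

So a 1280-channel feature tensor reaching a block built for 320 channels points at a configuration mismatch rather than the input data.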

I have ensured that my environment and setup match the requirements mentioned in the repository. However, I wanted to confirm whether there might be any differences in the provided checkpoint or configuration settings that could cause this issue.

Could you please help me identify the cause of this error or guide me on how to resolve it?
I suspect that the issue might be related to the config.json file of the UNet model.

Thank you for your time and assistance!

Len-Li (Collaborator) commented Nov 21, 2024

Hi,

Thanks for reaching out.

Which version of Stable Diffusion are you using? It seems the UNet is not loading correctly.

rkdckddnjs9 (Author) commented:

I used stabilityai/stable-diffusion-2-1. (screenshot attached)

Len-Li (Collaborator) commented Nov 21, 2024

Where did you encounter this issue: in ControlNet or in the diffusion UNet? At which line of code?

rkdckddnjs9 (Author) commented:

The error occurs in the for loop at line 478 of unet_2d_condition_multiview.py.

I think that the block_out_channels value in ckp/train/syntheocc/unet/config.json might be incorrect.

Additionally, in the BasicMultiviewTransformerBlock class in models/block.py, the parameters dim, num_attention_heads, and attention_head_dim were not present in ckp/train/syntheocc/unet/config.json. To address this, I manually assigned the values 320, 5, and 64, respectively, based on the computed values in the code. Could this modification be causing the issue?

The error (RuntimeError: Given normalized_shape=[320], but got input of size [12, 350, 1280]) occurs when block_out_channels in ckp/train/syntheocc/unet/config.json is changed from [320, 640, 1280, 1280] to [320, 320, 1280, 1280]. With the original value [320, 640, 1280, 1280], a similar error still occurs at the same location, but with different shapes:
RuntimeError: Given normalized_shape=[320], expected input with shape [*, 320], but got input of size [12, 1400, 640]
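For reference, a sketch of what the relevant fields of a stock SD 2.1-style unet/config.json typically look like in diffusers (an assumption based on the standard SD 2.x layout; the exact values in this repo's checkpoint may differ):

```json
{
  "block_out_channels": [320, 640, 1280, 1280],
  "layers_per_block": 2,
  "cross_attention_dim": 1024
}
```

The second error (640 channels arriving at a 320-wide norm) is consistent with the checkpoint and the instantiated model disagreeing about which block a given layer belongs to.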

Could you confirm if this is related to the configuration or if there’s an issue with how these parameters are being processed?

Lastly, the issue might be related to my environment setup. If you are using Docker, would it be possible to provide the Docker image?
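The manually assigned values quoted above are at least internally consistent: in SD-style UNets the transformer block width equals the number of attention heads times the per-head dimension. A quick arithmetic check, using only the values quoted in this thread (assumptions, not read from the repo's actual config):

```python
# Values quoted in this thread, not read from ckp/train/syntheocc/unet/config.json:
block_out_channels = [320, 640, 1280, 1280]
attention_head_dim = 64  # per-head width assumed in the comment above

# dim = num_attention_heads * attention_head_dim must hold for each block.
num_heads = [c // attention_head_dim for c in block_out_channels]
print(num_heads)  # first block: 320 = 5 * 64, matching dim=320, heads=5
```

This matches the dim=320, num_attention_heads=5, attention_head_dim=64 assignment for the first block, so the hard-coded values themselves are unlikely to be the bug; the question is why they were missing from config.json at all.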

Len-Li (Collaborator) commented Nov 21, 2024

I am not sure yet. I will re-run my code tomorrow to check it out.

May I ask which version of diffusers you are using? Can you successfully load the original SD 2.1 for inference?

I am not using Docker, so it may be difficult for me to share a Docker image.

Basically, I think this problem can be attributed to some unknown misconfiguration. I will find the reason this week.

rkdckddnjs9 (Author) commented:

Thank you for checking.
I am using diffusers 0.26.0.
When loading Stable Diffusion for inference, some layers report size mismatches, but the loading process completes without any issues.

Len-Li (Collaborator) commented Nov 22, 2024

Hi,

The size mismatches are unusual. Perhaps the model configuration file or the diffusers source code has been incorrectly modified?

What is the quality of the generated image? Is it a normal image or a random one?

rkdckddnjs9 (Author) commented Nov 22, 2024

I made two modifications:

In infer.py, I added low_cpu_mem_usage=False and ignore_mismatched_sizes=True to UNet2DConditionModelMultiview.from_pretrained.
In models/blocks.py, I added dim=320, num_attention_heads=5, and attention_head_dim=64 to the init method of BasicMultiviewTransformerBlock.

For modification 1, the checkpoint could not be loaded without this change.
For modification 2, the parameters (dim, num_attention_heads, and attention_head_dim) were not present in the config.json, so the model could not be loaded without manually specifying these values.

Despite these adjustments, I encountered an error in the for loop at line 478 of unet_2d_condition_multiview.py, which prevented the generation of any images.
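Worth noting about modification 1: with ignore_mismatched_sizes=True, from_pretrained skips loading any tensor whose checkpoint shape disagrees with the instantiated model, leaving that layer randomly initialized. That would explain why loading "completes without any issues" yet inference crashes. A rough sketch of this behavior in plain PyTorch, using hypothetical shapes rather than the actual checkpoint:

```python
import torch
import torch.nn as nn

# Sketch of what ignore_mismatched_sizes=True effectively does: drop
# checkpoint tensors whose shapes disagree with the model, so those
# layers keep their random init instead of the trained weights.
model = nn.Linear(320, 320)
ckpt = {"weight": torch.randn(640, 320), "bias": torch.randn(640)}  # wrong shapes

reference = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in reference and v.shape == reference[k].shape}
model.load_state_dict(filtered, strict=False)  # loads nothing in this sketch
print(sorted(set(ckpt) - set(filtered)))  # tensors silently left untrained
```

If many layers are being skipped this way, the real fix is to make the config match the checkpoint so no sizes mismatch, rather than suppressing the load errors.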

Len-Li (Collaborator) commented Nov 22, 2024

Hi,

I've re-run my code and I do not encounter the size mismatch error. The dimension of 320 is configured in unet/config.json, so I don't think it needs to be tuned.
This problem is weird. I am still checking.

(screenshot attached)
