Conversation

@zRzRzRzRzRzRzR (Contributor):

@yiyixuxu @sayakpaul Please check with the model.

zRzRzRzRzRzRzR marked this pull request as draft January 7, 2026 11:01
zRzRzRzRzRzRzR marked this pull request as ready for review January 8, 2026 07:56
zRzRzRzRzRzRzR changed the title from "GLM-Imge for test" to "[GLM-Imge] New Models Support" Jan 8, 2026
zRzRzRzRzRzRzR changed the title from "[GLM-Imge] New Models Support" to "[GLM-Image] New Models Support" Jan 8, 2026
@sayakpaul (Member) left a comment:


Looking quite good!

I think all the precomputations are in place, and the use of caching also reads quite simply.

sayakpaul requested a review from yiyixuxu January 8, 2026 10:38
@yiyixuxu (Collaborator) left a comment:


Thanks, looking great, and super excited about this model!
I left some comments; mostly, I'm a bit confused about the correct logic to set height/width.

Comment on lines 604 to 609
prior_token_id, prior_token_image_ids, ar_height, ar_width = self.generate_prior_tokens(
    prompt=prompt[0] if isinstance(prompt, list) else prompt,
    image=image,
    height=height,
    width=width,
)
@yiyixuxu (Collaborator):

A few things here:

  1. generate_prior_tokens will error out if height = None and width = None
  2. ar_height/ar_width are pretty straightforward to calculate; let's calculate them separately for clarity
  3. we can update generate_prior_tokens to only return the two token tensors; this way it is easier for users to skip this stage by reusing pre-computed tokens

Here is just a suggestion; I'm not completely sure the logic to assign default height/width is correct:

Suggested change
prior_token_id, prior_token_image_ids, ar_height, ar_width = self.generate_prior_tokens(
    prompt=prompt[0] if isinstance(prompt, list) else prompt,
    image=image,
    height=height,
    width=width,
)
height = height or self.default_sample_size * self.vae_scale_factor
width = width or self.default_sample_size * self.vae_scale_factor
height = (height // 32) * 32
width = (width // 32) * 32
prior_token_id, prior_token_image_ids = self.generate_prior_tokens(
    prompt=prompt[0] if isinstance(prompt, list) else prompt,
    image=image,
    height=height,
    width=width,
)

@zRzRzRzRzRzRzR (Contributor, Author):

Let me add a check to ensure that height and width cannot be None. This is a strict requirement, as these two parameters must be present for the AR model to correctly output tokens.
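
For reference, a minimal sketch of such a check; the helper name, placement, and the 32-pixel rounding are assumptions based on the suggestion above, not the PR's final code:

def check_height_width(height, width):
    # Hypothetical validation sketch: the AR prior needs explicit output
    # dimensions to emit the right number of tokens, so None is rejected.
    if height is None or width is None:
        raise ValueError(
            "`height` and `width` must both be provided; the AR model cannot "
            "generate prior tokens without explicit output dimensions."
        )
    # Snap to 32-pixel granularity, mirroring the reviewer's suggestion.
    return (height // 32) * 32, (width // 32) * 32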

f"`callback_on_step_end_tensor_inputs` has to be in {self._callback_tensor_inputs}, but found {[k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]}"
)

if prompt is not None and prompt_embeds is not None:
@yiyixuxu (Collaborator) Jan 9, 2026:

So the current code structure won't work with prompt=None when prompt_embeds is passed; we still need the prompt to generate tokens using the AR model.

I think we'd need to accept both prior_token_id and prompt_embeds as inputs if prompt is None, so something like:

if prompt is None:
    if prior_token_id is None or prompt_embeds is None:
        raise ValueError(
            "When `prompt` is not provided, both `prior_token_id` and `prompt_embeds` must be passed."
        )

You also need to add prior_token_id to the pipeline inputs.
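
A rough sketch of what that could look like on the pipeline; only prior_token_id is named in the review, and the class name and surrounding parameters are placeholders:

from typing import Optional

import torch

class GlmImagePipelineSketch:
    # Placeholder pipeline illustrating the proposed input validation.
    def __call__(
        self,
        prompt: Optional[str] = None,
        prompt_embeds: Optional[torch.Tensor] = None,
        prior_token_id: Optional[torch.Tensor] = None,  # new pipeline input
        **kwargs,
    ):
        # When no prompt is given, both pre-computed inputs are required,
        # since the AR stage cannot run without a prompt.
        if prompt is None and (prior_token_id is None or prompt_embeds is None):
            raise ValueError(
                "When `prompt` is not provided, both `prior_token_id` and "
                "`prompt_embeds` must be passed."
            )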

@zRzRzRzRzRzRzR (Contributor, Author) Jan 9, 2026:

In the current implementation, prior_token_id must be generated by the AR model, so prompt must not be None.

@zRzRzRzRzRzRzR (Contributor, Author):

Let me change the logic of this.

    self.k_cache = k
    self.v_cache = v
else:
    self.k_cache = torch.cat([self.k_cache, k], dim=2)


Referring to L253, should dim be equal to 1 here?

        self.k_cache = torch.cat([self.k_cache, k], dim=2)
        self.v_cache = torch.cat([self.v_cache, v], dim=2)

    def get(self):
@yiyixuxu (Collaborator):

Not sure if it should be 1 or 2, but they should be the same.

Probably better to move the logic together like this, so mistakes like https://github.com/huggingface/diffusers/pull/12921/files#r2678634789 are less likely to happen:

Suggested change
def get(self):
def get(self, k: torch.Tensor, v: torch.Tensor):
    k_cache = torch.cat([self.k_cache, k], dim=2)
    v_cache = torch.cat([self.v_cache, v], dim=2)
    return k_cache, v_cache
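
For comparison, a self-contained sketch of a cache that keeps the update and read logic on a single concatenation axis; the [batch, heads, seq_len, head_dim] layout implied by dim=2 is an assumption, not confirmed by the PR:

import torch

class KVCacheSketch:
    # Hypothetical sketch, not the PR's implementation. One constant is the
    # single source of truth for the sequence axis, so the dim=1 vs dim=2
    # mismatch discussed above cannot occur.
    SEQ_DIM = 2  # assumes [batch, heads, seq_len, head_dim] tensors

    def __init__(self):
        self.k_cache = None
        self.v_cache = None

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # Append the new keys/values and return the full cached sequences.
        if self.k_cache is None:
            self.k_cache, self.v_cache = k, v
        else:
            self.k_cache = torch.cat([self.k_cache, k], dim=self.SEQ_DIM)
            self.v_cache = torch.cat([self.v_cache, v], dim=self.SEQ_DIM)
        return self.k_cache, self.v_cache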

Comment on lines +252 to +254
k_cache, v_cache = kv_cache.get()
key = torch.cat([k_cache, key], dim=1) if k_cache is not None else key
value = torch.cat([v_cache, value], dim=1) if v_cache is not None else value
@yiyixuxu (Collaborator):

Suggested change
k_cache, v_cache = kv_cache.get()
key = torch.cat([k_cache, key], dim=1) if k_cache is not None else key
value = torch.cat([v_cache, value], dim=1) if v_cache is not None else value
if kv_cache is not None:
    key, value = kv_cache.get(key, value)

num_images_per_prompt: int = 1,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
@yiyixuxu (Collaborator):

Suggested change
prompt_embeds: Optional[torch.FloatTensor] = None,
prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.Tensor] = None,
prior_token_ids: Optional[torch.Tensor] = None,
prior_image_token_ids: Optional[torch.Tensor] = None,

We should allow users to pre-compute the tokens, since that is the most compute-expensive part. We should also allow passing a pre-computed negative_prompt_embeds, because it is fixed.
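
A hypothetical usage sketch of that precompute-then-reuse flow; pipe, the embedding variables, and the keyword names are assumptions based on the suggested signature, not the final API:

import torch

# Hypothetical: run the compute-heavy AR stage once up front...
prior_token_ids, prior_image_token_ids = pipe.generate_prior_tokens(
    prompt="a photo of a cat", image=None, height=1024, width=1024
)

# ...then reuse the cached tensors across multiple sampling calls.
for seed in range(4):
    image = pipe(
        prompt=None,
        prompt_embeds=prompt_embeds,                    # pre-computed text embeds
        negative_prompt_embeds=negative_prompt_embeds,  # fixed, so cache it too
        prior_token_ids=prior_token_ids,
        prior_image_token_ids=prior_image_token_ids,
        generator=torch.Generator().manual_seed(seed),
    ).images[0]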


device = self._execution_device

prior_token_id, prior_token_image_ids = self.generate_prior_tokens(
@yiyixuxu (Collaborator):

Suggested change
prior_token_id, prior_token_image_ids = self.generate_prior_tokens(
if prior_token_ids is None:
    prior_token_id, prior_token_image_ids = self.generate_prior_tokens(...)
