[GLM-Image] AR Model Support for GLM-Image #43100
base: main
Conversation
ArthurZucker left a comment
Thanks a lot for the great model and your hard work!
Apart from making sure we don't always compute the loss, LGTM!
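For context, a minimal sketch of the pattern being requested, assuming the usual transformers convention where `labels` is an optional forward argument and `self.loss_function` is the model's loss helper; this is an illustration, not the PR's diff:

```python
# Hedged sketch: compute the loss only when labels are actually provided,
# so pure-inference forward passes skip the loss computation entirely.
# `logits`, `labels`, and `self.loss_function` are assumed to exist in the
# surrounding forward(), as in other transformers causal-LM models.
loss = None
if labels is not None:
    loss = self.loss_function(
        logits=logits, labels=labels, vocab_size=self.config.vocab_size
    )
```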
```python
# Other implementations: process each chunk separately
lengths = cu_seqlens[1:] - cu_seqlens[:-1]
splits = [
    torch.split(tensor, lengths.tolist(), dim=2)
    for tensor in (query_states, key_states, value_states)
]

attn_outputs = [
    attention_interface(
        self,
        q,
        k,
        v,
        attention_mask=None,
        scaling=self.scaling,
        dropout=0.0 if not self.training else self.attention_dropout,
        is_causal=False,
        **kwargs,
    )[0]
    for q, k, v in zip(*splits)
]
attn_output = torch.cat(attn_outputs, dim=1)
```
Yep, @zucchini-nlp started an internal thread on how to properly do this. I think the best for this model, since it is rushed, is to keep it as is, but let's work on something better for the next models!
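For illustration, one alternative to the per-chunk Python loop is a single attention call over a block-diagonal mask derived from `cu_seqlens`. A minimal sketch, assuming PyTorch SDPA, a (batch, num_heads, seq_len, head_dim) layout, and `cu_seqlens[0] == 0`; `blockwise_attention` is a hypothetical helper, not code from this PR:

```python
import torch
import torch.nn.functional as F

def blockwise_attention(query_states, key_states, value_states, cu_seqlens, scaling):
    seq_len = query_states.shape[2]
    positions = torch.arange(seq_len, device=cu_seqlens.device)
    # chunk_ids[p] = i such that cu_seqlens[i] <= p < cu_seqlens[i + 1]
    chunk_ids = torch.bucketize(positions, cu_seqlens[1:], right=True)
    # allow attention only within the same chunk: a block-diagonal boolean mask
    block_mask = chunk_ids.unsqueeze(0) == chunk_ids.unsqueeze(1)
    return F.scaled_dot_product_attention(
        query_states,
        key_states,
        value_states,
        attn_mask=block_mask,  # (seq_len, seq_len), broadcast over batch and heads
        scale=scaling,         # the scale kwarg requires PyTorch >= 2.1
    )
```

This trades the Python-level loop for one fused kernel call at the cost of materializing a seq_len × seq_len mask; varlen attention kernels (e.g. FlashAttention's varlen path) avoid both, which is presumably the direction of the internal thread.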
```python
def _expand_dict_for_generation_visual(dict_to_expand):
    image_grid_thw = model_kwargs.get("image_grid_thw", None)
    image_nums = self._get_image_nums(input_ids)

    def _repeat_interleave_samples(x, lengths, repeat_times):
        samples = torch.split(x, lengths)
        repeat_args = [repeat_times] + [1] * (x.dim() - 1)
        result = torch.cat([sample.repeat(*repeat_args) for sample in samples], dim=0)
        return result

    for key in dict_to_expand:
        if key == "pixel_values":
            # split images into samples
            samples = torch.split(image_grid_thw[: sum(image_nums)], list(image_nums))
            # compute the sequence length of images for each sample
            lengths = [torch.prod(sample, dim=1).sum() for sample in samples]
            dict_to_expand[key] = _repeat_interleave_samples(
                dict_to_expand[key], lengths=lengths, repeat_times=expand_size
            )
        elif key == "image_grid_thw":
            # get the num of images for each sample and +1 for the image being generated
            lengths = list(image_nums)
            # keep the grid row of the image being generated ([-1:], not [:-1])
            last_image = dict_to_expand[key][-1:]
            dict_to_expand[key] = _repeat_interleave_samples(
                dict_to_expand[key][: sum(image_nums)], lengths=lengths, repeat_times=expand_size
            )
            dict_to_expand[key] = torch.cat([dict_to_expand[key], last_image], dim=0)
    return dict_to_expand


def _expand_dict_for_generation(dict_to_expand):
    for key in dict_to_expand:
        if (
            key != "cache_position"
            and dict_to_expand[key] is not None
            and isinstance(dict_to_expand[key], torch.Tensor)
            and key not in visual_keys
        ):
            dict_to_expand[key] = dict_to_expand[key].repeat_interleave(expand_size, dim=0)
    return dict_to_expand
```
Let's put nesting outside as much as possible, please!
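As an illustration of that suggestion, the nested helper could be hoisted to module level so it is defined once rather than on every call, and can be tested independently (a sketch, not the code the PR ended up with):

```python
import torch

def _repeat_interleave_samples(x, lengths, repeat_times):
    # Split x into per-sample chunks along dim 0, tile each chunk
    # `repeat_times` times, and re-concatenate in the original order.
    samples = torch.split(x, lengths)
    repeat_args = [repeat_times] + [1] * (x.dim() - 1)
    return torch.cat([sample.repeat(*repeat_args) for sample in samples], dim=0)
```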
```python
if key == "pixel_values":
    # split images into samples
    samples = torch.split(image_grid_thw[: sum(image_nums)], list(image_nums))
    # compute the sequence length of images for each sample
    lengths = [torch.prod(sample, dim=1).sum() for sample in samples]
    dict_to_expand[key] = _repeat_interleave_samples(
        dict_to_expand[key], lengths=lengths, repeat_times=expand_size
    )
elif key == "image_grid_thw":
```
Since there are only 2 keys being handled, let's not iterate (for explicitness).
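A sketch of the explicit, loop-free version being suggested; the names (`dict_to_expand`, `image_grid_thw`, `image_nums`, `expand_size`, `_repeat_interleave_samples`) follow the snippet above, and this is an illustration rather than the merged code:

```python
# Handle the two visual keys directly instead of iterating over the dict.
if dict_to_expand.get("pixel_values") is not None:
    # split image grids into per-sample groups, then derive each sample's
    # pixel-sequence length as t * h * w summed over that sample's images
    samples = torch.split(image_grid_thw[: sum(image_nums)], list(image_nums))
    lengths = [torch.prod(sample, dim=1).sum() for sample in samples]
    dict_to_expand["pixel_values"] = _repeat_interleave_samples(
        dict_to_expand["pixel_values"], lengths=lengths, repeat_times=expand_size
    )
if dict_to_expand.get("image_grid_thw") is not None:
    # set aside the grid of the image being generated, expand the rest
    last_image = dict_to_expand["image_grid_thw"][-1:]
    expanded = _repeat_interleave_samples(
        dict_to_expand["image_grid_thw"][: sum(image_nums)],
        lengths=list(image_nums),
        repeat_times=expand_size,
    )
    dict_to_expand["image_grid_thw"] = torch.cat([expanded, last_image], dim=0)
```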
Is there any reference code for this? I referenced the Qwen2 implementation.
zucchini-nlp left a comment
Nice, let's merge 🚀
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, glm4v, glm4v_moe, glm_image

View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43100&sha=ef3af1
This PR adapts the implementation of the AR model for GLM-Image. For the full pipeline, check the diffusers repos.
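For readers who want to try the AR model once this is merged, a hedged sketch of typical transformers usage; the auto class, checkpoint id, and prompt below are placeholders (assumptions), not names confirmed by this PR:

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical checkpoint id, for illustration only.
model_id = "org/glm-image-checkpoint"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The AR model autoregressively produces tokens (including image tokens);
# decoding them into pixels is handled by the diffusers pipeline.
inputs = processor(text="A photo of a cat", return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```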