Changes from all commits (39 commits):
- ec9a82f init (zRzRzRzRzRzRzR, Jan 7, 2026)
- b98decf add (zRzRzRzRzRzRzR, Jan 7, 2026)
- 57fd26d add 1 (zRzRzRzRzRzRzR, Jan 7, 2026)
- bcc9c30 Update __init__.py (zRzRzRzRzRzRzR, Jan 7, 2026)
- e13fb76 rename (zRzRzRzRzRzRzR, Jan 7, 2026)
- adcc532 2 (zRzRzRzRzRzRzR, Jan 7, 2026)
- ec678a1 update (zRzRzRzRzRzRzR, Jan 7, 2026)
- 22fe6c9 init with encoder (zRzRzRzRzRzRzR, Jan 7, 2026)
- b3d1b55 merge2pipeline (zRzRzRzRzRzRzR, Jan 7, 2026)
- acd13d8 Merge branch 'huggingface:main' into cogview (zRzRzRzRzRzRzR, Jan 7, 2026)
- e2b31f8 Update pipeline_glm_image.py (zRzRzRzRzRzRzR, Jan 7, 2026)
- 1cf277d remove sop (zRzRzRzRzRzRzR, Jan 7, 2026)
- 170d0ba remove useless func (zRzRzRzRzRzRzR, Jan 7, 2026)
- 144c075 Update pipeline_glm_image.py (zRzRzRzRzRzRzR, Jan 8, 2026)
- 041ddec Merge branch 'main' into cogview (zRzRzRzRzRzRzR, Jan 8, 2026)
- 86f5ce4 up (yiyixuxu, Jan 8, 2026)
- 64f3842 Merge branch 'cogview' of https://github.com/zRzRzRzRzRzRzR/diffusers… (zRzRzRzRzRzRzR, Jan 8, 2026)
- c65f224 review for work only (zRzRzRzRzRzRzR, Jan 8, 2026)
- 8d80b76 Merge branch 'main' into cogview (zRzRzRzRzRzRzR, Jan 8, 2026)
- e70ebc0 change place (zRzRzRzRzRzRzR, Jan 8, 2026)
- 762f9a3 Update pipeline_glm_image.py (zRzRzRzRzRzRzR, Jan 8, 2026)
- 5a0a9fa update (zRzRzRzRzRzRzR, Jan 8, 2026)
- 2ae574a Update transformer_glm_image.py (zRzRzRzRzRzRzR, Jan 8, 2026)
- 264f930 1 (zRzRzRzRzRzRzR, Jan 8, 2026)
- e9b2c89 no negative_prompt for GLM-Image (zRzRzRzRzRzRzR, Jan 8, 2026)
- e4f6549 remove CogView4LoraLoaderMixin (zRzRzRzRzRzRzR, Jan 8, 2026)
- 51f8015 refactor attention processor. (sayakpaul, Jan 8, 2026)
- 075b6a9 update (zRzRzRzRzRzRzR, Jan 8, 2026)
- e2d4bda fix (sayakpaul, Jan 8, 2026)
- 854e861 use staticmethod (zRzRzRzRzRzRzR, Jan 8, 2026)
- 7862217 update (zRzRzRzRzRzRzR, Jan 8, 2026)
- 1226fcb up (sayakpaul, Jan 8, 2026)
- 68ebb42 up (sayakpaul, Jan 8, 2026)
- 3b154cf Merge pull request #4 from huggingface/zRzRzRzRzRzRzR-cogview (zRzRzRzRzRzRzR, Jan 8, 2026)
- 40559ca update (zRzRzRzRzRzRzR, Jan 8, 2026)
- 19fc76b Update glm_image.md (zRzRzRzRzRzRzR, Jan 8, 2026)
- 2c21dad Merge branch 'main' into cogview (sayakpaul, Jan 9, 2026)
- d2a5146 1 (zRzRzRzRzRzRzR, Jan 9, 2026)
- 6cfc83b Update pipeline_glm_image.py (zRzRzRzRzRzRzR, Jan 9, 2026)
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
@@ -353,6 +353,8 @@
        title: Flux2Transformer2DModel
      - local: api/models/flux_transformer
        title: FluxTransformer2DModel
+     - local: api/models/glm_image_transformer2d
+       title: GlmImageTransformer2DModel
      - local: api/models/hidream_image_transformer
        title: HiDreamImageTransformer2DModel
      - local: api/models/hunyuan_transformer2d
@@ -547,6 +549,8 @@
        title: Flux2
      - local: api/pipelines/control_flux_inpaint
        title: FluxControlInpaint
+     - local: api/pipelines/glm_image
+       title: GLM-Image
      - local: api/pipelines/hidream
        title: HiDream-I1
      - local: api/pipelines/hunyuandit
18 changes: 18 additions & 0 deletions docs/source/en/api/models/glm_image_transformer2d.md
@@ -0,0 +1,18 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# GlmImageTransformer2DModel

A Diffusion Transformer model for 2D data from [GLM-Image](https://huggingface.co/zai-org/GLM-Image).

## GlmImageTransformer2DModel

[[autodoc]] GlmImageTransformer2DModel
95 changes: 95 additions & 0 deletions docs/source/en/api/pipelines/glm_image.md
@@ -0,0 +1,95 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# GLM-Image

## Overview

GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In overall image generation quality it is on par with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios.

Model architecture: a hybrid autoregressive + diffusion decoder design.

+ Autoregressive generator: a 9B-parameter model initialized from [GLM-4-9B-0414](https://huggingface.co/zai-org/GLM-4-9B-0414), with an expanded vocabulary that incorporates visual tokens. The model first generates a compact encoding of approximately 256 tokens, then expands it to 1K–4K tokens, corresponding to 1K–2K high-resolution image outputs. The AR model is available as the `GlmImageForConditionalGeneration` class in the Transformers library (see the loading sketch after this list).
+ Diffusion Decoder: a 7B-parameter decoder based on a single-stream DiT architecture for latent-space image decoding. It is equipped with a Glyph Encoder text module, significantly improving accurate text rendering within images.
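
The pipeline loads both components together, but the diffusion decoder can also be inspected on its own. A minimal sketch, assuming the repository stores the decoder weights under a `transformer` subfolder (a common diffusers convention, not confirmed by this PR):

```python
import torch
from diffusers import GlmImageTransformer2DModel

# Load only the 7B single-stream DiT decoder; the "transformer" subfolder
# name is an assumption based on the usual diffusers repository layout.
decoder = GlmImageTransformer2DModel.from_pretrained(
    "zai-org/GLM-Image", subfolder="transformer", torch_dtype=torch.bfloat16
)
print(f"{sum(p.numel() for p in decoder.parameters()) / 1e9:.1f}B parameters")
```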

Post-training with decoupled reinforcement learning: the model introduces a fine-grained, modular feedback strategy using the GRPO algorithm, substantially enhancing both semantic understanding and visual detail quality.

+ Autoregressive module: provides low-frequency feedback signals focused on aesthetics and semantic alignment, improving instruction following and artistic expressiveness.
+ Decoder module: delivers high-frequency feedback targeting detail fidelity and text accuracy, resulting in highly realistic textures, lighting, and color reproduction, as well as more precise text rendering.

GLM-Image supports both text-to-image and image-to-image generation within a single model:

+ Text-to-image: generates high-detail images from textual descriptions, with particularly strong performance in information-dense scenarios.
+ Image-to-image: supports a wide range of tasks, including image editing, style transfer, multi-subject consistency, and identity-preserving generation for people and objects.

This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The codebase can be found [here](https://huggingface.co/zai-org/GLM-Image).

## Usage examples

### Text to Image Generation

```python
import torch
from diffusers import GlmImagePipeline

pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
prompt = "A beautifully designed modern food magazine style dessert recipe illustration, themed around a raspberry mousse cake. The overall layout is clean and bright, divided into four main areas: the top left features a bold black title 'Raspberry Mousse Cake Recipe Guide', with a soft-lit close-up photo of the finished cake on the right, showcasing a light pink cake adorned with fresh raspberries and mint leaves; the bottom left contains an ingredient list section, titled 'Ingredients' in a simple font, listing 'Flour 150g', 'Eggs 3', 'Sugar 120g', 'Raspberry puree 200g', 'Gelatin sheets 10g', 'Whipping cream 300ml', and 'Fresh raspberries', each accompanied by minimalist line icons (like a flour bag, eggs, sugar jar, etc.); the bottom right displays four equally sized step boxes, each containing high-definition macro photos and corresponding instructions, arranged from top to bottom as follows: Step 1 shows a whisk whipping white foam (with the instruction 'Whip egg whites to stiff peaks'), Step 2 shows a red-and-white mixture being folded with a spatula (with the instruction 'Gently fold in the puree and batter'), Step 3 shows pink liquid being poured into a round mold (with the instruction 'Pour into mold and chill for 4 hours'), Step 4 shows the finished cake decorated with raspberries and mint leaves (with the instruction 'Decorate with raspberries and mint'); a light brown information bar runs along the bottom edge, with icons on the left representing 'Preparation time: 30 minutes', 'Cooking time: 20 minutes', and 'Servings: 8'. The overall color scheme is dominated by creamy white and light pink, with a subtle paper texture in the background, featuring compact and orderly text and image layout with clear information hierarchy."
image = pipe(
    prompt=prompt,
    height=32 * 32,
    width=36 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("output_t2i.png")
```
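
Loading the full checkpoint places both the 9B AR generator and the 7B decoder on the accelerator. If that exceeds available memory, the generic diffusers offloading helper is worth trying; a minimal sketch using the standard `DiffusionPipeline` API (its interaction with this particular pipeline is not covered by this PR):

```python
import torch
from diffusers import GlmImagePipeline

# Load on CPU first, then let diffusers move each submodule to the GPU
# only for the duration of its forward pass.
pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
```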

### Image to Image Generation

```python
import torch
from diffusers import GlmImagePipeline
from PIL import Image

pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
image_path = "cond.jpg"
prompt = "Replace the background of the snow forest with an underground station featuring an automatic escalator."
image = Image.open(image_path).convert("RGB")
image = pipe(
    prompt=prompt,
    image=[image],  # a list of several images, e.g. [image, image1], enables multi-image-to-image generation; see the sketch below
    height=33 * 32,
    width=32 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("output_i2i.png")
```
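
Since `image` accepts a list, multi-reference generation only requires passing several conditioning images. A minimal sketch reusing the pipeline from above (file names and prompt are placeholders):

```python
from PIL import Image

subject = Image.open("subject.jpg").convert("RGB")
style = Image.open("style.jpg").convert("RGB")

# Two conditioning images, e.g. preserve the subject of the first
# while borrowing the look of the second.
result = pipe(
    prompt="Render the subject from the first image in the style of the second image.",
    image=[subject, style],
    height=32 * 32,
    width=32 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
).images[0]
result.save("output_multi_i2i.png")
```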

+ Since the AR model used in GLM-Image defaults to `do_sample=True` with a temperature of `0.95`, the generated images can vary significantly across runs. We do not recommend setting `do_sample=False`, as this may lead to incorrect or degenerate outputs from the AR model.

## GlmImagePipeline

[[autodoc]] GlmImagePipeline
- all
- __call__

## GlmImagePipelineOutput

[[autodoc]] pipelines.cogview4.pipeline_output.GlmImagePipelineOutput
4 changes: 4 additions & 0 deletions src/diffusers/__init__.py
@@ -225,6 +225,7 @@
        "FluxControlNetModel",
        "FluxMultiControlNetModel",
        "FluxTransformer2DModel",
+       "GlmImageTransformer2DModel",
        "HiDreamImageTransformer2DModel",
        "HunyuanDiT2DControlNetModel",
        "HunyuanDiT2DModel",
@@ -490,6 +491,7 @@
        "FluxKontextPipeline",
        "FluxPipeline",
        "FluxPriorReduxPipeline",
+       "GlmImagePipeline",
        "HiDreamImagePipeline",
        "HunyuanDiTControlNetPipeline",
        "HunyuanDiTPAGPipeline",
@@ -977,6 +979,7 @@
        FluxControlNetModel,
        FluxMultiControlNetModel,
        FluxTransformer2DModel,
+       GlmImageTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DControlNetModel,
        HunyuanDiT2DModel,
@@ -1212,6 +1215,7 @@
        FluxKontextPipeline,
        FluxPipeline,
        FluxPriorReduxPipeline,
+       GlmImagePipeline,
        HiDreamImagePipeline,
        HunyuanDiTControlNetPipeline,
        HunyuanDiTPAGPipeline,
2 changes: 2 additions & 0 deletions src/diffusers/models/__init__.py
@@ -98,6 +98,7 @@
    _import_structure["transformers.transformer_easyanimate"] = ["EasyAnimateTransformer3DModel"]
    _import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
    _import_structure["transformers.transformer_flux2"] = ["Flux2Transformer2DModel"]
+   _import_structure["transformers.transformer_glm_image"] = ["GlmImageTransformer2DModel"]
    _import_structure["transformers.transformer_hidream_image"] = ["HiDreamImageTransformer2DModel"]
    _import_structure["transformers.transformer_hunyuan_video"] = ["HunyuanVideoTransformer3DModel"]
    _import_structure["transformers.transformer_hunyuan_video15"] = ["HunyuanVideo15Transformer3DModel"]
@@ -208,6 +209,7 @@
        EasyAnimateTransformer3DModel,
        Flux2Transformer2DModel,
        FluxTransformer2DModel,
+       GlmImageTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DModel,
        HunyuanImageTransformer2DModel,
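
For context, the `_import_structure` entry above feeds diffusers' lazy-import machinery: class names are registered as strings and the real import is deferred until the attribute is first accessed. A heavily abridged sketch of the pattern used by these `__init__.py` files:

```python
import sys

from diffusers.utils import _LazyModule

# Register names as strings; nothing heavy is imported at this point.
_import_structure = {
    "transformers.transformer_glm_image": ["GlmImageTransformer2DModel"],
}

# Replace the module with a lazy proxy; the submodule is only imported
# when e.g. diffusers.models.GlmImageTransformer2DModel is accessed.
sys.modules[__name__] = _LazyModule(
    __name__, globals()["__file__"], _import_structure, module_spec=__spec__
)
```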
1 change: 1 addition & 0 deletions src/diffusers/models/transformers/__init__.py
@@ -27,6 +27,7 @@
    from .transformer_easyanimate import EasyAnimateTransformer3DModel
    from .transformer_flux import FluxTransformer2DModel
    from .transformer_flux2 import Flux2Transformer2DModel
+   from .transformer_glm_image import GlmImageTransformer2DModel
    from .transformer_hidream_image import HiDreamImageTransformer2DModel
    from .transformer_hunyuan_video import HunyuanVideoTransformer3DModel
    from .transformer_hunyuan_video15 import HunyuanVideo15Transformer3DModel