[Bug] Commit 4ff2c8c: Immediate crash after CUDA initialization when using Z-Image/Flux models (RTX 4070) #1162
-
There are several releases between master-431-23fce0b and master-453-4ff2c8c. Could you pinpoint which one is the first release that crashes for you? (For instance, does master-442-3e6c428 work?) Also, is this really specific to Z-Image (or Flux)? Does any other model (e.g. SD1.5, SDXL) work?
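For anyone narrowing this down, finding the first crashing release is a binary search over the release list. Below is a minimal Python sketch. It assumes tags follow the `master-<build>-<commit>` pattern seen in this thread and that every release after the first bad one also crashes. `crashes` is a hypothetical callback you would write yourself, for example by running sd-cli.exe from that release and checking whether it exits abnormally.

```python
import re
from typing import Callable, Optional, Sequence


def build_number(tag: str) -> int:
    """Extract the build counter from a tag such as 'master-442-3e6c428'."""
    m = re.fullmatch(r"(?:master-)?(\d+)-([0-9a-f]+)", tag)
    if m is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return int(m.group(1))


def first_bad_release(tags: Sequence[str],
                      crashes: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest tag for which crashes(tag) is True, or None.

    Assumes the good/bad split is monotonic: once a release crashes,
    every later release crashes too.
    """
    ordered = sorted(tags, key=build_number)
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(ordered[mid]):
            hi = mid          # first bad release is at mid or earlier
        else:
            lo = mid + 1      # first bad release is after mid
    return ordered[lo] if lo < len(ordered) else None
```

With N releases this needs about log2(N) test runs instead of N, which helps when each run means downloading and unpacking a build.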
-
Thank you for the follow-up. I've conducted more comprehensive testing to pinpoint the issue.
1. Version Testing & Problem Scope
Regarding your question about
All of these versions crash at the exact same point as
2. Model Specificity Testing
The issue is NOT specific to Z-Image/Flux models. I tested with an AnythingXL_xl.safetensors model and encountered the identical crash. Here's the complete terminal output:
PS D:\stable-diffusion.cpp> .\bin\sd-cli.exe -m ./AnythingXL_xl.safetensors -p "可爱, 萌系风格, 白发绿眼少女, 动漫画风, 卡通形象" -o test.png --cfg-scale 1.0 -v --diffusion-fa -H 1024 -W 512
[DEBUG] main.cpp:500 - version: stable-diffusion.cpp version unknown, commit 4ff2c8c
[DEBUG] main.cpp:501 - System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
[DEBUG] main.cpp:502 - SDCliParams {
mode: img_gen,
output_path: "test.png",
verbose: true,
color: false,
canny_preprocess: false,
convert_name: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false
}
[DEBUG] main.cpp:503 - SDContextParams {
n_threads: 8,
model_path: "./AnythingXL_xl.safetensors",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "",
llm_vision_path: "",
diffusion_model_path: "",
high_noise_diffusion_model_path: "",
vae_path: "",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
embeddings: {
}
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: "",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
flow_shift: INF
offload_params_to_cpu: false,
enable_mmap: false,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
diffusion_flash_attn: true,
diffusion_conv_direct: false,
vae_conv_direct: false,
circular: false,
circular_x: false,
circular_y: false,
chroma_use_dit_mask: true,
qwen_image_zero_cond_t: false,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:504 - SDGenerationParams {
loras: "{
}",
high_noise_loras: "{
}",
prompt: "可爱, 萌系风格, 白发绿眼少女, 动漫画风, 卡通形象",
negative_prompt: "",
clip_skip: -1,
width: 512,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
custom_sigmas: [],
cache_mode: "",
cache_option: "",
cache: disabled (threshold=1, start=0.15, end=0.95),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:161 - Using CUDA backend
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78 - Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
<< CRASH OCCURS HERE - NO FURTHER OUTPUT >>
Note that the program never reaches the model loading phase (no
3. Analysis & Questions
The consistent crash point (after successful CUDA initialization but before any model loading) suggests this is a system-level regression affecting RTX 4070 (compute capability 8.9) users across all model types. Key questions:
Regarding model format changes: Since the crash occurs before any model loading logic begins, this seems unlikely to be a model format issue. This appears to be a critical regression preventing RTX 4070 users from using any version after
-
I have the same problem: sd.cpp crashes after GPU/compute device detection and before loading any models (Flux dev Q6 in my case). I have an RTX 4090 and an AMD 7950X3D, and I found that the same behaviour happens with the Vulkan and AVX512 builds. In my testing it stopped working with 442-3e6c428. I tested the following releases:
There are no error messages; it always crashes before the model loading step. To make sure the paths were not the problem, I tried both absolute and relative model paths in 453-4ff2c8c, but they only work in 440-3e81246 and below.
-
Can you try the binaries built from https://github.com/CarlGao4/stable-diffusion.cpp/actions/runs/20702569376
-
Bug Report: Immediate Crash with Z-Image Models on New Version (Commit 4ff2c8c)
Summary
The latest version of sd-cli.exe (commit 4ff2c8c) crashes immediately after detecting the CUDA device when using Z-Image (Flux) models, while the older version (commit 23fce0b) works perfectly with the same models and command.
Environment
OS: Windows 10/11
GPU: NVIDIA GeForce RTX 4070 (Compute Capability 8.9)
New Version: stable-diffusion.cpp version unknown, commit 4ff2c8c
Working Version: stable-diffusion.cpp version unknown, commit 23fce0b
Backend: CUDA
Model: z_image_turbo-Q4_K.gguf (Z-Image/Flux model)
VAE: diffusion_pytorch_model.safetensors
LLM: Qwen3-4B-Instruct-2507-Q4_K_M.gguf
Reproduction Steps
Command to Reproduce
powershell
.\bin\sd-cli.exe --diffusion-model .\models\z_image_turbo-Q4_K.gguf --vae .\models\diffusion_pytorch_model.safetensors --llm .\models\Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "可爱, 萌系风格, 白发绿眼少女, 动漫画风, 卡通形象" -o test.png --cfg-scale 1.0 -H 1024 -W 512 -v
Expected Result
The program should successfully load the model and generate the image as seen in the older version.
Actual Result
The program crashes immediately after detecting the CUDA device, without attempting to load model weights.
Logs
Failing Log (Commit 4ff2c8c)
[DEBUG] main.cpp:500 - version: stable-diffusion.cpp version unknown, commit 4ff2c8c
[DEBUG] main.cpp:501 - System Info:
SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 ...
[DEBUG] main.cpp:502 - SDCliParams {
mode: img_gen,
...
}
[DEBUG] main.cpp:503 - SDContextParams {
...
diffusion_flash_attn: false,
...
prediction: NONE,
...
}
[DEBUG] main.cpp:504 - SDGenerationParams {
...
strength: 0.75,
...
}
[DEBUG] stable-diffusion.cpp:161 - Using CUDA backend
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78 - Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
<< CRASH HERE >>
Working Log (Commit 23fce0b)
[DEBUG] main.cpp:379 - version: stable-diffusion.cpp version unknown, commit 23fce0b
...
[INFO ] ggml_extend.hpp:77 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:77 - Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:233 - loading diffusion model from '.\models\z_image_turbo-Q4_K.gguf'
[INFO ] model.cpp:370 - load .\models\z_image_turbo-Q4_K.gguf using gguf format
...
[INFO ] model.cpp:1585 - loading tensors completed, taking 2.69s ...
[INFO ] main.cpp:741 - save result PNG image to 'test.png' (success)
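The two logs above can also be compared mechanically to see which step the failing build never reaches. The Python sketch below is only an illustration. It assumes every log line has the `[LEVEL] file:line - message` shape shown here, and it drops the `file:line` part because those numbers differ between commits (ggml_extend.hpp:78 vs. :77). The function names are illustrative and not part of stable-diffusion.cpp.

```python
import re
from typing import Iterable, List, Optional

# Matches the "[LEVEL] file.ext:123 - " prefix of a log line.
_PREFIX = re.compile(r"^\[(\w+)\s*\]\s+\S+:\d+\s+-\s+")


def messages(lines: Iterable[str]) -> List[str]:
    """Keep only well-formed log lines and strip their location prefix."""
    out = []
    for line in lines:
        line = line.strip()
        if _PREFIX.match(line):
            out.append(_PREFIX.sub("", line))
    return out


def next_step_after_crash(failing: Iterable[str],
                          working: Iterable[str]) -> Optional[str]:
    """Return the message that follows, in the working log, the last
    message the failing log managed to print (None if not found)."""
    fail_msgs = messages(failing)
    if not fail_msgs:
        return None
    last = fail_msgs[-1]
    work_msgs = messages(working)
    for i, msg in enumerate(work_msgs):
        if msg == last and i + 1 < len(work_msgs):
            return work_msgs[i + 1]
    return None
```

Fed the two excerpts above, this points at the "loading diffusion model" message as the first step that never appears in the failing run. That only says the crash happens somewhere between device detection and that log call.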
Troubleshooting Attempted
Removed --diffusion-fa (Flash Attention)
Modified --strength (set to 1.0)
Explicitly set --prediction flux_flow and --flow-shift 1.0
Added -v for verbose logging
Tried --offload-to-cpu and other memory-related parameters
None of these changes resolved the issue. The crash consistently occurs at the same point after CUDA device detection.
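Since nothing is printed before the crash, the process exit code may be the only clue available. On Windows a crashed process usually exits with an NTSTATUS value, and a few of those are recognizable at a glance. The Python sketch below decodes an exit code against a short table of well-known values. The table is deliberately incomplete, and the helper names are illustrative rather than anything shipped with stable-diffusion.cpp.

```python
import subprocess
from typing import Sequence

# A handful of well-known NTSTATUS crash codes (not an exhaustive list).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as 32-bit hex, with a name when it is known.

    Accepts both the unsigned form (3221225477) and the signed form
    (-1073741819) that some tools display.
    """
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned)
    if name is not None:
        return f"0x{unsigned:08X} ({name})"
    return f"0x{unsigned:08X}"


def run_and_report(argv: Sequence[str]) -> str:
    """Run a command, let its output pass through, and decode its exit code."""
    proc = subprocess.run(list(argv))
    return describe_exit_code(proc.returncode)
```

In PowerShell the same number is available in `$LASTEXITCODE` right after sd-cli.exe exits, so `describe_exit_code` can be applied to that value directly. An illegal-instruction code would point toward a CPU feature mismatch in the build, while a DLL or entry-point code would point toward a packaging problem. Either reading is a starting hypothesis, not a diagnosis.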
Analysis
The crash point suggests a regression in the CUDA initialization logic between commits 23fce0b (working) and 4ff2c8c (failing). Specifically, it happens after CUDA context initialization but before the model loading phase begins.
Key differences in context parameters:
New version: diffusion_flash_attn: false
Old version: diffusion_flash_attn: true
This appears to be a compatibility issue with RTX 4070 (compute capability 8.9) in the latest version. The crash occurs at the exact same point in the code, indicating a problem in the CUDA initialization sequence rather than model loading.
Additional Notes
The exact same models and command work perfectly on the older version
The crash is deterministic and occurs with any Z-Image model
No error messages are printed before the crash
The issue is specific to the latest commit (4ff2c8c)
This is a critical regression that prevents using the latest stable-diffusion.cpp with Z-Image models on RTX 4070 GPUs. I'd appreciate any insights into what changed between these commits that might affect RTX 4070 compatibility or CUDA initialization.