GGUF export broken for Qwen3.5-9B merged checkpoint (text-only fine-tune) #8

@Fringe210

Description

GGUF export broken for Qwen3.5-9B merged checkpoint (text-only fine-tune) — workaround inside

Posting this in case anyone hits the same issue and to flag it for the Unsloth team.

Setup:
Unsloth Studio 2026.3.8, transformers 5.3.0, Torch 2.10.0+cu130, Windows, RTX PRO 6000 Blackwell

What happens:
After fine-tuning Qwen3.5-9B on a text-only dataset, trying to export to GGUF (either from the Studio UI or manually via save_pretrained_gguf()) always fails with:

RuntimeError: config.json does not exist inside exports/Qwen3.5-9B-finetune-gguf/model
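
The error itself is just llama.cpp's converter choking on an empty directory. A pre-flight check (a hypothetical helper, not part of Unsloth or llama.cpp) makes the real failure mode obvious before conversion is even attempted:

```python
import os

def has_hf_config(export_dir: str) -> bool:
    """Return True if export_dir looks like a usable HF checkpoint:
    a config.json plus at least one weight file."""
    if not os.path.isfile(os.path.join(export_dir, "config.json")):
        return False
    weights = [f for f in os.listdir(export_dir)
               if f.endswith((".safetensors", ".bin"))]
    return len(weights) > 0
```

Running this against the export folder before invoking convert_hf_to_gguf.py turns the opaque RuntimeError into an actionable "nothing was written here".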

Why it happens:
Qwen3.5 has the architecture Qwen3_5ForConditionalGeneration, so Unsloth auto-detects it as a vision model even when the fine-tune was text-only. The VLM merge path is buggy and doesn't write any files to disk.
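
The routing can be confirmed by hand, assuming Unsloth keys off the `architectures` field in config.json (which is what the symptom suggests). A minimal sketch of that heuristic:

```python
import json
import os

def looks_vision_routed(checkpoint_dir: str) -> bool:
    """Heuristic: architecture names ending in 'ForConditionalGeneration'
    get treated as vision/multimodal models by the auto-detection."""
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        config = json.load(f)
    archs = config.get("architectures", [])
    return any(a.endswith("ForConditionalGeneration") for a in archs)
```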

Even forcing FastLanguageModel directly doesn't help: because the checkpoint is already merged (no LoRA adapters), Unsloth just prints a warning and skips the merge step entirely, leaving the export folder empty.
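
Whether a checkpoint will hit this skip can be checked up front: a LoRA checkpoint carries an adapter_config.json, while a merged one has a plain config.json with full weights. A small classifier (hypothetical, for diagnosis only):

```python
import os

def checkpoint_kind(checkpoint_dir: str) -> str:
    """Classify a checkpoint directory as 'lora', 'merged', or 'unknown'."""
    if os.path.isfile(os.path.join(checkpoint_dir, "adapter_config.json")):
        return "lora"
    if os.path.isfile(os.path.join(checkpoint_dir, "config.json")):
        return "merged"
    return "unknown"
```

This is exactly the branch the export path could take: "lora" means merge first, "merged" means write the HF files straight out.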

Workaround:
Save the model manually in HF format first, then convert with llama.cpp directly:

model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
# then run convert_hf_to_gguf.py manually

What should happen:
save_pretrained_gguf() should detect that the model is already merged and skip straight to writing the HF files, instead of silently doing nothing and then crashing when llama.cpp can't find config.json.

Hope this helps someone.

Full workaround script:

from unsloth import FastLanguageModel
import torch, subprocess, sys, os

checkpoint = r'C:\Users\EA\.unsloth\studio\outputs\unsloth_Qwen3.5-9B_1773922217'
hf_dir = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-hf'
gguf_out = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-gguf'

# Step 1: save in standard HF format

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = checkpoint,
    load_in_4bit = False,
    dtype = torch.bfloat16,
)

print("Saving in HF format...")
model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
print(f"Saved to: {hf_dir}")

# Step 2: convert with llama.cpp

os.makedirs(gguf_out, exist_ok=True)
gguf_file = os.path.join(gguf_out, "model-q8_0.gguf")

# Find convert_hf_to_gguf.py from llama.cpp

import glob
convert_scripts = glob.glob(r'C:\Users\EA\.unsloth\**\convert_hf_to_gguf.py', recursive=True)
if not convert_scripts:
    print("ERROR: convert_hf_to_gguf.py not found!")
    sys.exit(1)

convert_script = convert_scripts[0]
print(f"Using: {convert_script}")

# First convert to bf16

bf16_gguf = os.path.join(gguf_out, "model-bf16.gguf")
subprocess.run([
    sys.executable, convert_script,
    hf_dir,
    "--outfile", bf16_gguf,
    "--outtype", "bf16",
], check=True)

# Then quantize to q8_0 with llama-quantize

quantize_bins = glob.glob(r'C:\Users\EA\.unsloth\**\llama-quantize.exe', recursive=True)
if quantize_bins:
    subprocess.run([quantize_bins[0], bf16_gguf, gguf_file, "Q8_0"], check=True)
    os.remove(bf16_gguf)
    print(f"\nExport completed: {gguf_file}")
else:
    print(f"\nBF16 GGUF ready (llama-quantize not found): {bf16_gguf}")
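
As a final sanity check: every GGUF file starts with the 4-byte magic "GGUF", so a quick header check confirms the converter produced a real file rather than a truncated or empty one:

```python
def is_gguf(path: str) -> bool:
    """Check the 4-byte magic at the start of a GGUF file."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```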
