GGUF export broken for Qwen3.5-9B merged checkpoint (text-only fine-tune) — workaround inside
Posting this in case anyone hits the same issue and to flag it for the Unsloth team.
Setup:
Unsloth Studio 2026.3.8, transformers 5.3.0, Torch 2.10.0+cu130, Windows, RTX PRO 6000 Blackwell
What happens:
After fine-tuning Qwen3.5-9B on a text-only dataset, trying to export to GGUF (either from the Studio UI or manually via save_pretrained_gguf()) always fails with:
```
RuntimeError: config.json does not exist inside exports/Qwen3.5-9B-finetune-gguf/model
```
Why it happens:
Qwen3.5 uses the architecture class Qwen3_5ForConditionalGeneration, so Unsloth auto-detects it as a vision model even when the fine-tune was text-only. The VLM merge path is buggy and writes no files to disk.
Forcing FastLanguageModel directly doesn't help either: because the checkpoint is already merged (no LoRA adapters), Unsloth just prints a warning and skips the merge step entirely, leaving the export folder empty.
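If you want to confirm you're hitting the same code path, both conditions can be checked directly from the checkpoint folder. This is a heuristic sketch, not Unsloth's actual detection logic; the throwaway directory below stands in for the real checkpoint:

```python
import json
import os
import tempfile

def diagnose(checkpoint_dir):
    """Heuristic check: does the checkpoint look like a VLM to an
    architecture-string sniffer, and is it already merged (no LoRA)?"""
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        cfg = json.load(f)
    archs = cfg.get("architectures", [])
    looks_vision = any("ConditionalGeneration" in a for a in archs)
    already_merged = not os.path.exists(
        os.path.join(checkpoint_dir, "adapter_config.json"))
    return looks_vision, already_merged

# Demo with a minimal fake checkpoint directory
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({"architectures": ["Qwen3_5ForConditionalGeneration"]}, f)
    print(diagnose(d))  # (True, True) -> the buggy VLM path + skipped merge
```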
Workaround:
Save the model manually in HF format first, then convert with llama.cpp directly:
```python
model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
# then run convert_hf_to_gguf.py manually
```
What should happen:
save_pretrained_gguf() should detect that the model is already merged and skip straight to writing the HF files, instead of silently doing nothing and then crashing when llama.cpp can't find config.json.
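In pseudocode, the fix amounts to something like this. It's a sketch, not Unsloth's internals: the `is_merged` flag and the PEFT-style `merge_and_unload()` call are assumptions for illustration.

```python
import os

def export_gguf_fixed(model, tokenizer, out_dir, is_merged):
    """Sketch of the proposed save_pretrained_gguf() behavior:
    never leave out_dir empty before handing it to llama.cpp."""
    os.makedirs(out_dir, exist_ok=True)
    if is_merged:
        # Checkpoint already contains full weights: write the HF files
        # directly so the converter can find config.json.
        model.save_pretrained(out_dir)
    else:
        # LoRA adapters present: merge first (PEFT-style, assumed API),
        # then save the merged weights.
        model = model.merge_and_unload()
        model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
```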
Hope this helps someone.
Full workaround script:
```python
from unsloth import FastLanguageModel
import torch, subprocess, sys, os, glob

checkpoint = r'C:\Users\EA\.unsloth\studio\outputs\unsloth_Qwen3.5-9B_1773922217'
hf_dir = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-hf'
gguf_out = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-gguf'

# Step 1: save in standard HF format
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = checkpoint,
    load_in_4bit = False,
    dtype = torch.bfloat16,
)

print("Saving in HF format...")
model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
print(f"Saved to: {hf_dir}")

# Step 2: convert with llama.cpp
os.makedirs(gguf_out, exist_ok=True)
gguf_file = os.path.join(gguf_out, "model-q8_0.gguf")

# Find convert_hf_to_gguf.py from llama.cpp
convert_scripts = glob.glob(r'C:\Users\EA\.unsloth\**\convert_hf_to_gguf.py', recursive=True)
if not convert_scripts:
    print("ERROR: convert_hf_to_gguf.py not found!")
    sys.exit(1)
convert_script = convert_scripts[0]
print(f"Using: {convert_script}")

# First convert to bf16
bf16_gguf = os.path.join(gguf_out, "model-bf16.gguf")
subprocess.run([
    sys.executable, convert_script,
    hf_dir,
    "--outfile", bf16_gguf,
    "--outtype", "bf16",
], check=True)

# Then quantize to q8_0 with llama-quantize
quantize_bins = glob.glob(r'C:\Users\EA\.unsloth\**\llama-quantize.exe', recursive=True)
if quantize_bins:
    subprocess.run([quantize_bins[0], bf16_gguf, gguf_file, "Q8_0"], check=True)
    os.remove(bf16_gguf)
    print(f"\nExport completed: {gguf_file}")
else:
    print(f"\nBF16 GGUF ready (llama-quantize not found): {bf16_gguf}")
```
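As a final sanity check, every valid GGUF file starts with the 4-byte ASCII magic `GGUF` (per the GGUF file-format spec). A minimal check, not a full parser:

```python
def looks_like_gguf(path):
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

Running `looks_like_gguf(gguf_file)` on a successful export should return True; an empty or truncated file from a failed conversion returns False.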