GGUF export broken for Qwen3.5-9B merged checkpoint (text-only fine-tune) — workaround inside
Posting this in case anyone hits the same issue and to flag it for the Unsloth team.
Setup:
Unsloth Studio 2026.3.8, transformers 5.3.0, Torch 2.10.0+cu130, Windows, RTX PRO 6000 Blackwell
What happens:
After fine-tuning Qwen3.5-9B on a text-only dataset, trying to export to GGUF (either from the Studio UI or manually via save_pretrained_gguf()) always fails with:
```
RuntimeError: config.json does not exist inside exports/Qwen3.5-9B-finetune-gguf/model
```
Why it happens:
Qwen3.5 uses the architecture class Qwen3_5ForConditionalGeneration, so Unsloth auto-detects it as a vision model even when the fine-tune was text-only. The VLM merge path is buggy and writes no files to disk.
Forcing FastLanguageModel directly doesn't help either: because the checkpoint is already merged (no LoRA adapters), Unsloth just prints a warning and skips the merge step entirely, leaving the export folder empty.
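If you want to confirm you're hitting the same code path, both conditions can be checked directly from the checkpoint folder. This is a heuristic sketch, not Unsloth's actual detection logic; the throwaway directory below stands in for the real checkpoint:

```python
import json
import os
import tempfile

def diagnose(checkpoint_dir):
    """Heuristic check: does the checkpoint look like a VLM to an
    architecture-string sniffer, and is it already merged (no LoRA)?"""
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        cfg = json.load(f)
    archs = cfg.get("architectures", [])
    looks_vision = any("ConditionalGeneration" in a for a in archs)
    already_merged = not os.path.exists(
        os.path.join(checkpoint_dir, "adapter_config.json"))
    return looks_vision, already_merged

# Demo with a minimal fake checkpoint directory
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({"architectures": ["Qwen3_5ForConditionalGeneration"]}, f)
    print(diagnose(d))  # (True, True) -> the buggy VLM path + skipped merge
```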
Workaround:
Save the model manually in HF format first, then convert with llama.cpp directly:
```python
model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
# then run convert_hf_to_gguf.py manually
```
What should happen:
save_pretrained_gguf() should detect that the model is already merged and skip straight to writing the HF files, instead of silently doing nothing and then crashing when llama.cpp can't find config.json.
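In pseudocode, the fix amounts to something like this. It's a sketch, not Unsloth's internals: the `is_merged` flag and the PEFT-style `merge_and_unload()` call are assumptions for illustration.

```python
import os

def export_gguf_fixed(model, tokenizer, out_dir, is_merged):
    """Sketch of the proposed save_pretrained_gguf() behavior:
    never leave out_dir empty before handing it to llama.cpp."""
    os.makedirs(out_dir, exist_ok=True)
    if is_merged:
        # Checkpoint already contains full weights: write the HF files
        # directly so the converter can find config.json.
        model.save_pretrained(out_dir)
    else:
        # LoRA adapters present: merge first (PEFT-style, assumed API),
        # then save the merged weights.
        model = model.merge_and_unload()
        model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
```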
Hope this helps someone.
Full workaround script:
```python
from unsloth import FastLanguageModel
import torch, subprocess, sys, os, glob

checkpoint = r'C:\Users\EA\.unsloth\studio\outputs\unsloth_Qwen3.5-9B_1773922217'
hf_dir = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-hf'
gguf_out = r'C:\Users\EA\.unsloth\studio\exports\Qwen3.5-9B-finetune-gguf'

# Step 1: save in standard HF format
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = checkpoint,
    load_in_4bit = False,
    dtype = torch.bfloat16,
)

print("Saving in HF format...")
model.save_pretrained(hf_dir)
tokenizer.save_pretrained(hf_dir)
print(f"Saved to: {hf_dir}")

# Step 2: convert with llama.cpp
os.makedirs(gguf_out, exist_ok=True)
gguf_file = os.path.join(gguf_out, "model-q8_0.gguf")

# Find convert_hf_to_gguf.py from llama.cpp
convert_scripts = glob.glob(r'C:\Users\EA\.unsloth\**\convert_hf_to_gguf.py', recursive=True)
if not convert_scripts:
    print("ERROR: convert_hf_to_gguf.py not found!")
    sys.exit(1)
convert_script = convert_scripts[0]
print(f"Using: {convert_script}")

# First convert to bf16
bf16_gguf = os.path.join(gguf_out, "model-bf16.gguf")
subprocess.run([
    sys.executable, convert_script,
    hf_dir,
    "--outfile", bf16_gguf,
    "--outtype", "bf16",
], check=True)

# Then quantize to q8_0 with llama-quantize
quantize_bins = glob.glob(r'C:\Users\EA\.unsloth\**\llama-quantize.exe', recursive=True)
if quantize_bins:
    subprocess.run([quantize_bins[0], bf16_gguf, gguf_file, "Q8_0"], check=True)
    os.remove(bf16_gguf)
    print(f"\nExport completed: {gguf_file}")
else:
    print(f"\nBF16 GGUF ready (llama-quantize not found): {bf16_gguf}")
```
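As a final sanity check, every valid GGUF file starts with the 4-byte ASCII magic `GGUF` (per the GGUF file-format spec). A minimal check, not a full parser:

```python
def looks_like_gguf(path):
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

Running `looks_like_gguf(gguf_file)` on a successful export should return True; an empty or truncated file from a failed conversion returns False.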