This repository contains the official implementation of iFSQ and LlamaGen-REPA. The key insight is a one-line, distribution-aware change to FSQ's quantization step.
- 🪐 Methodology: We propose iFSQ, a distribution-aware improvement to FSQ that resolves the conflict between information efficiency and reconstruction fidelity.
- ⚡️ Benchmarking: We use iFSQ as a unified tokenizer to benchmark AR against diffusion models.
- 💥 Insights:
  - The optimal equilibrium between discrete and continuous representations lies at approximately 4 bits per dimension.
  - AR models exhibit rapid initial convergence, whereas diffusion models achieve a superior performance ceiling.
- 🛸 Extension: We introduce LlamaGen-REPA, adapting Representation Alignment (REPA) to AR models to enhance semantic alignment.
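For context, plain FSQ bounds each latent dimension and rounds it to a fixed grid of levels; with 17 levels per dimension (as in the `fsq17x4` configs below), each dimension carries log2(17) ≈ 4.09 bits, in line with the ~4-bit sweet spot noted above. Below is a minimal sketch of such a quantizer, assuming a tanh bound and a straight-through estimator; iFSQ's one-line change applies to a quantizer of this form (see the paper for the exact line).

```python
import torch

def fsq_quantize(z: torch.Tensor, levels: int = 17) -> torch.Tensor:
    """Plain FSQ: bound each latent dim, then round it to a fixed integer grid.

    With levels=17 each dim carries log2(17) ~= 4.09 bits. This sketch is only
    the FSQ baseline; iFSQ's distribution-aware modification is a one-line
    edit to a quantizer of this form (see the paper for the exact line).
    """
    half = (levels - 1) / 2                    # e.g. 8 for 17 levels
    z_bounded = torch.tanh(z) * half           # continuous, in (-half, half)
    z_q = torch.round(z_bounded)               # snap to {-half, ..., half}
    # Straight-through estimator: quantized forward pass, identity gradient.
    return z_bounded + (z_q - z_bounded).detach()
```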
First, download and set up the repo:
```bash
git clone https://github.com/Tencent-Hunyuan/iFSQ.git
cd iFSQ
```

We provide a `requirements.txt` file that can be used to create the environment.
```bash
conda create -n ifsq python=3.10 -y
conda activate ifsq
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

Train the iFSQ tokenizer:

```bash
cd ifsq
bash configs/ifsq_f16_d4_4bit/run.sh
```

We record validation metrics during training. We also provide scripts for standalone validation:
```bash
cd ifsq
torchrun --nproc_per_node=8 \
    eval_ddp.py \
    --imgnet_eval_path ${IMAGENET_VAL} \
    --coco_eval_path ${COCO2017_VAL} \
    --model_name ImageFSQVAE \
    --ckpt_path results/ifsq_f16_d4_4bit/checkpoint-10000.ckpt \
    --model_config configs/ifsq_f16_d4_4bit/run.json \
    --resolution 256 \
    --dataset_num_worker 8 \
    --eval_batch_size 64 \
    --eval_lpips \
    --eval_psnr \
    --eval_ssim \
    --eval_fid \
    --ema
```
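For reference, the PSNR reported by `--eval_psnr` is a log-scaled reconstruction MSE; a minimal sketch (a hypothetical helper, not the repo's implementation):

```python
import torch

def psnr(x: torch.Tensor, x_rec: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """PSNR in dB between images scaled to [0, max_val]; higher is better."""
    mse = torch.mean((x - x_rec) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```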
Using the iFSQ trained in the previous stage, train LlamaGen-REPA:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/fsq17x4_large_repa8_0p5/config.yaml
```
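REPA here means aligning intermediate transformer features with features from a frozen pretrained vision encoder. A hypothetical sketch of such an alignment head follows; the class and argument names are ours, not the repo's API, and the `8` / `0p5` / `2p0` parts of the config names plausibly denote the alignment layer and loss weight.

```python
import torch
import torch.nn.functional as F

class AlignmentHead(torch.nn.Module):
    """Hypothetical REPA-style head: project AR hidden states and pull them
    toward frozen features from a pretrained encoder (e.g. DINOv2)."""

    def __init__(self, hidden_dim: int, target_dim: int):
        super().__init__()
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden_dim, target_dim),
        )

    def forward(self, h: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # h: (B, N, hidden_dim) hidden states from an intermediate AR layer;
        # target: (B, N, target_dim) frozen encoder features for the same patches.
        z = F.normalize(self.proj(h), dim=-1)
        t = F.normalize(target.detach(), dim=-1)
        return -(z * t).sum(dim=-1).mean()  # negative cosine similarity
```

The total objective would then be the usual next-token cross-entropy plus this term scaled by a weight, which the `0p5` / `2p0` config suffixes plausibly set.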
iFSQ can also use multiple codebooks, where each token is represented by multiple indices; for example, two indices per token (see the sketch below):

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/fsq17x4_ds16_large_repa_d8_2p0_f2x2/config.yaml
```
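To illustrate the idea: splitting a 4-dim, 17-level FSQ code into two halves turns one 17^4 = 83521-way index into two 17^2 = 289-way indices. The function below is illustrative only, not the repo's implementation:

```python
levels = 17  # FSQ levels per dimension

def code_to_indices(code, split=2):
    """Map per-dim level ids (each in [0, levels)) to `split` codebook indices."""
    step = len(code) // split
    return [
        sum(l * levels**i for i, l in enumerate(code[s:s + step]))
        for s in range(0, len(code), step)
    ]

print(code_to_indices([3, 7, 0, 16]))  # -> [122, 272], each index < 289
```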
To use the VQ-VAE from the original LlamaGen instead:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/large_repa_d8_2p0/config.yaml
```
Generate 50k images and validate them with torch_fid; this also produces `.npz` files:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    inference.py --config configs/large_repa_d8_2p0/config.yaml
```

Alternatively, we also provide tools for validation following ADM:
```bash
# we recommend CUDA 12.2
conda create -n adm_eval python=3.10 -y
conda activate adm_eval
pip install tensorflow==2.15.0 scipy requests tqdm numpy==1.23.5
pip install nvidia-pyindex
pip install nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
python tools/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    /path/to/.npz
```
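The second argument points at your generated sample batch. Assuming the standard ADM reference-batch layout (a uint8 array of shape (N, 256, 256, 3) stored under the default npz key `arr_0`), a minimal sketch for packing samples:

```python
import numpy as np

# Assumption: the ADM evaluator reads a uint8 image batch of shape
# (N, 256, 256, 3) stored under the default npz key "arr_0".
samples = np.zeros((4, 256, 256, 3), dtype=np.uint8)  # stand-in for 50k generated images
np.savez("samples.npz", samples)  # positional arrays are saved as arr_0, arr_1, ...
```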
To train the DiT baseline with the same tokenizer:

```bash
cd dit
accelerate launch --num_processes 8 \
    train.py --config configs/fsq17x4_large_repa8_0p5/run.yaml
```

Generate 50k images and validate them with torch_fid; this also produces `.npz` files:
```bash
cd dit
accelerate launch --num_processes 8 \
    inference.py --config configs/fsq17x4_large_repa8_0p5/run.yaml
```

Alternatively, validate following ADM, reusing the `adm_eval` environment set up above:
```bash
conda activate adm_eval
python tools/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    /path/to/.npz
```

This project builds upon the excellent work of the following repositories:
- WF-VAE: used as the main template for building our training codebase.
- LightningDiT: referenced for its well-organized configuration file structure.
- LlamaGen: referenced for the original model design and evaluation pipeline.
- DiT: referenced for the original model implementation and evaluation setup.
If you find this work useful for your research, please consider citing our paper:
```bibtex
@misc{lin2026ifsqimprovingfsqimage,
      title={iFSQ: Improving FSQ for Image Generation with 1 Line of Code},
      author={Bin Lin and Zongjian Li and Yuwei Niu and Kaixiong Gong and Yunyang Ge and Yunlong Lin and Mingzhe Zheng and JianWei Zhang and Miles Yang and Zhao Zhong and Liefeng Bo and Li Yuan},
      year={2026},
      eprint={2601.17124},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.17124},
}
```

The majority of this project is licensed under the Apache 2.0 License, detailed in LICENSE.txt.
