The rFID of the LlamaGen tokenizer with downsampling ratio 8 looks good. I'm wondering whether the authors or anyone else have tried using it to train a generation model on 1024 tokens.
Either autoregressive or discrete diffusion would be interesting, because this seems to offer an opportunity to surpass continuous diffusion models.
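For reference, here is a quick sketch of where the 1024 comes from. It assumes 256×256 input images, which is my assumption and not something stated above. A ratio-8 tokenizer maps each side to 256 / 8 = 32 latent positions, so one image becomes a 32×32 grid of 1024 tokens, compared with 256 tokens at ratio 16. `num_tokens` is just a hypothetical helper for the arithmetic.

```python
def num_tokens(image_size: int, downsample_ratio: int) -> int:
    """Number of discrete tokens a VQ tokenizer yields for a square image.

    Hypothetical helper for illustration; assumes square inputs and a
    single spatial downsampling ratio applied to both height and width.
    """
    if image_size % downsample_ratio != 0:
        raise ValueError("image_size must be divisible by downsample_ratio")
    side = image_size // downsample_ratio  # side length of the latent token grid
    return side * side  # total tokens in the flattened grid


# Assumed 256x256 input: ratio 8 gives a 32x32 grid, ratio 16 a 16x16 grid.
print(num_tokens(256, 8))   # 1024
print(num_tokens(256, 16))  # 256
```

So at the same resolution, the ratio-8 tokenizer gives the generator a sequence 4× longer than the ratio-16 one.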