Has training been implemented (or attempted) on the **ratio 8** tokenizer?

The rFID of the **ratio 8** LlamaGen tokenizer looks good. I'm wondering whether the author or anyone else has tried using it to train a generation model with 1024 tokens.
Either autoregressive or discrete diffusion would do, since this setup seems to have a chance of surpassing continuous diffusion models.