have you considered swapping the lora-style rank adapters for kronecker-factorized ones (similar to LoKr)? since the 512-wide heads reshape cleanly (to e.g. 16×32 or 8×64), you can get a much higher effective rank for the same param budget: for a 512×512 head, a 16×16 ⊗ 32×32 factor pair costs 256 + 1024 ≈ 1.3k params with effective rank up to 16·32 = 512, while rank-1 lora spends ~1k params for rank 1. it might also capture cross-dim structure better. just curious if you've benchmarked that or already ruled it out.
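
rough sketch of what i mean, assuming a pytorch `nn.Linear` head (names like `KroneckerAdapter` and the factor choices are hypothetical, not the actual LoKr implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KroneckerAdapter(nn.Module):
    """LoKr-style adapter: delta_W = kron(A, B), added to a frozen base linear.

    For a 512x512 head factored as (16, 32): A is 16x16 and B is 32x32,
    i.e. 256 + 1024 = 1280 trainable params, but rank(kron(A, B)) =
    rank(A) * rank(B) can reach 16 * 32 = 512.
    """
    def __init__(self, base: nn.Linear, out_factors=(16, 32), in_factors=(16, 32), scale=1.0):
        super().__init__()
        assert out_factors[0] * out_factors[1] == base.out_features
        assert in_factors[0] * in_factors[1] == base.in_features
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the kronecker factors train
        # A starts at zero so delta_W = 0 at init (same trick as lora's zero factor)
        self.A = nn.Parameter(torch.zeros(out_factors[0], in_factors[0]))
        self.B = nn.Parameter(torch.randn(out_factors[1], in_factors[1]) * 0.02)
        self.scale = scale

    def forward(self, x):
        # materializing the 512x512 kron is cheap at this size; for larger
        # layers you'd avoid it via the reshape identity (A kron B) vec(X) = vec(B X A^T)
        delta_w = torch.kron(self.A, self.B)  # (out_features, in_features)
        return self.base(x) + self.scale * F.linear(x, delta_w)

head = nn.Linear(512, 512)
adapted = KroneckerAdapter(head)
y = adapted(torch.randn(4, 512))  # (4, 512); matches head(x) exactly at init
```

(zero-initializing A keeps the adapted head identical to the base at step 0, so you can drop it into an existing checkpoint without perturbing behavior.)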