A GGUF editor, which can be used to duplicate or delete layers in Qwen3.5 / Qwen Coder Next or whatever, but may not run in one shot. #1746
|
And... as a guy who mostly uses GitHub on mobile... feel free to edit anything without asking. |
|
Perhaps Kimi can start the PR description by explaining to us the utility of duplicating and removing layers? |
Well, I like that idea, but Kimi isn't on my phone, so... In case a model gets stubborn due to overtraining, deleting some earlier layers makes it more thoughtful. In case we think a model's reasoning is not strong enough, simply duplicating a middle or later layer could help. Some evidence? As far as I remember, there was a test related to rys repetition which pointed out the duplicate beliefs. Or maybe we don't need that much, just a new toy to play with. 😁 |
Try this: Modern LLM Hacking and hints of a Universal Language. tl;dr: repeating certain layers in Qwen-28B makes it smarter. With some analysis, one can choose an optimal set of layers to repeat. With some help from the engine, one could repeat those layers without additional VRAM. |
|
Doubling the middle layers is actually a viable way of improving models, as papers and other sources say. But this is the wrong way to do it. Using more VRAM is not the way to go: turboderp-org/exllamav3#174 |
And the other day I saw a post claiming that TurboQuant was the best thing since sliced bread. Oh, wait, I actually saw many such posts. If we started doing everything that we saw on the Internet, the entire computing infrastructure of the universe wouldn't be enough to run the agents implementing what somebody posted somewhere. I'm familiar with people pruning models. We have more than enough of those, no need to add yet another source of them. At least a pruned model is smaller than the original, so it makes sense to have it stored on disk. I'm also familiar with people merging models, people repeating layers in models, etc. But it is one thing someone running calculations for weeks to find out which layers to duplicate, and quite another some user duplicating a bunch of randomly chosen layers. But if one wanted to seriously support the feature of duplicating layers, one would do it on the inference engine level, and not by having a Kimi generated script produce a larger model stored on disk that we need to load and keep in RAM/VRAM. |
|
Indeed, I have tried that with Qwen-coder-next. After wasting a number of TB, I got a better result by deleting 4-7 layers. Feel free to test: https://huggingface.co/Jahaz/Qwen-Coder-NX-73B |
|
Btw, the original idea behind uploading that script was a possible future MTP addition, which obviously is not included in the script, but the padding, row size handling, and offset logic could simply be reused. So that's the first reason for me to keep it as a draft. Another reason is to let more people play with it: finetune-free, no original safetensors needed. Third, if it stays easy to use, people will produce stronger models, and I might be able to use them without finding the best combo myself. Fourth, let's make something fun, haha. And finally, compared to letting it die on my disk, like many other things that got randomly deleted for no reason, uploading it here at least lets more people play with it. |
Woo, after a moment I actually think that's a really good idea, if we can let the model decide for itself to activate duplication / ignore layers during inference!!!!
😅 I'm just giving some context, not defending random internet ideas |
|
I thought someone did benchmarks and the scores went up. It's not a new idea for sure, and finding which layers to replay sounds like the challenge. |
|
It's only a single script draft, guys. Just have fun playing with it 😂 |
|
Yea I know but it would be a really neat experiment to do it virtually. Brute forcing weights on disk is painful. |
I agree, and I also experiment layer |
Currently I think it works for Q8_0, Q6_0, IQ4KS, IQ4KSS, f16, f32, and IQ2KS tensors. Other quant types may not be supported yet, but support can easily be extended based on the current code (it is row-size related).
AI usage: this script was made with Kimi k2.6.
The text below was written by Kimi.
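To illustrate why support is "row size related": copying a tensor without decoding it only needs the byte size of one row for its type. The sketch below is not taken from the script. It assumes the standard GGML block layout for three of the listed types, and the names `BLOCK_LAYOUT`, `row_size_bytes`, and `tensor_size_bytes` are hypothetical. The IQ*K types are deliberately left out because their row layout may not follow this simple formula.

```python
# Hypothetical table: (elements per block, bytes per block) for a few
# standard GGML types. Other types would need their own entries.
BLOCK_LAYOUT = {
    "F32": (1, 4),
    "F16": (1, 2),
    "Q8_0": (32, 34),  # 32 int8 values plus one fp16 scale
}


def row_size_bytes(type_name, n_cols):
    """Bytes occupied by one row of n_cols elements of the given type."""
    block_size, bytes_per_block = BLOCK_LAYOUT[type_name]
    if n_cols % block_size != 0:
        raise ValueError(f"{n_cols} is not a multiple of block size {block_size}")
    return n_cols // block_size * bytes_per_block


def tensor_size_bytes(type_name, shape):
    """Total bytes of a tensor; shape[0] is the row length, as in GGUF."""
    n_rows = 1
    for dim in shape[1:]:
        n_rows *= dim
    return row_size_bytes(type_name, shape[0]) * n_rows


print(row_size_bytes("Q8_0", 4096))          # 4352
print(tensor_size_bytes("F16", [4096, 2]))   # 16384
```

Extending support to another quant type would then amount to adding its block layout, or a custom row-size function if the type stores per-row data.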
Usage
Duplicate Layers
Copy layers 7, 8, 9, 10 and insert them after layer 10:
Multiple duplication ranges in one shot:
Delete Layers
Remove layers 4, 5, 6, 7:
Note: For hybrid architectures, deletions must preserve the recurrent/full-attention pattern. The script validates this and aborts with a clear error if the pattern would break. Generally, delete multiples of 4 consecutive layers.
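As a rough illustration of what duplicating and deleting do to the layer order, here is a minimal sketch. It is not the script's actual code or command line. It assumes that ranges are inclusive, that all indices refer to the original model's numbering, and that copies are inserted right after the last layer of the duplicated range. `plan_layer_order` and `rename_tensor` are hypothetical helper names. The `blk.<n>.` prefix is the usual GGUF tensor naming convention.

```python
import re


def plan_layer_order(n_layers, duplicate=(), delete=()):
    """Return, for each layer of the edited model, its source layer index.

    Assumption: ranges are inclusive (lo, hi) pairs in ORIGINAL numbering.
    """
    deleted = {i for lo, hi in delete for i in range(lo, hi + 1)}
    copies_after = {}
    for lo, hi in duplicate:
        copies_after.setdefault(hi, []).extend(range(lo, hi + 1))

    order = []
    for i in range(n_layers):
        if i not in deleted:
            order.append(i)
        # Copies land right after the end of the duplicated range.
        order.extend(copies_after.get(i, []))
    return order


def rename_tensor(name, new_idx):
    """Rewrite the layer index in a 'blk.<n>.<rest>' tensor name."""
    return re.sub(r"^blk\.\d+\.", f"blk.{new_idx}.", name)


# Copy layers 7..10 of a 12-layer model and insert them after layer 10.
print(plan_layer_order(12, duplicate=[(7, 10)]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 7, 8, 9, 10, 11]

# Remove layers 4..7.
print(plan_layer_order(12, delete=[(4, 7)]))
# [0, 1, 2, 3, 8, 9, 10, 11]
```

In an edited file, the tensors of every layer after the change point would be renamed to their new position (for example `rename_tensor("blk.11.attn_q.weight", 15)`), and the model's block count metadata would have to match the new number of layers.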
Combined Operations
Delete some layers and duplicate others:
Verbose Mode
Add --verbose for detailed logging:
Architecture Notes
These models alternate between two layer types:
- Recurrent layers: attn_qkv and ssm_* tensors
- Full-attention layers: attn_q, attn_k, attn_v tensors
The pattern is determined by (layer_idx + 1) % full_attention_interval:
- == 0: full-attention layer
- != 0: recurrent layer
When duplicating or deleting, the script ensures layers end up at positions with matching types. Incompatible layers are automatically skipped with a warning.
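The type rule above can be checked mechanically. The following sketch is an illustration rather than the script's own validation. It assumes `full_attention_interval = 4` for the example, and `pattern_mismatches` is a hypothetical helper that takes the edited layer order as a list of source layer indices.

```python
def is_full_attention(layer_idx, full_attention_interval):
    """True if the layer at this position must be a full-attention layer."""
    return (layer_idx + 1) % full_attention_interval == 0


def pattern_mismatches(order, full_attention_interval):
    """Return (new_position, source_layer) pairs whose layer types differ.

    `order[pos]` is the original index of the layer placed at position `pos`.
    """
    return [
        (pos, src)
        for pos, src in enumerate(order)
        if is_full_attention(pos, full_attention_interval)
        != is_full_attention(src, full_attention_interval)
    ]


interval = 4  # assumed value for this example

# Deleting layers 4..7 (four consecutive layers) keeps the pattern intact.
print(pattern_mismatches([0, 1, 2, 3, 8, 9, 10, 11], interval))
# []

# Deleting only layers 4..6 shifts every later layer onto the wrong type.
print(pattern_mismatches([0, 1, 2, 3, 7, 8, 9, 10, 11], interval))
# [(4, 7), (7, 10), (8, 11)]
```

This also shows why deleting a multiple of the interval in consecutive layers is the safe default: every surviving layer shifts by a whole number of pattern periods.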
Basically, I think we can build on that and, after some editing, add an MTP layer to an existing GGUF file.
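The reusable part for such an extension is mostly the padding and offset bookkeeping: whenever tensors are added or removed, every tensor offset in the data section has to be recomputed. A minimal sketch, assuming the GGUF default alignment of 32 bytes and byte sizes that are already known per tensor (`align_up` and `layout_offsets` are hypothetical names, not functions from the script):

```python
def align_up(offset, alignment=32):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_offsets(tensor_sizes, alignment=32):
    """Assign an aligned offset to each tensor, in order.

    `tensor_sizes` is a list of (name, size_in_bytes) pairs. Offsets are
    relative to the start of the tensor data section. Returns the offset
    map and the total padded size of the section.
    """
    offsets = {}
    cursor = 0
    for name, size in tensor_sizes:
        offsets[name] = cursor
        # Pad after each tensor so the next one starts on an aligned offset.
        cursor = align_up(cursor + size, alignment)
    return offsets, cursor


offsets, total = layout_offsets([("blk.0.a", 34), ("blk.0.b", 10)])
print(offsets, total)
# {'blk.0.a': 0, 'blk.0.b': 64} 96
```

Appending new tensors, such as those of an MTP layer, would then mean adding their (name, size) pairs to the list and writing the data at the recomputed offsets.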