`head_size` here comes from `q.sizes()[3]`.
But in `modeling_deepseek.py` of the DeepSeek-V3 model, `q = q.view(bsz, q_len, self.num_heads, self.q_head_dim).transpose(1, 2)`.
Here `self.q_head_dim = config.qk_nope_head_dim + config.qk_rope_head_dim`, which is 128 + 64 = 192 according to `config.json`.
How should this be understood?
The head size is `qk_rope_head_dim + kv_lora_rank` = 64 + 512 = 576, which is the same in DeepSeek V2, V3, and R1. This is the absorbed version of MLA, not the ordinary HuggingFace version: the weight-absorption trick folds the no-RoPE query/key projections into the compressed KV latent space, so each head attends over the `kv_lora_rank`-dimensional latent plus the `qk_rope_head_dim`-dimensional RoPE part.
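A minimal sketch of the two head-size computations, using the dimension values quoted above from DeepSeek-V3's `config.json` (the variable names mirror the config keys; this is an illustrative arithmetic check, not the model code itself):

```python
# Dimensions from DeepSeek-V3's config.json (as quoted in this thread)
qk_nope_head_dim = 128  # no-RoPE portion of the query/key head
qk_rope_head_dim = 64   # RoPE portion of the query/key head
kv_lora_rank = 512      # rank of the compressed KV latent

# Ordinary HuggingFace version: queries/keys are fully decompressed,
# so each head is the concatenation of the no-RoPE and RoPE parts.
q_head_dim = qk_nope_head_dim + qk_rope_head_dim
print(q_head_dim)  # 192

# Absorbed version: the no-RoPE projections are folded into the KV
# latent, so each head attends over the latent plus the RoPE part.
absorbed_head_dim = kv_lora_rank + qk_rope_head_dim
print(absorbed_head_dim)  # 576
```

So both numbers are correct; they just describe different formulations of the same attention: 192 is the per-head query dimension in the unabsorbed HuggingFace code path, while 576 is the effective head size once the projections are absorbed.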