Bug Report for https://neetcode.io/problems/multi-headed-self-attention
Please describe the bug below and include any steps to reproduce the bug or screenshots if possible.
Your multi-head attention implementation is wrong. The standard approach is to first compute the complete Q, K, and V projections of the embeddings using W_q, W_k, and W_v, and only then split the results into heads and compute attention per head.
In your solution, the embeddings are split into heads first and the weights are applied afterwards, which is not the standard Transformer implementation.
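For reference, here is a minimal sketch of the standard order of operations (project the full embedding first, then reshape into heads). This is not the site's solution code; the module and parameter names (`embed_dim`, `num_heads`, `w_q`, `w_k`, `w_v`) are hypothetical, and it assumes `embed_dim` is divisible by `num_heads`:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Full-width projections, applied to the complete embedding first.
        self.w_q = nn.Linear(embed_dim, embed_dim, bias=False)
        self.w_k = nn.Linear(embed_dim, embed_dim, bias=False)
        self.w_v = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        # 1) Project the complete embedding into Q, K, V.
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # 2) Only now split each projection into heads: (B, num_heads, T, head_dim).
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        # 3) Scaled dot-product attention within each head.
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        attn = scores.softmax(dim=-1)
        out = attn @ v  # (B, num_heads, T, head_dim)
        # 4) Merge the heads back into the embedding dimension.
        return out.transpose(1, 2).contiguous().view(B, T, C)
```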