Open
Labels
documentation: Improvements or additions to documentation
Description
1. Affected Chapter
Chapter 2.1 and Chapter 2.2
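2. Problem Description
When args.dim != args.n_embd:
In Chapter 2.1.6, Multi-Head Attention (多头注意力):
self.wo = nn.Linear(self.n_heads * self.head_dim, args.dim, bias=False)
the output dimension is wrong.
In Chapter 2.2.5 (Encoder) and Chapter 2.2.6 (Decoder):
self.feed_forward = MLP(args.dim, args.dim, args.dropout)
both the input and output dimensions are wrong.
In this case, neither output has the same last dimension as the original input x, so the residual connection cannot be computed.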
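A minimal sketch of the attention-side mismatch, in case it helps reproduce it. Everything here is an assumption for illustration: the concrete sizes, the stand-in tensor `attn_out` for the concatenated head outputs, the helper `residual_ok`, and the relation `head_dim = dim // n_heads`. It only checks whether `x + wo(attn_out)` is shape-compatible, not the actual chapter code.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen so that dim != n_embd
n_embd, dim, n_heads = 64, 128, 8
head_dim = dim // n_heads  # assumption: head size is derived from dim

x = torch.randn(2, 10, n_embd)                     # block input: (batch, seq_len, n_embd)
attn_out = torch.randn(2, 10, n_heads * head_dim)  # stand-in for concatenated head outputs


def residual_ok(out_features: int) -> bool:
    """Return True if x + wo(attn_out) is shape-compatible for this output size."""
    wo = nn.Linear(n_heads * head_dim, out_features, bias=False)
    try:
        _ = x + wo(attn_out)
        return True
    except RuntimeError:
        # PyTorch raises RuntimeError when the last dimensions cannot be broadcast
        return False


print("wo -> dim:   ", residual_ok(dim))     # projection as currently written
print("wo -> n_embd:", residual_ok(n_embd))  # projection with the proposed output size
```

With these sizes, projecting to `dim` leaves a last dimension of 128 against an input of 64, so the addition fails; projecting to `n_embd` keeps the two aligned.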
3. Reproduction Materials
In Chapter 2.1.6, Multi-Head Attention (多头注意力), change
self.wo = nn.Linear(self.n_heads * self.head_dim, args.dim, bias=False)
to
self.wo = nn.Linear(self.n_heads * self.head_dim, args.n_embd, bias=False)
In Chapter 2.2.5 (Encoder) and Chapter 2.2.6 (Decoder), change
self.feed_forward = MLP(args.dim, args.dim, args.dropout)
to
self.feed_forward = MLP(args.n_embd, args.n_embd, args.dropout)
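The feed-forward change can be checked the same way. The `MLP` class below is a hypothetical stand-in, assuming a `(dim, hidden_dim, dropout)` constructor with two linear layers and a ReLU; the chapter's real `MLP` may differ. The sizes are again made up so that `dim != n_embd`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Hypothetical stand-in: dim -> hidden_dim -> dim, assumed constructor order."""

    def __init__(self, dim: int, hidden_dim: int, dropout: float):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(self.w2(F.relu(self.w1(x))))


n_embd, dim = 64, 128           # hypothetical sizes with dim != n_embd
x = torch.randn(2, 10, n_embd)  # block input: (batch, seq_len, n_embd)

# Proposed construction: input and output both match the residual stream
fixed = MLP(n_embd, n_embd, 0.1)
out = x + fixed(x)
print(out.shape)

# Current construction: the first linear layer expects dim features, but x has n_embd
broken = MLP(dim, dim, 0.1)
try:
    broken(x)
    broken_failed = False
except RuntimeError:
    broken_failed = True
print("MLP(dim, dim) fails on x:", broken_failed)
```

Under these assumptions the `MLP(dim, dim, ...)` version fails before the residual is even reached, which matches the note that both its input and output dimensions are off.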
Verification
- This issue hasn't been reported before