[Issue] Chapter 2.1: some hyperparameters seem wrong #121

@ShinnJinn

Description

1. Affected Chapter

Chapter 2.1 and Chapter 2.2

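2. Problem Description

When args.dim != args.n_embd:

In Chapter 2.1.6 (Multi-Head Attention), the line
self.wo = nn.Linear(self.n_heads * self.head_dim, args.dim, bias=False)
has the wrong output dimension.

In Chapter 2.2.5 (Encoder) and 2.2.6 (Decoder), the line
self.feed_forward = MLP(args.dim, args.dim, args.dropout)
has the wrong input and output dimensions.

In that case, neither module's output can be added to the original x as a residual, because their last dimensions differ.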
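
The mismatch can be traced with plain shape arithmetic. The sketch below is not the tutorial's code: `linear_shape` and `residual_shape` are hypothetical helpers that only mimic how `nn.Linear` maps the last dimension, and the sizes `n_embd=256`, `dim=512`, `n_heads=8` are made up for illustration. It also assumes, as this issue implies, that `x` has last dimension `args.n_embd` and that `head_dim = dim // n_heads`.

```python
def linear_shape(in_shape, in_features, out_features):
    """Shape produced by a Linear(in_features, out_features) layer (hypothetical helper)."""
    if in_shape[-1] != in_features:
        raise ValueError(f"Linear expects last dim {in_features}, got {in_shape[-1]}")
    return in_shape[:-1] + (out_features,)


def residual_shape(x_shape, y_shape):
    """Shape of x + y, requiring identical shapes as a residual connection does."""
    if x_shape != y_shape:
        raise ValueError(f"cannot add residual: {x_shape} vs {y_shape}")
    return x_shape


# Illustrative values only; the point is that dim != n_embd.
n_embd, dim, n_heads = 256, 512, 8
head_dim = dim // n_heads            # assumed definition of head_dim
x = (2, 10, n_embd)                  # (batch, seq_len, n_embd)

# Attention: after merging heads the tensor is (batch, seq_len, n_heads * head_dim).
merged = x[:-1] + (n_heads * head_dim,)
attn_out = linear_shape(merged, n_heads * head_dim, dim)   # original wo
print(attn_out)  # (2, 10, 512)

try:
    residual_shape(x, attn_out)
except ValueError as err:
    print("attention residual fails:", err)

# MLP(args.dim, ...) is assumed to start with a Linear whose in_features is dim.
try:
    linear_shape(x, dim, dim)
except ValueError as err:
    print("MLP input fails:", err)
```

With `dim == n_embd` both checks pass, which is presumably why the problem does not show up with matching settings.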

3. Reproduction Materials

In Chapter 2.1.6 (Multi-Head Attention), change
self.wo = nn.Linear(self.n_heads * self.head_dim, args.dim, bias=False)
to
self.wo = nn.Linear(self.n_heads * self.head_dim, args.n_embd, bias=False)

In Chapter 2.2.5 (Encoder) and 2.2.6 (Decoder), change
self.feed_forward = MLP(args.dim, args.dim, args.dropout)
to
self.feed_forward = MLP(args.n_embd, args.n_embd, args.dropout)
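
A minimal PyTorch sketch of the proposed change, to check that both residual additions go through when `dim != n_embd`. `Args`, `TinyMLP`, and the sizes are hypothetical stand-ins rather than the tutorial's classes. The sketch assumes the MLP signature is `MLP(dim, hidden_dim, dropout)` and that `head_dim = dim // n_heads`. The attention computation itself is omitted, so a random tensor stands in for the merged heads.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class Args:
    # Hypothetical values chosen so that dim != n_embd.
    n_embd: int = 256
    dim: int = 512
    n_heads: int = 8
    dropout: float = 0.0


class TinyMLP(nn.Module):
    """Stand-in for the tutorial's MLP; assumes the signature MLP(dim, hidden_dim, dropout)."""

    def __init__(self, dim: int, hidden_dim: int, dropout: float):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.dropout(self.w2(torch.relu(self.w1(x))))


args = Args()
head_dim = args.dim // args.n_heads  # assumed definition

# Proposed fix: project back to n_embd so the result matches x.
wo = nn.Linear(args.n_heads * head_dim, args.n_embd, bias=False)
feed_forward = TinyMLP(args.n_embd, args.n_embd, args.dropout)

x = torch.randn(2, 10, args.n_embd)                  # (batch, seq_len, n_embd)
heads = torch.randn(2, 10, args.n_heads * head_dim)  # placeholder for merged attention heads

h = x + wo(heads)          # residual after attention
out = h + feed_forward(h)  # residual after the MLP
print(out.shape)  # torch.Size([2, 10, 256])
```

Replacing `args.n_embd` with `args.dim` in `wo` or `TinyMLP` above should raise a shape error at the corresponding addition or matrix multiply, which matches the behaviour described in this issue.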

Verification

  • This issue hasn't been reported before
