
Differences between the paper and your code #16

@yuanyihan

  1. Dropout between the two FC layers in the FFN.
  2. In the embedding layers, you should multiply the weights by sqrt(d_model).
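
To make the two points concrete, here is a minimal framework-free sketch in plain Python. It is not taken from the repository or from the paper's reference code. The function names (`scaled_embedding`, `dropout`, `linear`, `ffn`) are hypothetical, and placing the dropout after the ReLU is an assumption, since the issue only says "between two FC".

```python
import math
import random


def scaled_embedding(table, token_ids, d_model):
    """Look up embedding rows and multiply them by sqrt(d_model)."""
    scale = math.sqrt(d_model)
    return [[w * scale for w in table[t]] for t in token_ids]


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def ffn(x, w1, b1, w2, b2, p, rng, training=True):
    """FC -> ReLU -> dropout -> FC for a single position.

    Assumption: the dropout sits right after the ReLU, before the second FC.
    """
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    hidden = dropout(hidden, p, rng, training)
    return linear(hidden, w2, b2)


if __name__ == "__main__":
    rng = random.Random(0)
    d_model = 4
    table = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(3)]
    emb = scaled_embedding(table, [0, 2], d_model)
    # Identity first layer, summing second layer: tiny weights for illustration only.
    w1 = [[1.0 if i == j else 0.0 for j in range(d_model)] for i in range(d_model)]
    b1 = [0.0] * d_model
    w2 = [[1.0] * d_model]
    b2 = [0.0]
    print(ffn(emb[0], w1, b1, w2, b2, p=0.1, rng=rng, training=True))
```

With `training=False` the dropout is a no-op, so the FFN output is deterministic at inference time. The embedding scale is applied once at lookup, before positional encodings would be added.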
