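1. Add a dropout layer between the two fully connected (FC) layers of the feed-forward network (FFN).
2. In the embedding layers, you should multiply the embedding weights (i.e. the looked-up embedding vectors) by sqrt(d_model).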
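Both suggestions can be sketched in plain Python. This is a minimal, framework-free sketch, not the reviewed code. The helper names (`linear`, `dropout`, `feed_forward`, `embed`), the ReLU activation, and the placement of dropout on the activated hidden layer are assumptions. In PyTorch the same pieces would usually be `nn.Linear`, `nn.Dropout` and `nn.Embedding`.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = x @ W + b to one vector (weight has len(x) rows, len(bias) columns)."""
    return [sum(xi * weight[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))]


def dropout(x, p, training, rng):
    """Inverted dropout: zero each element with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(x)  # identity at inference time (or when p is 0)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def feed_forward(x, w1, b1, w2, b2, p, training, rng):
    """Position-wise FFN (assumed layout): FC -> ReLU -> Dropout -> FC."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]  # first FC layer + ReLU
    hidden = dropout(hidden, p, training, rng)         # dropout between the two FC layers
    return linear(hidden, w2, b2)                      # second FC layer


def embed(token_ids, table, d_model):
    """Look up each token's embedding row and scale it by sqrt(d_model)."""
    scale = math.sqrt(d_model)
    return [[v * scale for v in table[t]] for t in token_ids]
```

For example, with `d_model = 4` the scale factor is 2, so an embedding row `[0.5, 0.5, 0.5, 0.5]` comes out as `[1.0, 1.0, 1.0, 1.0]`. Dropout only fires when `training=True`. At inference the FFN reduces to the two FC layers with the activation between them.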