Commit

Use mixed precision for gelu intermediate activation in BERT SQuAD model
PiperOrigin-RevId: 303407939
tensorflower-gardener committed Mar 27, 2020
1 parent da5860f · commit 8849285
Showing 1 changed file with 1 addition and 3 deletions.
official/nlp/modeling/layers/transformer.py  (1 addition, 3 deletions)

@@ -142,10 +142,8 @@ def build(self, input_shape):
         kernel_constraint=self._kernel_constraint,
         bias_constraint=self._bias_constraint,
         name="intermediate")
-    # Use float32 in intermediate gelu activation for numeric stability.
-    # TODO(b/149117297): investigate gelu numeric stability.
     self._intermediate_activation_layer = tf.keras.layers.Activation(
-        self._intermediate_activation, dtype=tf.float32)
+        self._intermediate_activation)
     self._output_dense = dense_einsum.DenseEinsum(
         output_shape=hidden_size,
         kernel_initializer=self._kernel_initializer,
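
In effect, the change stops pinning the gelu activation to float32: without an explicit dtype, the Activation layer follows the dtype policy in effect, so under mixed precision the intermediate gelu computes in float16. Below is a minimal, hedged sketch of that behavior; it uses the current tf.keras.mixed_precision API (set_global_policy and compute_dtype, which postdate this commit) purely for illustration and is not part of the change itself.

    import tensorflow as tf

    # Under a global mixed_float16 policy, layers compute in float16 while
    # keeping their variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # Without an explicit dtype, the Activation layer inherits the policy,
    # so the intermediate gelu now computes in float16.
    act_mixed = tf.keras.layers.Activation("gelu")
    print(act_mixed.compute_dtype)   # float16

    # The removed code pinned the activation to float32 regardless of the
    # policy, forcing casts to float32 and back around the gelu.
    act_fp32 = tf.keras.layers.Activation("gelu", dtype=tf.float32)
    print(act_fp32.compute_dtype)    # float32

    # Restore the default policy so the sketch leaves no global state behind.
    tf.keras.mixed_precision.set_global_policy("float32")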
