Abstract
Yes, for the case of soft attention, though results are somewhat mixed across tasks.
Active memory operates on all of the memory in parallel in a uniform way, and brings improvements in algorithmic tasks, image processing, and generative modeling.
Does active memory perform well in machine translation? [YES]
Details
Attention
Only a small part of the memory changes at every step, or the memory remains constant.
An important limitation of the attention mechanism is that it can focus on only a single element of the memory, due to the nature of the softmax.
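A minimal numpy sketch (mine, not from the paper) of soft attention, to make the softmax point concrete: the weights form a distribution that sums to 1, so they tend to concentrate on a single memory slot.

```python
import numpy as np

def soft_attention(query, memory):
    """Return a softmax-weighted mix of memory rows and the attention weights."""
    scores = memory @ query                  # (n,) similarity of each memory slot to the query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: a distribution over slots
    return weights @ memory, weights

memory = np.random.randn(6, 4)               # 6 memory slots, 4-dim each
query = 3.0 * memory[2]                      # a query strongly matching slot 2
context, w = soft_attention(query, memory)
print(np.round(w, 3))                        # most of the mass lands on one slot
```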
Active Memory
Any model where every part of the memory undergoes an active change at every step.
NMT with Neural GPU
parallel encoding and decoding
BLEU < 5
conditional dependence between outputs is not considered
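To see why the BLEU is so low, here is a toy numpy sketch (my own schematic, not the paper's code) of fully parallel decoding: every output is read off the final memory independently, so the model never conditions one output on another.

```python
import numpy as np

def parallel_decode(final_memory, output_matrix):
    """final_memory: (length, d) memory after the encoder steps.
    Every position is decoded at once; outputs are conditionally independent."""
    logits = final_memory @ output_matrix     # (length, vocab)
    return logits.argmax(axis=-1)             # y_i depends only on the memory, not on y_{j<i}

mem = np.random.randn(10, 8)                  # toy final memory
W_out = np.random.randn(8, 100)               # toy projection to a 100-word vocabulary
print(parallel_decode(mem, W_out))            # all 10 tokens produced in one shot
```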
NMT with Markovian Neural GPU
parallel encoding and 1-step conditioned decoding
BLEU < 5
Perhaps the Markovian dependence of the outputs is too weak for this problem; a full recurrent dependence of the state is needed for good performance.
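A toy sketch of the Markovian variant under the same assumptions as above: output i additionally sees the embedding of y_{i-1}, but nothing earlier, so the dependence between outputs is only one step deep.

```python
import numpy as np

def markovian_decode(final_memory, output_matrix, embeddings, bos_id=0):
    """Each output conditions on its memory column plus the embedding of y_{i-1} only."""
    prev, outputs = bos_id, []
    for i in range(final_memory.shape[0]):
        logits = (final_memory[i] + embeddings[prev]) @ output_matrix  # 1-step dependence
        prev = int(logits.argmax())
        outputs.append(prev)
    return outputs

mem = np.random.randn(10, 8)
W_out = np.random.randn(8, 100)
emb = np.random.randn(100, 8)
print(markovian_decode(mem, W_out, emb))
```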
NMT with Extended Neural GPU
parallel encoding and sequential decoding
BLEU = 29.6 (WMT 14 En-Fr)
the active memory decoder tensor (d) holds the recurrent state of decoding, and the output tape tensor (p) holds embeddings of previously decoded outputs; both go through CGRU^d at every step
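A schematic of the extended decoding loop as I understand it (shapes, the stub cgru_d, and the variable names are my own simplifications): the decoder tensor d carries recurrent state, the tape p accumulates embeddings of decoded symbols, and an active-memory (CGRU^d) update rewrites all of d at every step.

```python
import numpy as np

def cgru_d(d, p):
    """Placeholder for the conditional CGRU^d step; see the CGRU sketch below."""
    return np.tanh(d + 0.5 * p)                     # stand-in update, not the real gated form

def extended_decode(enc_memory, output_matrix, embeddings):
    d = enc_memory.copy()                           # decoder state starts from the encoder output
    p = np.zeros_like(enc_memory)                   # output tape, initially empty
    outputs = []
    for t in range(enc_memory.shape[0]):
        d = cgru_d(d, p)                            # active memory update: all of d changes
        y_t = int((d[t] @ output_matrix).argmax())  # read the output for position t
        p[t] = embeddings[y_t]                      # write the decoded symbol's embedding to the tape
        outputs.append(y_t)
    return outputs

mem = np.random.randn(10, 8)
W_out = np.random.randn(8, 100)
emb = np.random.randn(100, 8)
print(extended_decode(mem, W_out, emb))
```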
CGRU
a GRU-style gated update in which the linear transformations are replaced by convolutions over the memory tensor
a stack of CGRUs expands the receptive field of the convolution operation
the output tape tensor acts as an external memory of decoded outputs
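A minimal 1-D numpy sketch of a single CGRU step in the spirit of the paper: a GRU-style gated update whose linear maps are width-3 convolutions over the memory, so every position is rewritten in parallel from a local neighborhood. The kernel shapes and window size here are my simplifications.

```python
import numpy as np

def conv1d(s, kernel):
    """'Same'-padded width-3 convolution; s: (n, d), kernel: (3, d, d) -> (n, d)."""
    padded = np.pad(s, ((1, 1), (0, 0)))
    return np.stack([sum(padded[i + k] @ kernel[k] for k in range(3))
                     for i in range(s.shape[0])])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cgru(s, Ku, Kr, K):
    """One CGRU step: gates and candidate are convolutions over the whole memory s."""
    u = sigmoid(conv1d(s, Ku))          # update gate
    r = sigmoid(conv1d(s, Kr))          # reset gate
    cand = np.tanh(conv1d(r * s, K))    # candidate computed from the reset memory
    return u * s + (1.0 - u) * cand     # every position updated in parallel

n, d = 10, 8
s = np.random.randn(n, d)
kernels = [np.random.randn(3, d, d) * 0.1 for _ in range(3)]
print(cgru(s, *kernels).shape)          # (10, 8): the whole memory changed at once
```

Stacking k such steps lets information propagate k positions, which is what "expands the receptive field" refers to above.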
Personal Thoughts
Same architecture, but encoder and decoder hidden states may be doing different things
encoder: embeds semantics locally
decoder: tracks how much it has decoded, and uses the tape tensor to hold what it has decoded so far
Will it work for language pairs with very different word order?
What part of the translation problem can we treat as convolutional?
Is "Transformer" a combination of attention and active memory?
Link: https://arxiv.org/pdf/1610.08613.pdf
Authors: Łukasz Kaiser, Samy Bengio (Google Brain), 2016