I am getting an issue related to a mismatch between the state and the output, but I am unable to figure out the cause.
It would be greatly appreciated if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 with a 1080 Ti GPU.
The error is:
`ValueError: Shapes (8, 522) and (8, 512) are incompatible`
It occurs in attention_wrapper.py, in the `call` method, at line 708:
`cell_output, next_cell_state = self._cell(cell_inputs, cell_state)`
I was able to figure out that the attention size is being added to the cell size, which causes the mismatch, but I have no idea how to fix it.
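In case it helps, this is the arithmetic as I understand it (just my own reasoning about where the extra 10 comes from, not the actual TensorFlow code):

```python
# As far as I can tell, AttentionWrapper's default cell_input_fn concatenates
# the decoder input with the attention vector, so the tensor that reaches the
# wrapped cell grows by attn_size along the last dimension.
number_of_units_per_layer = 512
attn_size = 10

cell_expects = number_of_units_per_layer               # 512, as in the error
cell_receives = number_of_units_per_layer + attn_size  # 522, as in the error
print(cell_expects, cell_receives)                     # 512 522
```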
My code is below; the hyper-parameters are declared as follows (for testing purposes only).
```python
import tensorflow as tf
from tensorflow.python.layers.core import Dense
from tensorflow.python.ops import array_ops
from tensorflow.contrib.seq2seq.python.ops import attention_wrapper

batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10


def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):
    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size,
        memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell
    decoder_cell_list = [build_single_cell() for i in range(number_of_layers)]
    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        # if not self.attn_input_feeding:
        #     return inputs
        # Essential when use_residual=True
        _input_layer = Dense(number_of_units_per_layer, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps the RNNCell with the attention_mechanism
    # Note: We implement the attention mechanism only on the top decoder layer
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        # cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder last state of the
    # top layer should be converted into the AttentionWrapperState form.
    # We can easily do this by calling AttentionWrapper.zero_state.
    # Also, if beam-search decoding is used, the batch_size argument in .zero_state
    # should be ${decoder_beam_width} times the original batch_size.
    # batch_size = self.batch_size if not self.use_beamsearch_decode \
    #     else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]
    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state
```
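For completeness, `build_single_cell` and the way I call `build_decoder_cell` look roughly like this (a simplified sketch: the dummy encoder tensors and `max_time` here are only for illustration, my real values come from the encoder):

```python
import tensorflow as tf

# Simplified sketch of build_single_cell (my real version may differ slightly):
def build_single_cell():
    return tf.contrib.rnn.LSTMCell(number_of_units_per_layer)

# Dummy encoder tensors, only to show the call; in my real code these
# come from the encoder's outputs and final state.
max_time = 20
source_sequence_length = tf.fill([batch_size], max_time)
enc_output = tf.zeros([batch_size, max_time, number_of_units_per_layer])
enc_state = tuple(
    tf.contrib.rnn.LSTMStateTuple(
        c=tf.zeros([batch_size, number_of_units_per_layer]),
        h=tf.zeros([batch_size, number_of_units_per_layer]))
    for _ in range(number_of_layers))

decoder_cell, decoder_initial_state = build_decoder_cell(
    enc_output, enc_state, source_sequence_length, attn_size, batch_size)
```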
Thank you once again.