
Problem using attention wrapper. #9

@PratsBhatt


I am getting an error about a mismatch between the state and output shapes, but I am unable to figure out the cause. It would be really appreciated if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 on a GTX 1080 Ti.

The error is:

```
ValueError: Shapes (8, 522) and (8, 512) are incompatible
```

It is raised in attention_wrapper.py, in the `call` method at line 708:

```python
cell_output, next_cell_state = self._cell(cell_inputs, cell_state)
```

I was able to figure out that the attention size is being added to the shape (512 + 10 = 522), which is what causes the mismatch, but I have no idea how to fix it.
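To check where the 522 comes from, I wrote the tiny standalone snippet below. The 512-wide decoder inputs are just my assumption for this test; the concatenation itself mirrors what I understand the default `cell_input_fn` in attention_wrapper.py does (concatenate the decoder input with the attention vector before calling the wrapped cell):

```python
import tensorflow as tf

# Standalone shape check (assumes the decoder inputs are 512-wide, like my cells).
batch_size = 8
input_size = 512           # number_of_units_per_layer in my setup
attention_layer_size = 10  # attn_size

inputs = tf.zeros([batch_size, input_size])
attention = tf.zeros([batch_size, attention_layer_size])

# Default cell_input_fn behaviour: concat inputs with the attention vector.
cell_inputs = tf.concat([inputs, attention], -1)
print(cell_inputs.shape)   # (8, 522) -- the same width as in the error message
```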
My code is below; the hyper-parameters are declared at the top (test values only).
```python
import tensorflow as tf
from tensorflow.python.ops import array_ops
from tensorflow.python.layers.core import Dense
from tensorflow.contrib.seq2seq.python.ops import attention_wrapper

# Hyper-parameters (test values only)
batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10


def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):
    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell. build_single_cell() is defined elsewhere in my code
    # and returns one RNN cell with number_of_units_per_layer units.
    decoder_cell_list = [
        build_single_cell() for i in range(number_of_layers)]

    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        # Currently unused (cell_input_fn is commented out below);
        # `size` comes from my full code.
        # if not self.attn_input_feeding:
        #     return inputs

        # Essential when use_residual=True
        _input_layer = Dense(size, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps the RNNCell with the attention_mechanism.
    # Note: we apply the attention mechanism only on the top decoder layer.
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        # cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder last state
    # of the top layer should be converted into the AttentionWrapperState form.
    # We can easily do this by calling AttentionWrapper.zero_state.

    # Also, if beam-search decoding is used, the batch_size argument in .zero_state
    # should be ${decoder_beam_width} times the original batch_size.
    # batch_size = self.batch_size if not self.use_beamsearch_decode \
    #     else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]

    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state
```
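
For context, here is a minimal sketch (reusing the names from the function above) of how I understand the "convert the encoder last state into AttentionWrapperState" step from my comments is usually written, i.e. cloning the state returned by `zero_state()` instead of passing `initial_cell_state` to the wrapper. The `clone(cell_state=...)` call is my assumption based on other examples, not something I have verified against my model:

```python
# Sketch (assumption, not verified): initialize the wrapped top layer from the
# encoder's final state by cloning the AttentionWrapperState from zero_state(),
# rather than passing initial_cell_state to AttentionWrapper.
attn_cell = decoder_cell_list[-1]  # the AttentionWrapper-wrapped top cell

zero_state = attn_cell.zero_state(batch_size=batch_size, dtype=tf.float32)
top_layer_initial_state = zero_state.clone(cell_state=encoder_last_state[-1])

initial_state = list(encoder_last_state)
initial_state[-1] = top_layer_initial_state
decoder_initial_state = tuple(initial_state)
```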

Thank you once again.
