
Problem using attention wrapper. #9

@PratsBhatt


I am getting an error about a mismatch between the state and output shapes, but I am unable to figure out the cause. It would be really appreciated if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 on a GTX 1080 Ti.

The error is:

```
ValueError: Shapes (8, 522) and (8, 512) are incompatible
```

It is raised in attention_wrapper.py, in the `call` method at line 708:

```python
cell_output, next_cell_state = self._cell(cell_inputs, cell_state)
```

I was able to figure out that the attention size is being added to the shape (512 + 10 = 522), which is what causes the mismatch, but I have no idea how to fix it.
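To check where the 522 comes from, I wrote the tiny standalone snippet below. The 512-wide decoder inputs are just my assumption for this test; the concatenation itself mirrors what I understand the default `cell_input_fn` in attention_wrapper.py does (concatenate the decoder input with the attention vector before calling the wrapped cell):

```python
import tensorflow as tf

# Standalone shape check (assumes the decoder inputs are 512-wide, like my cells).
batch_size = 8
input_size = 512           # number_of_units_per_layer in my setup
attention_layer_size = 10  # attn_size

inputs = tf.zeros([batch_size, input_size])
attention = tf.zeros([batch_size, attention_layer_size])

# Default cell_input_fn behaviour: concat inputs with the attention vector.
cell_inputs = tf.concat([inputs, attention], -1)
print(cell_inputs.shape)   # (8, 522) -- the same width as in the error message
```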
My code is below; the hyper-parameters are declared at the top (test values only).
```python
import tensorflow as tf
from tensorflow.python.ops import array_ops
from tensorflow.python.layers.core import Dense
from tensorflow.contrib.seq2seq.python.ops import attention_wrapper

# Hyper-parameters (test values only)
batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10


def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):
    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell. build_single_cell() is defined elsewhere in my code
    # and returns one RNN cell with number_of_units_per_layer units.
    decoder_cell_list = [
        build_single_cell() for i in range(number_of_layers)]

    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        # Currently unused (cell_input_fn is commented out below);
        # `size` comes from my full code.
        # if not self.attn_input_feeding:
        #     return inputs

        # Essential when use_residual=True
        _input_layer = Dense(size, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps the RNNCell with the attention_mechanism.
    # Note: we apply the attention mechanism only on the top decoder layer.
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        # cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder last state
    # of the top layer should be converted into the AttentionWrapperState form.
    # We can easily do this by calling AttentionWrapper.zero_state.

    # Also, if beam-search decoding is used, the batch_size argument in .zero_state
    # should be ${decoder_beam_width} times the original batch_size.
    # batch_size = self.batch_size if not self.use_beamsearch_decode \
    #     else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]

    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state
```
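
For context, here is a minimal sketch (reusing the names from the function above) of how I understand the "convert the encoder last state into AttentionWrapperState" step from my comments is usually written, i.e. cloning the state returned by `zero_state()` instead of passing `initial_cell_state` to the wrapper. The `clone(cell_state=...)` call is my assumption based on other examples, not something I have verified against my model:

```python
# Sketch (assumption, not verified): initialize the wrapped top layer from the
# encoder's final state by cloning the AttentionWrapperState from zero_state(),
# rather than passing initial_cell_state to AttentionWrapper.
attn_cell = decoder_cell_list[-1]  # the AttentionWrapper-wrapped top cell

zero_state = attn_cell.zero_state(batch_size=batch_size, dtype=tf.float32)
top_layer_initial_state = zero_state.clone(cell_state=encoder_last_state[-1])

initial_state = list(encoder_last_state)
initial_state[-1] = top_layer_initial_state
decoder_initial_state = tuple(initial_state)
```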

Thank you once again.
