Problem with shareable variables. with tf.variable_scope('model', reuse=tf.AUTO_REUSE) #20

Open
yanchvlad opened this issue Sep 3, 2018 · 4 comments


yanchvlad commented Sep 3, 2018

In my network, the rollout of the next epoch doesn't use the weights trained by the previous train operation. In TensorBoard I can see that the rollout and train graphs have separate 'model' scopes, with layers under different names (e.g. dense_0, dense_1, dense_2, dense_3).

Where is the problem?
I slightly changed the code:

```python
def build_graph(observations):
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE) as model:
        lstm = tf.keras.layers.LSTM(100, return_sequences=True, stateful=False, use_bias=True)(observations)
        lstm2 = tf.keras.layers.LSTM(64, return_sequences=True, stateful=False, use_bias=True, dropout=0.2)(lstm)
        lstm3 = tf.keras.layers.LSTM(64, return_sequences=True, stateful=False, use_bias=True)(lstm2)
        lstm7 = tf.keras.layers.LSTM(32, stateful=False, use_bias=True, dropout=0.2)(lstm3)
        # hidden = tf.keras.layers.Dense(50, use_bias=True, activation='relu')(lstm2)
        logits = tf.keras.layers.Dense(len(ACTIONS),
                                       # bias_initializer=tf.constant_initializer(value=[7., 0.1, 0.1]),
                                       use_bias=True)(lstm7)

    return logits

def main(args):
    args_dict = vars(args)
    print('args: {}'.format(args_dict))

    with tf.Graph().as_default() as g:
        # rollout subgraph

        with tf.device('/cpu:0'):
            with tf.name_scope('rollout'):

                observations = tf.placeholder(shape=(args.batch_size, args.sequence_size, OBSERVATION_DIM), dtype=tf.float32)

                logits = build_graph(observations)


                logits_for_sampling = tf.reshape(logits, shape=(args.batch_size, len(ACTIONS)))


                # Sample the action to be played during rollout.

                sample_action = tf.squeeze(tf.multinomial(logits=logits_for_sampling, num_samples=1))

            optimizer = tf.train.RMSPropOptimizer(
                learning_rate=args.learning_rate,
                decay=args.decay
            )

        # dataset subgraph for experience replay
        with tf.name_scope('dataset'):
            # the dataset reads from MEMORY

            ds = tf.data.Dataset.from_generator(gen, output_types=(tf.float32, tf.int64, tf.float32))
            iterator = ds.make_one_shot_iterator()

        # training subgraph
        with tf.name_scope('train'):
            # the train_op includes getting a batch of data from the dataset, so we do not need to use a feed_dict when running the train_op.
            next_batch = iterator.get_next()

            global episode
            train_observations, labels, processed_rewards = next_batch
            episode=next_batch

            # This reuses the same weights in the rollout phase.
            train_observations.set_shape((args.batch_size, args.sequence_size, OBSERVATION_DIM))
            train_logits = build_graph(train_observations)

            cross_entropies = tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=train_logits,
                labels=labels
            )



            loss = tf.reduce_sum(processed_rewards * cross_entropies)

            global_step = tf.train.get_or_create_global_step()

            train_op = optimizer.minimize(loss, global_step=global_step)

        init = tf.global_variables_initializer()
        saver = tf.train.Saver(max_to_keep=args.max_to_keep)

```

martin-gorner (Contributor) commented:

I guess you are talking about the reinforcement learning sample.
What exactly did you modify in the code?
Or are you saying that the code as it is on GitHub does not work?

yanchvlad (Author) commented Sep 6, 2018

@martin-gorner Sorry about the misunderstanding. Yes, I'm talking about the reinforcement learning sample. I found out that the unmodified sample doesn't work either.
It happens because two different models are created by name_scope('train') and name_scope('rollout'). The trained weights are not used in the next rollout operation after the train operation finishes, so all rollout operations keep computing actions from an untrained network. As far as I can tell this happens because build_graph(observations) is called from two different name_scopes. When I merged these two name_scopes into one, everything worked as expected (the weights were shared and reused).

python 3.6
tensorflow 1.10.0 GPU ver
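
A minimal sketch of the behavior described above (hypothetical code, not taken from the sample): each call to build_graph constructs brand-new tf.keras layer objects, and two distinct layer objects never share weights, so on the TensorFlow version above the AUTO_REUSE variable scope ends up holding two independent sets of dense* variables.

```python
import tensorflow as tf

def build_graph(observations):
    # Same pattern as the sample: a keras layer created inside an AUTO_REUSE scope.
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        return tf.keras.layers.Dense(3)(observations)

x = tf.placeholder(tf.float32, shape=(None, 8))

with tf.name_scope('rollout'):
    rollout_logits = build_graph(x)
with tf.name_scope('train'):
    train_logits = build_graph(x)

# Listing the variables shows two separate kernel/bias pairs under 'model',
# i.e. the second call did not reuse the weights created by the first.
for v in tf.global_variables():
    print(v.name)
```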

dizcology (Member) commented Sep 6, 2018

@yanchvlad Thanks for reporting the issue. This is a known issue introduced between TensorFlow versions 1.8 and 1.9, where the reuse behavior changed for tf.keras models.

For now my suggestion would be either of the following:

a. use TensorFlow version 1.8

or

b. rewrite the build_graph function to not use tf.keras.layers.
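
For option (b), a rough sketch of what build_graph might look like with tf.nn/tf.layers instead of tf.keras.layers (untested; layer sizes are copied from the snippet above, ACTIONS is the constant from the sample, and the keras dropout=0.2 arguments are only approximated with DropoutWrapper):

```python
def build_graph(observations):
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        # Stack of LSTM cells mirroring the keras version (100 -> 64 -> 64 -> 32).
        cells = [
            tf.nn.rnn_cell.LSTMCell(100),
            tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.LSTMCell(64), output_keep_prob=0.8),
            tf.nn.rnn_cell.LSTMCell(64),
            tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.LSTMCell(32), output_keep_prob=0.8),
        ]
        stacked = tf.nn.rnn_cell.MultiRNNCell(cells)
        outputs, _ = tf.nn.dynamic_rnn(stacked, observations, dtype=tf.float32)
        # Keep only the final timestep, like the last LSTM without return_sequences.
        last_output = outputs[:, -1, :]
        # tf.layers.dense honours variable_scope reuse, unlike separate keras layer objects.
        logits = tf.layers.dense(last_output, len(ACTIONS), use_bias=True)
    return logits
```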

martin-gorner (Contributor) commented:

@yanchvlad if you make the changes before we do, please send a pull request!
