Variable sharing inside RNNCell #10

The notebook "Tutorial_05 - An understandable example to implement Multi-LSTM for MNIST" contains the following code.
```
with tf.variable_scope('RNN'):
    for timestep in range(timestep_size):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
```
This issue is my conjecture about whether the call to `tf.get_variable_scope().reuse_variables()` is actually needed. I would like to discuss it with the author.

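First, I noticed that older and newer TensorFlow versions treat the `__call__` method of an RNNCell differently. Older versions define `__call__` directly. Newer versions first inherit from `_LayerRNNCell` and then define `call` and `build` methods, rather than defining `__call__` themselves.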

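Why is it done this way? In my view, using an RNNCell takes two steps: first, instantiate an RNNCell; second, call that instance to perform the computation. Defining `__call__` exists to simplify the API for running a computation with an RNNCell instance. In addition, most of the considerations and decisions about variable sharing are made in the first step.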

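However, when we instantiate an RNNCell we only specify `num_units`. The operation that turns inputs into a state requires a set of tf Variables whose shapes depend on the shape of the input. In the 1.4.0 implementation, these tf Variables are not declared in `__init__`. My guess is that the first time an RNNCell instance is used for a computation, its `build` method is called first to declare the required tf Variables from the input shape, and only then is its `call` method invoked to do the computation. The `build` method appears to run only once.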
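To make the guess above concrete, here is a minimal pure-Python sketch of the "build on first call" pattern. It is not TensorFlow code: `ToyLayerCell`, `build_calls`, and the constant `0.1` weights are hypothetical names and values chosen only for illustration, and the real library may differ in detail.

```python
class ToyLayerCell:
    """Toy stand-in for a lazily built cell (hypothetical, not TensorFlow)."""

    def __init__(self, num_units):
        # Only num_units is known here; the input depth is not.
        self.num_units = num_units
        self.built = False
        self.build_calls = 0  # counter added only to observe the behaviour
        self.kernel = None

    def build(self, input_dim):
        # The weight shape depends on the input depth, so the weights
        # cannot be created in __init__.
        self.build_calls += 1
        rows = input_dim + self.num_units
        self.kernel = [[0.1] * self.num_units for _ in range(rows)]

    def call(self, inputs, state):
        # New state = [inputs, state] multiplied by the kernel.
        concat = inputs + state
        return [
            sum(x * row[j] for x, row in zip(concat, self.kernel))
            for j in range(self.num_units)
        ]

    def __call__(self, inputs, state):
        # build runs only on the first call; later calls go straight to call.
        if not self.built:
            self.build(len(inputs))
            self.built = True
        return self.call(inputs, state)


cell = ToyLayerCell(num_units=2)
state = [0.0, 0.0]
for _ in range(3):
    state = cell([1.0, 1.0, 1.0], state)
print(cell.build_calls)
```

Under these assumptions, the weights are created once, sized from the first input, and every later call reuses them.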

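This raises a question. Suppose we have two inputs with different shapes and pass each of them to the same RNNCell instance. What happens? That is a side topic, though.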

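As for whether `tf.get_variable_scope().reuse_variables()` should be used, I think it is unnecessary, at least in 1.4.0. In the for loop, we repeatedly call an already instantiated `mlstm_cell`; we do not create a new `mlstm_cell` on each iteration. Moreover, the tf Variables needed to turn the input into a state are defined on the first call to `mlstm_cell`, and subsequent calls should automatically reuse them, since `build` runs only once.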
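The argument can be sketched with a toy variable store, again in plain Python rather than TensorFlow. `ToyVariableStore`, `EagerLookupCell`, `BuildOnceCell`, and `unroll` are all hypothetical names; the store only imitates the rule that a variable name cannot be created twice unless reuse is switched on.

```python
class ToyVariableStore:
    """Toy imitation of a variable scope (hypothetical, not TensorFlow)."""

    def __init__(self):
        self.variables = {}
        self.reuse = False

    def get_variable(self, name, size):
        if name in self.variables:
            if not self.reuse:
                raise ValueError("Variable %s already exists" % name)
            return self.variables[name]
        if self.reuse:
            raise ValueError("Variable %s does not exist" % name)
        self.variables[name] = [0.0] * size
        return self.variables[name]


class EagerLookupCell:
    """Toy cell that asks the store for its weights on every call."""

    def __init__(self, store):
        self.store = store

    def __call__(self, inputs):
        return self.store.get_variable("kernel", len(inputs))


class BuildOnceCell:
    """Toy cell that asks the store only on its first call."""

    def __init__(self, store):
        self.store = store
        self.kernel = None

    def __call__(self, inputs):
        if self.kernel is None:
            self.kernel = self.store.get_variable("kernel", len(inputs))
        return self.kernel


def unroll(cell, store, steps, set_reuse):
    """Call the same cell instance once per timestep."""
    for timestep in range(steps):
        if set_reuse and timestep > 0:
            store.reuse = True
        cell([1.0, 2.0])
    return len(store.variables)


store = ToyVariableStore()
# The build-once cell touches the store a single time, so no reuse flag is needed.
n_vars = unroll(BuildOnceCell(store), store, steps=3, set_reuse=False)
```

In this toy model, only the cell that looks up its weights on every call needs the reuse flag from the second timestep onward. Whether a given TensorFlow version behaves like one cell or the other is exactly the question raised here.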
