Hi, I would like to use your project, but I ran into trouble with the "--policy feudal" setting. I can run python train.py directly and it works normally with the default "--policy lstm". But when I add this parameter and run python train.py --policy feudal, I get the following output:
[2018-04-19 22:01:28,989] Events directory: /tmp/pong/train_0
[2018-04-19 22:01:29,342] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
2018-04-19 22:01:29.431565: I tensorflow/core/distributed_runtime/master_session.cc:998] Start master session 0f5becf7698cbfb7 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0" inter_op_parallelism_threads: 2
Traceback (most recent call last):
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", l
ine 88, in exit
next(self.gen)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok
_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/
rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "worker.py", line 174, in <module>
    tf.app.run()
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "worker.py", line 166, in main
    run(args, server)
  File "worker.py", line 94, in run
    with sv.managed_session(server.target, config=config) as sess, sess.as_default():
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/six.py", line 686, in reraise
    raise value
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
    start_standard_services=start_standard_services)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1560, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
  [[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
Caused by op 'save/RestoreV2_55', defined at:
File "worker.py", line 174, in
tf.app.run()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "worker.py", line 166, in main
run(args, server)
File "worker.py", line 50, in run
saver = FastSaver(variables_to_save)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1140, in init
self.build()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-
access
NotFoundError (see above for traceback): Key global/FeUdal/worker/rnn/ba[26/480]
_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "worker.py", line 174, in <module>\n tf.app.run()', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "worker.py", line 166, in main\n run(args, server)', 'File "worker.py", line 77, in run\n ready_op=tf.report_uninitialized_variables(variables_to_save),', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 144, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 101, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
==================================
Could you please tell me what the problem is? Thanks a lot.
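In case it helps to narrow things down, here is a minimal sketch (assuming TensorFlow 1.x and the /tmp/pong/train_0 directory from the log above; adjust the path if the checkpoint is saved elsewhere) that lists the variable keys actually stored in the existing checkpoint, so it can be confirmed whether the global/FeUdal/... keys are really missing:

# Minimal sketch: list the variable names stored in the checkpoint that the
# Supervisor tries to restore. Assumes TensorFlow 1.x and that the checkpoint
# lives under /tmp/pong/train_0 (the events directory reported in the log).
import tensorflow as tf

ckpt_path = tf.train.latest_checkpoint("/tmp/pong/train_0")
print("checkpoint:", ckpt_path)

reader = tf.train.NewCheckpointReader(ckpt_path)
for name in sorted(reader.get_variable_to_shape_map()):
    print(name)

# If no key under global/FeUdal/... shows up, the checkpoint was written by a
# run with a different policy (e.g. the default LSTM one), and restoring it
# into the FeUdal graph would raise the NotFoundError shown above.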