Can't run demo with batch size 8 #10

Open
roxanneluo opened this issue Feb 27, 2020 · 5 comments

@roxanneluo

Hi, I was trying to run the demo:
python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0
but got the following error:

2020-02-27 14:07:27.062479: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2020-02-27 14:07:27.062517: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
Traceback (most recent call last):   
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[{{node motion/PnP/einsum_1/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape
_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", 
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):   
  File "demos/demo_v2d.py", line 82, in <module>
    main(args)
  File "demos/demo_v2d.py", line 64, in main
    depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 462, in __call__
    self.update_poses(i)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 368, in update_poses
    self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49)  = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job
:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", 
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'motion/PnP/einsum_1/MatMul', defined at:
  File "demos/demo_v2d.py", line 82, in <module>
    main(args)
  File "demos/demo_v2d.py", line 55, in main
    deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 68, in __init__
    self._build_motion_graph()
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 129, in _build_motion_graph
    images, depths, intrinsics, edge_inds, init=do_init)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/modules/motion.py", line 282, in forward
    Tij = Tij.keyframe_optim(target, weight, depths, intrinsics)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/geometry/transformation.py", line 364, in keyframe_optim
    J = einsum('...ij,...jk->...ik', jproj, jtran)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py", line 49, in einsum
    out = tf.einsum(equation, *inputs)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 257, in einsum
    axes_to_sum)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 389, in _einsum_reduction
    product = math_ops.matmul(t0, t1)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49)  = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

My environment is Python 3.6.7 with tensorflow-gpu 1.12.0.
It seems the problem is that the batch size is too big: the demo succeeds when I use only 4 images. Can you help?
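
For anyone reading the error message: `a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3` describes 134400 independent (2x3)·(3x6) matrix products, which is what the `'...ij,...jk->...ik'` einsum in the traceback asks for. Here is a minimal pure-Python sketch of that contraction, just to show the shapes are compatible. The helper name `batched_matmul` and the tiny batch of 5 are my own stand-ins, not DeepV2D code, and this says nothing about why cuBLAS rejects the call.

```python
def batched_matmul(a, b):
    """Multiply a[i] (m x k) by b[i] (k x n) for every batch index i."""
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    out = []
    for x, y in zip(a, b):
        k = len(y)
        if any(len(row) != k for row in x):
            raise ValueError("inner dimensions differ")
        n = len(y[0])
        # Entry (i, j) is the dot product of row i of x and column j of y.
        out.append([[sum(row[p] * y[p][j] for p in range(k)) for j in range(n)]
                    for row in x])
    return out


# Same m, n, k as in the error message; the batch is shrunk from 134400 to 5.
batch, m, k, n = 5, 2, 3, 6
a = [[[1.0] * k for _ in range(m)] for _ in range(batch)]
b = [[[2.0] * n for _ in range(k)] for _ in range(batch)]
c = batched_matmul(a, b)
print(len(c), len(c[0]), len(c[0][0]))  # 5 2 6
```

The result has shape [batch, m, n], so the inputs in the log are well-formed for this operation.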

@zachteed
Collaborator

zachteed commented Mar 2, 2020

Looks like a CUDA error; I don't think batch size should matter in this case. What GPU are you using?
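
One way to answer the GPU question is to ask `nvidia-smi` for the device name and driver version. This is a small sketch that assumes `nvidia-smi` is on the PATH; the function name `query_gpus` is made up for illustration, and it returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def query_gpus():
    """Return a list of (gpu_name, driver_version) tuples, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True)
    except OSError:
        return []
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        # Each output line looks like "<name>, <driver version>".
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2:
            gpus.append((parts[0], parts[1]))
    return gpus


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(name, driver)
```

Pasting that output into the issue, together with the CUDA and TensorFlow versions, should make the setup easier to compare.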

@duanzhimin14

I also have this problem. My CUDA is 9.0 and TensorFlow is 1.12.0. How can I solve it?

@apxlwl

apxlwl commented Aug 25, 2020

@zachteed same problem, do you have any solution? Or which CUDA version is required?

@Willyzw

Willyzw commented Feb 17, 2021

Same issue for me. After some googling, it seems to have something to do with the special combination of TensorFlow 1.12 + RTX 2080. So after upgrading TensorFlow from 1.12.0 to 1.14.0 along with CUDA 10.0, it finally works for me :)
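
If it helps, here is a rough sketch for checking whether an environment is older than the TensorFlow version that worked in the comment above. The 1.14 threshold comes only from that report, not from any official compatibility table, and the helper names are hypothetical.

```python
import re

# From the comment above: TensorFlow 1.14.0 with CUDA 10.0 worked.
REPORTED_WORKING_TF = (1, 14)


def parse_version(version):
    """Return (major, minor) from a version string such as '1.12.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def older_than_reported_fix(tf_version):
    """True if tf_version is below the version reported to work here."""
    return parse_version(tf_version) < REPORTED_WORKING_TF


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print(tf.__version__, older_than_reported_fix(tf.__version__))
```

A `True` result only means the setup resembles the failing ones described in this issue; it does not prove the upgrade will fix it.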

@zlj-cs

zlj-cs commented Feb 24, 2023

> Same issue for me. After some googling, it seems to have something to do with the special combination of TensorFlow 1.12 + RTX 2080. So after upgrading TensorFlow from 1.12.0 to 1.14.0 along with CUDA 10.0, it finally works for me :)

Works for me too, many thanks!
