Error training inceptionv3 #226

@ssdevel

Hi, I want to train the inceptionv3 network. I use the following command:

python train.py --network inceptionv3 --prefix final\inception\new\ssd --finetune 1 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

I also tried adding the --pretrained parameter. I renamed the files, but I got this error:

Traceback (most recent call last):
File "train.py", line 149, in
tensorboard=args.tensorboard)
File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 256, in train_net
exe = net.simple_bind(mx.cpu(), data=(1, 3, data_shape[0], data_shape[1]), label=(1, 1, 5), grad_req='null')
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\symbol\symbol.py", line 1519, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 3, 512)
label: (1, 1, 5)
Error in operator conv_1_conv2d: [14:17:19] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\convolution.cc:191: Check failed: dilated_ksize_y <= AddPad(dshape[2], param_.pad[0]) (3 vs. 1) kernel size exceed input
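
One possible reading of the bound shape `data: (1, 3, 3, 512)`: the expression at line 256 of `train_net.py` is `(1, 3, data_shape[0], data_shape[1])`, so this shape would result if `data_shape` were a 3-tuple such as `(3, 512, 512)` instead of a (height, width) pair. That is an assumption read off the traceback, not confirmed against the repository. The sketch below only reproduces the arithmetic; `bind_shape` and `conv_out` are hypothetical helpers, and the 3x3, stride-2, unpadded first convolution is also an assumption.

```python
def bind_shape(data_shape):
    """Mimic the shape expression shown in the traceback."""
    return (1, 3, data_shape[0], data_shape[1])


def conv_out(size, kernel, stride=1, pad=0):
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


# Assumption: data_shape was already expanded to (channels, height, width).
print(bind_shape((3, 512, 512)))  # (1, 3, 3, 512), the shape in the error

# With a plain (height, width) pair the same expression gives an image shape.
print(bind_shape((512, 512)))     # (1, 3, 512, 512)

# Assumption: the first convolution is 3x3, stride 2, no padding.
# A height of 3 then shrinks to 1, too small for the next 3x3 kernel.
print(conv_out(3, kernel=3, stride=2))  # 1
```

Under these assumptions, a height of 1 entering `conv_1_conv2d` would match the `(3 vs. 1)` in the check that failed.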

When I run this command:
python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec --pretrained final\inception\ssd_inceptionv3_512

I get this error:
Traceback (most recent call last):
File "train.py", line 149, in
tensorboard=args.tensorboard)
File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 355, in train_net
monitor=monitor)
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\base_module.py", line 488, in fit
allow_missing=allow_missing, force_init=force_init)
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 309, in init_params
_impl(desc, arr, arg_params)
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 297, in _impl
cache_arr.copyto(arr)
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\ndarray\ndarray.py", line 1970, in copyto
return _internal._copyto(self, out=other)
File "", line 25, in _copyto
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet_ctypes\ndarray.py", line 92, in _imperative_invoke
ctypes.byref(out_stypes)))
File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\base.py", line 149, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:27:05] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node at 0-th output: expected [126], got [12]
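
One way to read `expected [126], got [12]`: in SSD, the class-prediction layer has one output per anchor per class, plus a background class. Assuming 6 anchors per location (a guess, not read from the repository), a 20-class head has 126 outputs and a 1-class head has 12. The two sizes would then be consistent with a parameter file saved for a 20-class model being loaded into a network built with `--num-class 1`. `cls_pred_channels` is a hypothetical helper used only for the arithmetic.

```python
def cls_pred_channels(num_anchors, num_classes):
    """Outputs of an SSD class-prediction layer at one feature-map location.

    One extra class is added for background.
    """
    return num_anchors * (num_classes + 1)


NUM_ANCHORS = 6  # assumption: anchors per location on this layer

print(cls_pred_channels(NUM_ANCHORS, 20))  # 126
print(cls_pred_channels(NUM_ANCHORS, 1))   # 12
```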

When I use this command:

python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

the model does not converge.

Can you help me?

Thanks
