Vectorization of util functions #38

borongyuan · 2018-12-15T08:50:20Z

The functions in utils.h are inefficient.
On a TX2, running cvImageToTensor() and preprocessInception() takes about 9 ms. By comparison, SSD inference (half precision, batch size 1) takes 27 ms.
cvImageToTensor() contains a triple loop. Can the compiler auto-vectorize it, or do these functions need to be reimplemented manually?
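
To make the question concrete, here is a minimal sketch of a loop shape that compilers can usually auto-vectorize. It is not the code from utils.h: `hwcToChwAffine`, its parameters, and the assumption that the preprocessing is a per-element `x * scale + offset` are all hypothetical. It flattens the row/column loops into a single loop over pixels, unrolls the 3-channel loop by hand, fuses the layout conversion and the preprocessing into one pass, and marks the pointers as non-aliasing with `__restrict` (a compiler extension in GCC, Clang, and MSVC, not standard C++).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, not the utils.h implementation.
// Converts an interleaved 3-channel HWC uint8 image into a planar CHW float
// tensor and applies dst = src * scale + offset in the same pass.
void hwcToChwAffine(const std::uint8_t* __restrict src,
                    float* __restrict dst,
                    std::size_t height, std::size_t width,
                    float scale, float offset)
{
    const std::size_t pixels = height * width;

    // One output plane per channel; the planes are contiguous in dst.
    float* __restrict plane0 = dst;
    float* __restrict plane1 = dst + pixels;
    float* __restrict plane2 = dst + 2 * pixels;

    // Single flat loop with a fixed stride of 3 on the input side and
    // unit-stride stores on the output side.
    for (std::size_t i = 0; i < pixels; ++i) {
        plane0[i] = static_cast<float>(src[3 * i + 0]) * scale + offset;
        plane1[i] = static_cast<float>(src[3 * i + 1]) * scale + offset;
        plane2[i] = static_cast<float>(src[3 * i + 2]) * scale + offset;
    }
}
```

Whether the loop is actually vectorized depends on the compiler and flags. With GCC, building with `-O3 -fopt-info-vec` reports which loops were vectorized; with Clang, `-O3 -Rpass=loop-vectorize` does the same. If the report shows the loop was skipped, hand-written NEON intrinsics would be the manual fallback.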
