Test PyTorch/TensorFlow models on inference engines (TensorRT and Triton) in fewer lines of code.


e2e-detection

e2e-detection is a toolkit that helps deep learning engineers test their PyTorch/TensorFlow models on the NVIDIA Triton inference server with different inference engines.

Let's introduce the three-stage deployment pipeline. First, deep learning scientists train their models in a deep learning framework (TensorFlow/PyTorch). Second, the trained models are converted to inference-optimized formats (ONNX/TensorRT/OpenPPL/NCNN/MNN). Finally, the converted models are deployed to the NVIDIA Triton server. We usually distinguish the Triton inference server from the other inference engines because Triton is responsible for managing resources for many models running on different engines. In Triton, inference engines are also called backends.
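For the final stage, Triton loads each converted model from a model repository directory (e.g., `model_repository/detector/1/model.plan`) described by a small `config.pbtxt`. A minimal sketch for a hypothetical TensorRT detection model follows; the model name, tensor names, and shapes are illustrative assumptions, not values from this repo:

```
# model_repository/detector/config.pbtxt (illustrative values)
name: "detector"
platform: "tensorrt_plan"    # the inference engine, i.e. the Triton backend
max_batch_size: 8
input [
  { name: "images", data_type: TYPE_FP32, dims: [3, 640, 640] }
]
output [
  { name: "detections", data_type: TYPE_FP32, dims: [100, 6] }
]
```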

(pipeline diagram)

Why do we need e2e-detection?

  • The three-stage deployment requires too much engineering effort.
  • Dependency errors are easy to run into during deployment.

What can we do?

  • A Dockerfile to build all testing environments automatically.
  • Two shell scripts to convert and configure trained models automatically.
  • A use case of real-world deployment.
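A sketch of what such a Dockerfile might look like, building on NVIDIA's Triton server image; the image tag, script names, and paths are assumptions, not taken from this repo:

```dockerfile
# Base image with Triton server and the TensorRT runtime preinstalled
FROM nvcr.io/nvidia/tritonserver:22.12-py3

# Dependencies for model conversion (hypothetical requirements file)
COPY requirements.txt /workspace/requirements.txt
RUN pip install --no-cache-dir -r /workspace/requirements.txt

# The two shell scripts that convert and configure trained models
COPY convert_model.sh configure_model.sh /workspace/

# Serve every converted model placed under /models
CMD ["tritonserver", "--model-repository=/models"]
```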

As a deep learning engineer, I highly recommend using pre-trained models from SenseTime-MMLab because the team is extremely active in developing advanced deep learning models for diverse tasks in video analytics (e.g., image classification, object detection, semantic segmentation, text detection, 3D object detection, pose estimation, and action-based video understanding).

Tutorials

Inference Engines

| Inference Engine | Support | Stable | Target Platform |
| --- | --- | --- | --- |
| Nvidia TensorRT | ✔️ | ✔️ | Nvidia GPU |
| SenseTime-MMLab OpenPPL | | | CPU/GPU/Mobile |
| Tencent NCNN | ✔️ | | Mobile CPU |
| Alibaba MNN | TBD | ✔️ | Mobile CPU/GPU/NPU |

Applications

  • Image Classification
  • Object Detection
  • Semantic Segmentation
  • Text Detection
  • 3D Object Detection
  • Pose Estimation
  • Video Understanding

TODO

  1. Nvidia Triton
    • send/receive http requests
    • parse the results
    • run the pipeline with a docker
  2. Pytorch
    • convert PyTorch models
    • test it with the inference engine
    • deploy the optimized model on Triton
  3. TensorFlow
    • convert TensorFlow models
    • test it with the inference engine
    • deploy the optimized model on Triton
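For the Triton items above, a minimal sketch of building and parsing the HTTP inference messages, assuming Triton's standard KServe-v2 JSON protocol; the model and tensor names (`detector`, `images`, `scores`) are placeholders that must match the deployed model's `config.pbtxt`:

```python
import json

def build_infer_request(input_name, shape, data):
    """Serialize one FP32 input tensor into a v2 /infer request body."""
    return json.dumps({
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": "FP32", "data": data}
        ]
    })

def parse_infer_response(body):
    """Map each output tensor name in a v2 response to its flat data list."""
    return {out["name"]: out["data"] for out in json.loads(body)["outputs"]}

# Request for a tiny 1x4 input; a real client would POST this body to
# http://<host>:8000/v2/models/<model>/infer
req = build_infer_request("images", [1, 4], [0.1, 0.2, 0.3, 0.4])

# A response with the same shape Triton would return
resp = json.dumps({
    "model_name": "detector",
    "outputs": [{"name": "scores", "shape": [1, 2],
                 "datatype": "FP32", "data": [0.9, 0.1]}],
})
print(parse_infer_response(resp))  # {'scores': [0.9, 0.1]}
```

In practice the official `tritonclient` package wraps this protocol, but building the JSON by hand makes the request/response shapes explicit.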

References

  1. MMDetection: OpenMMLab Detection Toolbox and Benchmark
  2. OpenPPL: A primitive library for neural network
  3. Triton: An open-source inference serving software that streamlines AI inferencing
  4. MMDeploy: An open-source deep learning model deployment toolset
