e2e-detection is a toolkit that helps deep learning engineers test their PyTorch/TensorFlow models on the NVIDIA Triton inference server with different inference engines.
Let's introduce the three-stage deployment pipeline. First, deep learning scientists train their models with deep learning frameworks (TensorFlow/PyTorch). Second, the trained models are converted to inference-optimized formats (ONNX/TensorRT/OpenPPL/NCNN/MNN). Finally, the converted models are deployed to the NVIDIA Triton server. We usually call Triton an inference server and the others inference engines, because Triton is responsible for managing resources for multiple models that run on different engines. In Triton, inference engines are also called backends.
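As a concrete illustration of stage two, here is a minimal sketch that exports a PyTorch model to ONNX. The torchvision ResNet-50, file name, and tensor names are placeholders, not part of this repo:

```python
import torch
import torchvision

# Stage two: convert a trained PyTorch model to an inference-optimized format.
# The torchvision ResNet-50 is only a stand-in for your own trained model.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # example input with the expected shape

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",          # output file consumed later by TensorRT/Triton
    input_names=["input"],    # tensor names referenced in the Triton config
    output_names=["output"],
    opset_version=11,
)
```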
- The three-stage deployment requires too much engineering effort.
- It is easy to run into dependency errors during deployment.
- A Dockerfile to build all testing environments automatically.
- Two shell scripts to convert and configure trained models automatically.
- A use case of real-world deployment.
As a deep learning engineer, I highly recommend using pre-trained models from SenseTime-MMLab because the team is extremely active in developing advanced deep learning models for diverse tasks in video analytics (e.g., image classification, object detection, semantic segmentation, text detection, 3D object detection, pose estimation, and action-based video understanding).
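For example, loading an MMLab pre-trained detector takes only a few lines with MMDetection; the config and checkpoint paths below are illustrative, picked from the MMDetection model zoo:

```python
from mmdet.apis import init_detector, inference_detector

# Illustrative paths; any config/checkpoint pair from the model zoo works.
config_file = "configs/yolo/yolov3_d53_mstrain-608_273e_coco.py"
checkpoint_file = "checkpoints/yolov3_d53_mstrain-608_273e_coco.pth"

# Build the detector from the zoo config and load the pre-trained weights.
model = init_detector(config_file, checkpoint_file, device="cuda:0")

# Run inference on a single image; results are per-class bounding boxes.
result = inference_detector(model, "demo.jpg")
```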
- Test PyTorch models on the NVIDIA Triton server using the TensorRT inference engine: Faster R-CNN, YOLOv3, DETR, Swin Transformer.
- Test TensorFlow models on the NVIDIA Triton server using the TensorRT inference engine: EfficientDet-Dx.
- A use case of Triton inference server
Inference Engine | Support | Stable | Target Platform |
---|---|---|---|
NVIDIA TensorRT | ✔️ | ✔️ | NVIDIA GPU |
SenseTime-MMLab OpenPPL | | | CPU/GPU/Mobile |
Tencent NCNN | | ✔️ | Mobile CPU |
Alibaba MNN | TBD | ✔️ | Mobile CPU/GPU/NPU |
- Image Classification
- Object Detection
- Semantic Segmentation
- Text Detection
- 3D Object Detection
- Pose Estimation
- Video Understanding
- NVIDIA Triton
  - send/receive HTTP requests (see the client sketch after this list)
  - parse the results
  - run the pipeline in a Docker container
- PyTorch
  - convert PyTorch models (see the TensorRT sketch after this list)
  - test them with the inference engine
  - deploy the optimized model on Triton
- TensorFlow
  - convert TensorFlow models (see the tf2onnx sketch after this list)
  - test them with the inference engine
  - deploy the optimized model on Triton
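A minimal sketch of the Triton HTTP client workflow (send a request, parse the result). The model name, tensor names, and input shape are assumptions; they must match the deployed model's `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server running locally on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Model name, tensor names, and shape are placeholders; they must match
# the deployed model's config.pbtxt.
image = np.random.rand(1, 3, 608, 608).astype(np.float32)
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

response = client.infer(
    model_name="yolov3",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)

# Parse the raw response back into a numpy array.
detections = response.as_numpy("output")
print(detections.shape)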
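```

For the PyTorch path, a sketch that builds a TensorRT engine from the exported ONNX file, assuming the TensorRT 8.x Python API (the `trtexec` CLI or this repo's shell scripts achieve the same thing):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# An explicit-batch network is required for ONNX parsing.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("resnet50.onnx", "rb") as f:  # file name from the export sketch above
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace

# Serialize the engine; Triton's TensorRT backend loads it as model.plan.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)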
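```

Likewise for TensorFlow, a sketch using tf2onnx to convert a Keras model to ONNX; the EfficientNet stand-in and input signature are assumptions:

```python
import tensorflow as tf
import tf2onnx

# Stand-in model; in practice this would be your trained detector (e.g., EfficientDet).
model = tf.keras.applications.EfficientNetB0(weights="imagenet")

# The input signature must match the model's expected input shape and dtype.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)

# Convert and write the ONNX file that the downstream inference engine consumes.
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx"
)
```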