Update stable diffusion benchmark for TensorRT EP (microsoft#16560)
### Description

Add Stable Diffusion Text2Image pipelines for the TensorRT and CUDA
execution providers. They automatically export and optimize the ONNX
models, then create an ONNX Runtime session that uses the TensorRT or
CUDA execution provider.
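A minimal sketch of how such a session might be configured. The helper below is illustrative, not code from this PR; `trt_fp16_enable`, `trt_cuda_graph_enable`, and `enable_cuda_graph` are ONNX Runtime provider options, but availability depends on the installed onnxruntime-gpu version.

```python
def select_providers(engine: str, enable_cuda_graph: bool = False):
    """Build an ONNX Runtime provider list for "tensorrt" or "cuda".

    Illustrative helper (not from this PR); option names assume a
    recent onnxruntime-gpu build.
    """
    if engine == "tensorrt":
        trt_options = {"trt_fp16_enable": True}
        if enable_cuda_graph:
            trt_options["trt_cuda_graph_enable"] = True
        # TensorRT EP falls back to CUDA EP for unsupported nodes.
        return [("TensorrtExecutionProvider", trt_options),
                ("CUDAExecutionProvider", {})]
    return [("CUDAExecutionProvider", {"enable_cuda_graph": enable_cuda_graph})]

# Creating the session (requires onnxruntime-gpu and an exported model):
# import onnxruntime as ort
# session = ort.InferenceSession("unet.onnx",
#                                providers=select_providers("tensorrt", True))
```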

Add support for benchmarking TensorRT.

Add support for CUDA graph. This feature is currently only available in
the nightly package.

| Engine/Provider to test | Command line |
| --- | --- |
| CUDA EP | `python benchmark.py -v 1.5` |
| CUDA EP with CUDA graph | `python benchmark.py -v 1.5 --enable_cuda_graph` |
| TensorRT EP | `python benchmark.py -v 1.5 -r tensorrt` |
| TensorRT EP with CUDA graph | `python benchmark.py -v 1.5 -r tensorrt --enable_cuda_graph` |
| TensorRT | `python benchmark.py -v 1.5 -e tensorrt` |

Add benchmark numbers for a T4 GPU using CUDA 11.7, cuDNN 8.5, PyTorch
1.13.1+cu117, TensorRT 8.6.1, and onnxruntime-gpu 1.15.1 (or
ort-nightly-gpu 1.16 for CUDA graph).

TODO: add benchmark numbers for A100-80GB.

### Motivation and Context
tianleiwu authored Jul 10, 2023
1 parent 2fd5e1c commit b8f6235
Showing 9 changed files with 2,257 additions and 65 deletions.
4 changes: 4 additions & 0 deletions onnxruntime/python/tools/transformers/fusion_transpose.py
```diff
@@ -55,6 +55,10 @@ def fuse(
         cast_children = self.model.get_children(cast_node, input_name_to_nodes)
         if cast_children and len(cast_children) > 1:
             return
+
+        if cast_node.input[0] not in output_name_to_node:
+            return
+
         transpose_a = output_name_to_node[cast_node.input[0]]

         if transpose_a.op_type != "Transpose":
```
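The added check guards the dictionary lookup on the following line: if the Cast node's input is a graph input or initializer, it has no producing node, so indexing `output_name_to_node` directly would raise a `KeyError`. A standalone sketch of the pattern (the variable names mirror the fusion code, but the mapping contents here are hypothetical stand-ins for ONNX nodes):

```python
# Hypothetical tensor-name -> producer-node mapping, standing in for the
# real output_name_to_node dict built over an ONNX graph.
output_name_to_node = {"cast_in": "transpose_node"}

def producer_of(name, output_name_to_node):
    # Mirror the added check: return None instead of raising KeyError
    # when the tensor has no producing node (e.g. it is a graph input).
    if name not in output_name_to_node:
        return None
    return output_name_to_node[name]
```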
