Implement TTA Batch Processing to Improve Inference Speed #2153
PengchengShi1220 wants to merge 4 commits into MIC-DKFZ:master from
Conversation
Hi Fabian,
Thanks for your feedback! Based on your suggestions, I have now made "use_batch_tta" an optional parameter in the nnUNetPredictor class, controlled via the parser argument "disable_batch_tta". This allows users to opt in or out of batch TTA depending on their VRAM capacity and priorities. Please let me know if further adjustments are required.
Best,
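As a hedged illustration of the opt-out flag described above (the names `use_batch_tta` and `disable_batch_tta` come from this discussion, but the wiring shown here is a sketch, not the actual nnU-Net parser code):

```python
import argparse

# Hypothetical sketch: an opt-out CLI flag that maps onto a boolean
# predictor option. Defaults to batched TTA being enabled.
parser = argparse.ArgumentParser()
parser.add_argument('--disable_batch_tta', action='store_true',
                    help='Run TTA mirrors one at a time to reduce peak VRAM.')

# Simulate a user passing the flag; in real use this would be sys.argv.
args = parser.parse_args(['--disable_batch_tta'])

# The predictor option is simply the negation of the opt-out flag.
use_batch_tta = not args.disable_batch_tta
```

With no flag passed, `use_batch_tta` stays `True`, so users only act when VRAM is the constraint.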
Frankly, given the limited speed improvement of batched prediction, I would prefer to keep things simple. Apologies for dragging this out for so long. I appreciate the work you did! This was an important thing to try, even if the improvements were much smaller than one might anticipate.
Summary:
This PR integrates Test Time Augmentation (TTA) with batch processing in nnUNet to improve inference efficiency, with the benefit most evident on larger 3D datasets. It demonstrates speed improvements of 5%-8%, validated on the AMOS2022 dataset.
Implementation Details:
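The core idea, sketched below under assumptions (this is a minimal NumPy illustration, not the actual nnU-Net implementation, and `batched_tta_predict` is a hypothetical helper): instead of running one forward pass per mirror combination, all mirrored copies of the input are stacked into a single batch, predicted in one call, un-flipped, and averaged.

```python
import numpy as np
from itertools import chain, combinations

def batched_tta_predict(model, x, mirror_axes=(0, 1, 2)):
    """Mirror TTA in one batched forward pass.

    x has shape (C, X, Y, Z); spatial axes are offset by 1 when
    flipping. `model` maps a batch (N, C, X, Y, Z) to predictions
    of the same leading batch size.
    """
    # All subsets of the mirror axes, including the identity (empty set).
    flips = list(chain.from_iterable(
        combinations(mirror_axes, r) for r in range(len(mirror_axes) + 1)))
    # Build one batch containing every mirrored copy of the input.
    batch = np.stack([np.flip(x, axis=[a + 1 for a in f]) if f else x
                      for f in flips])
    preds = model(batch)  # single forward pass over all mirrors
    # Undo each flip and average the aligned predictions.
    restored = [np.flip(p, axis=[a + 1 for a in f]) if f else p
                for p, f in zip(preds, flips)]
    return np.mean(restored, axis=0)
```

The batched call trades higher peak VRAM (the batch holds up to 2^len(mirror_axes) copies) for fewer kernel launches, which matches the modest 5%-8% speedup and the VRAM discussion in this PR.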
Results:
- Mirror Axes (0, 1, 2):
- Mirror Axes (0, 1):
- Mirror Axis (0):
- VRAM Usage:
Recommendations:
The TTA batch processing approach has been thoroughly tested on the AMOS2022 dataset, producing results consistent with the original setup.