Enhanced Adaptive Average Pooling 2D Backward Kernel: Performance Improvements and Code Simplification #1658

chunhuanMeng · 2025-05-14T02:27:31Z

Refactors and enhances the adaptive_avg_pool2d_backward_kernel implementation in the src/ATen/native/xpu/sycl/AdaptiveAveragePooling2dKernels.cpp file. Key changes include removing redundant template parameters, adding a new kernel functor for channels-last memory format, and optimizing memory usage and thread configurations for better performance and maintainability.

Refactoring and Simplification:

Removed the is_channels_last template parameter from both AdaptiveAvgPool2dBwdKernelFunctor and AdaptiveAvgPool2dBwdSLMKernelFunctor, simplifying their implementations. This eliminates the conditional logic that branched on memory format.

New Kernel Functor:

Introduced AdaptiveAvgPool2dBwdSLMChannelsLastKernelFunctor, specifically designed to handle the channels-last memory format. This functor precomputes indices and pooling factors for efficient gradient computation, leveraging shared memory for intermediate storage.
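
For readers unfamiliar with the computation, here is a minimal CPU-side sketch of what "indices and pooling factors" means for a single channels-last image. It is not the SYCL kernel from this PR: the function names (start_index, end_index, adaptive_avg_pool2d_backward_nhwc) are hypothetical, there is no shared memory involved, and the layout is assumed to be HWC for one batch element. Each output cell covers a window of input cells, and its gradient is spread over that window scaled by the reciprocal of the window area.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First input index covered by output cell `o`: floor(o * in_size / out_size).
inline int64_t start_index(int64_t o, int64_t out_size, int64_t in_size) {
  return (o * in_size) / out_size;
}

// One past the last input index covered by output cell `o`:
// ceil((o + 1) * in_size / out_size).
inline int64_t end_index(int64_t o, int64_t out_size, int64_t in_size) {
  return ((o + 1) * in_size + out_size - 1) / out_size;
}

// Hypothetical reference for one image stored as HWC (channels-last).
// grad_output has shape [osizeH, osizeW, channels];
// the returned gradient has shape [isizeH, isizeW, channels].
std::vector<float> adaptive_avg_pool2d_backward_nhwc(
    const std::vector<float>& grad_output,
    int64_t channels, int64_t isizeH, int64_t isizeW,
    int64_t osizeH, int64_t osizeW) {
  assert(static_cast<int64_t>(grad_output.size()) ==
         osizeH * osizeW * channels);
  std::vector<float> grad_input(isizeH * isizeW * channels, 0.0f);
  for (int64_t oh = 0; oh < osizeH; ++oh) {
    const int64_t h0 = start_index(oh, osizeH, isizeH);
    const int64_t h1 = end_index(oh, osizeH, isizeH);
    for (int64_t ow = 0; ow < osizeW; ++ow) {
      const int64_t w0 = start_index(ow, osizeW, isizeW);
      const int64_t w1 = end_index(ow, osizeW, isizeW);
      // Pooling factor: reciprocal of the window area, computed once per
      // output cell and reused for every channel.
      const float factor =
          1.0f / static_cast<float>((h1 - h0) * (w1 - w0));
      for (int64_t ih = h0; ih < h1; ++ih) {
        for (int64_t iw = w0; iw < w1; ++iw) {
          // Channels are contiguous in this layout, so the inner loop
          // walks adjacent memory.
          for (int64_t c = 0; c < channels; ++c) {
            grad_input[(ih * isizeW + iw) * channels + c] +=
                grad_output[(oh * osizeW + ow) * channels + c] * factor;
          }
        }
      }
    }
  }
  return grad_input;
}
```

When the input size is not a multiple of the output size, neighbouring windows overlap, so an input cell can accumulate contributions from several output cells. That overlap is why the window bounds and factors are worth computing once up front instead of once per element.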

Memory and Thread Optimization:

Added constants (XPU_MAX_THREADS, GROUP_STRIDE) and optimized thread group configurations to improve performance and reduce the number of groups launched.
Updated shared memory usage calculations and introduced logic to dynamically adjust thread configurations if memory limits are exceeded.
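
As a rough illustration of the "adjust thread configurations if memory limits are exceeded" step, the sketch below halves the group size until the requested local memory fits. Everything in it is an assumption for illustration: the cost model (slm_bytes_needed), the LaunchConfig struct, and the halving policy are hypothetical and are not taken from the kernel source. Only the 1024 starting value and the do/while shape follow what this PR shows.

```cpp
#include <cassert>
#include <cstddef>

// Assumed upper bound on work-group size; same value as XPU_MAX_THREADS.
constexpr int kMaxThreads = 1024;

struct LaunchConfig {
  int threads;            // work-items per group; 0 means no size fits
  std::size_t slm_bytes;  // local memory requested for that group size
};

// Hypothetical cost model: every work-item caches `elems_per_thread` values.
inline std::size_t slm_bytes_needed(
    int threads, std::size_t elems_per_thread, std::size_t elem_size) {
  return static_cast<std::size_t>(threads) * elems_per_thread * elem_size;
}

// Halve the group size until the local memory request fits under the limit.
inline LaunchConfig fit_to_slm(
    std::size_t elems_per_thread, std::size_t elem_size,
    std::size_t slm_limit) {
  assert(elem_size > 0);
  int max_threads = kMaxThreads;
  bool done = false;
  std::size_t needed = 0;
  do {
    needed = slm_bytes_needed(max_threads, elems_per_thread, elem_size);
    if (needed <= slm_limit) {
      done = true;       // this group size fits, stop shrinking
    } else {
      max_threads /= 2;  // too large, try half as many work-items
    }
  } while (!done && max_threads);
  return done ? LaunchConfig{max_threads, needed} : LaunchConfig{0, 0};
}
```

A caller would check for threads == 0 and fall back to a kernel that does not use local memory. In a real SYCL program the limit would come from a device query such as sycl::info::device::local_mem_size.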

General Improvements:

Replaced hardcoded dimensions with dynamically calculated values (isizeH, isizeW, osizeH, osizeW) for better readability and maintainability.
Removed unused or redundant code.

Copilot

Pull Request Overview

This PR refactors the Adaptive Average Pooling 2D backward kernel to improve performance, simplify code logic, and add a new optimized kernel for channels-last format.

Removed the now redundant is_channels_last template parameter and its branches.
Introduced a new kernel (AdaptiveAvgPool2dBwdSLMKernelFunctorChannelLast) that leverages shared memory and group-based processing for enhanced performance.
Updated kernel launch configurations and added utility macros for standardized index calculations.

Copilot · 2025-05-14T02:29:52Z

src/ATen/native/xpu/sycl/AdaptiveAveragePooling2dKernels.cpp

+#define START_IND_INT(a, b, c) ((a * c) / b)
+#define END_IND_INT(a, b, c) (((a + 1) * c + b - 1) / b)
+
+#define XPU_MAX_THREADS 1024 // this is safe, in reality 256 is our limit


[nitpick] Consider clarifying the comment on XPU_MAX_THREADS to explain why 1024 is used despite the realistic limit being 256, to avoid future confusion for maintainers.
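
A side note on the two index macros in the hunk above: their arguments are not parenthesized, so they only behave as expected when called with plain identifiers or literals. The sketch below shows the difference using hypothetical constexpr equivalents (start_ind and end_ind are illustrative names, not part of this PR). It is a suggestion for discussion, not a claim about how the kernel calls the macros.

```cpp
#include <cassert>

// Macro as quoted in the diff above (arguments are not parenthesized).
#define START_IND_INT(a, b, c) ((a * c) / b)

// Hypothetical function equivalents; names are illustrative only.
constexpr int start_ind(int a, int b, int c) {
  return (a * c) / b;
}
constexpr int end_ind(int a, int b, int c) {
  return ((a + 1) * c + b - 1) / b;
}

// With literal arguments the macro and the function agree.
static_assert(START_IND_INT(3, 7, 32) == start_ind(3, 7, 32),
              "same result for simple arguments");

// With an expression argument the macro expands to ((2 + 1 * 32) / 7),
// which is 4, while the function evaluates (3 * 32) / 7, which is 13.
static_assert(START_IND_INT(2 + 1, 7, 32) == 4,
              "operator precedence changes the macro result");
static_assert(start_ind(2 + 1, 7, 32) == 13,
              "the function receives the already evaluated argument");
```

If the macros are kept, wrapping each parameter in parentheses inside the macro body gives the same protection.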

Copilot · 2025-05-14T02:29:53Z

src/ATen/native/xpu/sycl/AdaptiveAveragePooling2dKernels.cpp

-    grad_input = at::empty_like(input_, smf);
-  }
+template <typename index_t, typename scalar_t>
+struct AdaptiveAvgPool2dBwdSLMKernelFunctorChannelLast


[nitpick] It would be beneficial to add inline comments describing the strategy of shared memory caching and the layout calculation in this new channels-last kernel to help future readers understand the complex index and memory computations.

Copilot

Pull Request Overview

This PR refactors the Adaptive Average Pooling 2D backward kernel to improve performance and simplify the code by removing redundant paths and introducing a new kernel optimized for channels-last memory format. Key changes include:

Removal of the is_channels_last template parameter to streamline the kernel functors.
Addition of a new channels-last kernel (AdaptiveAvgPool2dBwdSLMKernelFunctorChannelLast) that leverages shared memory caching.
Dynamic kernel launch configuration adjustments that ensure shared memory limits are respected.

Comments suppressed due to low confidence (1)

src/ATen/native/xpu/sycl/AdaptiveAveragePooling2dKernels.cpp:440

[nitpick] Consider adding an inline comment explaining the rationale behind dynamically reducing max_threads in the do-while loop to aid clarity and future maintenance.

do { ... max_threads adjustment ... } while (!done && max_threads);

chunhuanMeng · 2025-05-14T07:06:05Z

| dtype | op | shape | ChannelsLast | output_size | original | optimized |
| --- | --- | --- | --- | --- | --- | --- |
| torch.bfloat16 | adaptive_avg_pool2d_backward | (8, 512, 32, 32) | TRUE | (7, 7) | 153.176 | 96.264 |
| torch.float16 | adaptive_avg_pool2d_backward | (8, 512, 32, 32) | TRUE | (7, 7) | 151.984 | 96.392 |
| torch.float32 | adaptive_avg_pool2d_backward | (8, 512, 32, 32) | TRUE | (7, 7) | 152.44 | 99.832 |
| torch.bfloat16 | adaptive_avg_pool2d_backward | (8, 256, 56, 56) | TRUE | (14, 14) | 211.68 | 161.728 |
| torch.float16 | adaptive_avg_pool2d_backward | (8, 256, 56, 56) | TRUE | (14, 14) | 210.32 | 160.368 |
| torch.float32 | adaptive_avg_pool2d_backward | (8, 256, 56, 56) | TRUE | (14, 14) | 210.312 | 151.248 |

Update AdaptiveAveragePooling2dKernels.cpp

010e698

chunhuanMeng changed the title ~~Update AdaptiveAveragePooling2dKernels.cpp~~ Enhanced Adaptive Average Pooling 2D Backward Kernel: Performance Improvements and Code Simplification May 14, 2025

chunhuanMeng requested a review from Copilot May 14, 2025 02:29

Copilot AI reviewed May 14, 2025

chunhuanMeng added 2 commits May 14, 2025 10:33

Update AdaptiveAveragePooling2dKernels.cpp

1574931

add comments

14a21e9

chunhuanMeng requested a review from Copilot May 14, 2025 06:35

Copilot AI reviewed May 14, 2025

chunhuanMeng and others added 2 commits May 14, 2025 14:37

Update src/ATen/native/xpu/sycl/AdaptiveAveragePooling2dKernels.cpp

1f45764

Co-authored-by: Copilot <[email protected]>

rename

821430f

chunhuanMeng added 2 commits May 20, 2025 10:40

Merge branch 'main' into meng_opt_adavg

8d077ee

fix ut

458b6a6
