Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count_include_pad = false #4035


Merged

Conversation

ivangarcia44
Contributor

Currently the avg_pool2d PyTorch operation supports both the count_include_pad = true and count_include_pad = false cases, but for avg_pool1d and avg_pool3d only the true case (which is simpler) is supported.

The count_include_pad = false support for avg_pool2d was added by @AmosLewis in this change (reviewed by @rsuderman and @nirvedhmeshram): #3235

In this change I generalized the logic added there. I also did some refactoring of the original code to reduce the size of the functions and to avoid redundancy where possible.
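
As an illustrative aside, here is a minimal scalar sketch (plain C++, not the actual torch-mlir lowering; names and values are illustrative only) of how count_include_pad changes the divisor of a padded 1D average pool:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Scalar reference for a 1D average pool with zero padding.
// count_include_pad only changes the divisor: the full kernel size
// versus the number of elements that actually overlap the input.
std::vector<float> avgPool1d(const std::vector<float> &in, int kernel,
                             int stride, int pad, bool countIncludePad) {
  const int n = static_cast<int>(in.size());
  const int outLen = (n + 2 * pad - kernel) / stride + 1;
  std::vector<float> out(outLen);
  for (int o = 0; o < outLen; ++o) {
    const int start = o * stride - pad;    // window start in input coordinates
    const int lo = std::max(start, 0);     // clamp to the valid input range
    const int hi = std::min(start + kernel, n);
    float sum = 0.0f;
    for (int i = lo; i < hi; ++i)
      sum += in[i];                        // padded positions contribute zero
    const int divisor = countIncludePad ? kernel : (hi - lo);
    out[o] = sum / divisor;
  }
  return out;
}

int main() {
  std::vector<float> x = {1, 2, 3, 4};
  for (float v : avgPool1d(x, /*kernel=*/3, /*stride=*/1, /*pad=*/1, false))
    std::printf("%g ", v); // 1.5 2 3 3.5
  std::printf("\n");
  for (float v : avgPool1d(x, 3, 1, 1, true))
    std::printf("%g ", v); // 1 2 3 2.33333
  std::printf("\n");
}
```

With count_include_pad = false the border outputs divide by the number of real input elements in the window (2 here) instead of the kernel size (3), which is the divisor logic this change generalizes to the 1D and 3D lowerings.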

@sahas3 @dixinzhou @rafaelubal

@ivangarcia44
Contributor Author

Hi @AmosLewis , @rsuderman ,

Could you please review this change when you get a chance?

The change is just to generalize the average pooling divisor computation when count_include_pad = false. This was implemented for 2 dimensions. This change makes it work for N dimensions, and does some refactoring to make the methods smaller.
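
For illustration, here is a minimal scalar sketch (plain C++, not the MLIR builder code; names are illustrative only) of the generalized divisor: for each spatial dimension the pooling window is clamped to the input extent, and the divisor is the product of the clamped window sizes.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Clamped window size for one spatial dimension, mirroring the start/end
// computation in the PyTorch CPU kernel (outIdx is the output index along
// that dimension).
int64_t clampedWindowSize(int64_t outIdx, int64_t inputSize, int64_t kernel,
                          int64_t stride, int64_t pad) {
  int64_t start = outIdx * stride - pad;
  int64_t end = std::min(start + kernel, inputSize + pad);
  start = std::max<int64_t>(start, 0);
  end = std::min(end, inputSize);
  return end - start;
}

int main() {
  // count_include_pad = false divisor for an N-D window: the product of the
  // per-dimension clamped sizes (here a 3D example at output index (0, 0, 2)).
  std::vector<int64_t> outIdx = {0, 0, 2}, inSize = {4, 5, 6},
                       kernel = {3, 3, 3}, stride = {1, 1, 1}, pad = {1, 1, 1};
  int64_t divisor = 1;
  for (size_t d = 0; d < outIdx.size(); ++d)
    divisor *= clampedWindowSize(outIdx[d], inSize[d], kernel[d], stride[d], pad[d]);
  std::printf("divisor = %lld\n", static_cast<long long>(divisor)); // 2 * 2 * 3 = 12
}
```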

Thanks,
Ivan

@ivangarcia44
Contributor Author

Link to old PR: #4010

The review of @sahas3 can be found there.

@ivangarcia44
Contributor Author

ivangarcia44 commented Feb 22, 2025

Including additional reviewers familiar with the Pooling.cpp file: @vivekkhandelwal1, @lingzhiz1998

In addition to @AmosLewis , @rsuderman, @sahas3, @dixinzhou, @rafaelubal

@ivangarcia44
Contributor Author

Including @rafaelubalmw in addition to the other reviewers: @vivekkhandelwal1, @lingzhiz1998, @AmosLewis , @rsuderman, @sahas3, @dixinzhou, @rafaelubal

@ivangarcia44 ivangarcia44 changed the title from "Extend avg pool count include pad false to1d3d" to "Extend avg pool count include pad false to 1d 3d cases" on Feb 27, 2025
@ivangarcia44 ivangarcia44 changed the title from "Extend avg pool count include pad false to 1d 3d cases" to "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count include pad equal to false case" on Feb 27, 2025
@ivangarcia44 ivangarcia44 changed the title from "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count include pad equal to false case" to "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count_include_pad = false" on Feb 27, 2025
@dixinzhou
Contributor

Thanks for making the changes @ivangarcia44. The changes look good to me.

@sahas3 (Member) left a comment

LGTM

@sahas3 sahas3 merged commit 7b23a1f into llvm:main Mar 5, 2025
3 checks passed
@ivangarcia44 ivangarcia44 deleted the extendAvgPoolCountIncludePadFalseTo1d3d branch March 5, 2025 00:22
@amd-vivekag

amd-vivekag commented Mar 7, 2025

Hi @ivangarcia44 ,

This change is causing a test case failure: iree-test-suites/onnx_ops/onnx/node/generated/test_averagepool_2d_ceil/
Can you please fix it?

I've created an issue for this: #4079

Thanks Vivek

@ivangarcia44
Contributor Author

Hi @amd-vivekag ,

I will take a look at this.

Thanks,
Ivan

```cpp
    b, kernelSizeIntValues, strideInts, paddingInts);
// AtenAvgPool2/3dOp has an optional divisor_override
// attribute while AtenAvgPool1dOp does not.
if constexpr (avgPoolDims > 1) {
```
Member

FYI this led to some downstream build failures on Windows: #4085

@vivekkhandelwal1
Collaborator

Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

@ivangarcia44
Contributor Author

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

Next time I will. I apologize for the inconvenience. I waited for the review for almost a month and thought that was enough time before merging; I sent reminders several times during that month. Can you please point out where the incorrect code flow and LLVM guideline violations are? The fixes in the latest change cover various numerical correctness issues that were present prior to any of my changes and were uncovered by a new set of E2E tests I added.

@vivekkhandelwal1
Collaborator

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

> Next time I will. I apologize for the inconvenience. I waited for the review for almost a month and thought that was enough time before merging; I sent reminders several times during that month. Can you please point out where the incorrect code flow and LLVM guideline violations are? The fixes in the latest change cover various numerical correctness issues that were present prior to any of my changes and were uncovered by a new set of E2E tests I added.

Even without going into much depth, you can see that at least in this part of the code:

```cpp
Value PoolSizeCalculator<NumOfDims>::getPoolSize(
    OpBuilder &b, SmallVectorImpl<Value> &kernelSizeIntValues,
    SmallVectorImpl<int64_t> &strideInts,
    SmallVectorImpl<int64_t> &paddingInts) {
  Value poolSize;
  Value cstZero =
      b.createOrFold<arith::ConstantOp>(location, b.getI64IntegerAttr(0));
  for (int i = 0; i < NumOfDims; ++i) {
    // See the link below for the PyTorch implementation where this is
    // derived from:
    // https://github.com/pytorch/pytorch/blob/4a6dfbe4806b361c43210dfd56db64c4097c66bb/aten/src/ATen/native/cpu/AvgPoolKernel.cpp#L78
    // Dim below stands for spatial dimension. Prior to the February 2025
    // change, these variables used "height" and "width" (or "h" and "w")
    // in these intermediate variables instead of "Dim".
    Value IndexODim =
        b.create<linalg::IndexOp>(location,
                                  /*value=*/DimSizeFromSumPoolType[i]);
    Value ODim = castIndexToInt64(b, location, IndexODim);
    Value DDim = b.createOrFold<arith::ConstantOp>(
        location, b.getI64IntegerAttr(strideInts[i]));
    Value PadDim = b.createOrFold<arith::ConstantOp>(
        location, b.getI64IntegerAttr(paddingInts[i]));
    Value ODimDDim = b.createOrFold<arith::MulIOp>(location, ODim, DDim);
    Value IDim0 = b.createOrFold<arith::SubIOp>(location, ODimDDim, PadDim);
    Value IDim = castIndexToInt64(b, location, InputSpatialDimValues[i]);
    Value IDim0KDim =
        b.createOrFold<arith::AddIOp>(location, IDim0, kernelSizeIntValues[i]);
    Value IDimPadDim = b.createOrFold<arith::AddIOp>(location, IDim, PadDim);
    Value IDim1 =
        b.createOrFold<arith::MinSIOp>(location, IDim0KDim, IDimPadDim);
    Value IDim0Clamped =
        b.createOrFold<arith::MaxSIOp>(location, IDim0, cstZero);
    Value IDim1Clamped = b.createOrFold<arith::MinSIOp>(location, IDim1, IDim);
    Value IDim1_IDim0_Clamped =
        b.createOrFold<arith::SubIOp>(location, IDim1Clamped, IDim0Clamped);
    if (i == 0) {
      poolSize = IDim1_IDim0_Clamped;
    } else {
      poolSize = b.createOrFold<arith::MulIOp>(location, poolSize,
                                               IDim1_IDim0_Clamped);
    }
  }
  return poolSize;
}
```
basic things like variable names do not adhere to the LLVM coding standards.

I am not able to understand this piece of the code. It says createAvgPoolValueCountIncludePadFalseCase but returns even when that is the case, just because no padding is done; then the next method, createAvgPoolValueCountIncludePadTrueCase, is called with a return, and it is followed by another return, which can never be reached. If that's intended, it should have been described there; I had to spend quite some time to understand how this kind of control flow works.

```cpp
auto divisorOpResult = createAvgPoolValueCountIncludePadFalseCase(
    countIncludePad, op, adaptor, rewriter, self, sumPool, outputTensor,
    resultType, kernelSizeIntValues, strideInts, paddingInts, indexingMapsAvg,
    iteratorTypesAvg);
if (divisorOpResult)
  return *divisorOpResult;
return createAvgPoolValueCountIncludePadTrueCase(
    op, adaptor, rewriter, self, sumPool, outputTensor, resultType,
    kernelSizeIntValues, indexingMapsAvg, iteratorTypesAvg);
return success();
```
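
(For context on the shape of that control flow, here is a generic sketch — hypothetical names, not the actual torch-mlir code — of a helper that returns an engaged optional only when it handled the case, with the caller falling through otherwise; any statement after the second return is indeed unreachable.)

```cpp
#include <cstdio>
#include <optional>

// Hypothetical stand-ins for the two lowering paths discussed above.
bool lowerPadExcludedDivisor() { return true; }
bool lowerFixedKernelDivisor() { return true; }

// Returns an engaged optional only if this path actually applies; otherwise
// the caller is expected to fall through to the other path.
std::optional<bool> tryPadExcludedPath(bool countIncludePad, bool hasPadding) {
  if (countIncludePad || !hasPadding)
    return std::nullopt; // not applicable: divide by the full kernel size instead
  return lowerPadExcludedDivisor();
}

bool lower(bool countIncludePad, bool hasPadding) {
  if (auto handled = tryPadExcludedPath(countIncludePad, hasPadding))
    return *handled;
  return lowerFixedKernelDivisor();
  // Any statement placed here (e.g. another "return") would be unreachable,
  // as noted above.
}

int main() {
  std::printf("%d\n", lower(/*countIncludePad=*/false, /*hasPadding=*/true));
}
```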

Also, you said in the PR description:

> I also did some refactoring of the original code to reduce the size of the functions and to avoid redundancy where possible.

But did it make the flow simpler? I don't think so. It's good that you did that, but the flow of the code should be such that any other contributor willing to contribute can understand it without putting in a lot of effort, which is clearly not the case here because of the complexity introduced in the code flow. Also, this might have enabled some new tests, but it should also error out clearly on the cases that are not supported and should not result in downstream failures.

In the end, I would like to say that I do not want to get into a debate on this and would rather focus on what can be done ahead. I had to write the things above since you asked for them. I could find more issues in the patch, but I neither like doing that nor want to. I believe in constructive feedback, and you should take it in that spirit only; the focus should be on how we can address these issues.

@ivangarcia44
Contributor Author

ivangarcia44 commented Apr 29, 2025

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. […]
>
> […] I believe in constructive feedback, and you should take it in that spirit only; the focus should be on how we can address these issues.

The name of createAvgPoolValueCountIncludePadFalseCase could be confusing because, as you mentioned, there are more conditions under which it can exit. Renaming this method could help, as it has evolved over time while I discovered and fixed numerical correctness issues found with the new tests. I will update the name and outline the condition from this function to make the control flow easier to read. The "return success();" statement at the end of the matchAndRewrite method is dead code; I will get rid of it. That said, I would not consider any of these issues as making the control flow incorrect, since none of them produces an incorrect numerical result.

I still think the number of lines in a function should be small. Various companies have this C++ coding standard, including Google (see "Write Short Functions" here: https://google.github.io/styleguide/cppguide.html#Function_Declarations_and_Definitions). In addition, functions/methods with a significant amount of control flow tend to have a high cyclomatic complexity. A function with more than 100 lines of code most likely has a CC above 20, while Microsoft and other companies use a threshold of around 10 (see "The Magic Number" in https://learn.microsoft.com/en-us/visualstudio/code-quality/code-metrics-cyclomatic-complexity?view=vs-2022). A high CC can lead to unmaintainable code with potential numerical correctness bugs that are hard to debug and fix, like the ones fixed in this pull request: #4144.

The original code (prior to any of my changes) had well over 100 lines of code and was hard to understand, as you pointed out above (in my old change set I only generalized the old algorithm, I did not modify it at its core); the variables did not convey any meaning, and there were no comments explaining how the formula came about.

For the new PR (#4144) I had to rewrite it completely to fix an E2E numerical correctness bug I found as part of the new test suite. Now the code has more comments and the variables are better named. The function sizes still need more work, but I did not venture into further refactoring to avoid increasing the complexity of the change.

The IREE test failed with this change because of an E2E test gap in the torch-mlir project. The failing IREE test is one of the tests I added to the new E2E test suite to avoid a situation like this in the future. The new E2E test suite reduces the testing gap for these operators but does not eliminate it; I have found and fixed other bugs in the convolution operator that were due to lack of test coverage.

Please add any additional feedback in the PR below.

#4144
