Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count_include_pad = false #4035


Merged

Conversation

ivangarcia44
Contributor

Currently the avg_pool2d PyTorch operation supports both the count_include_pad = true and count_include_pad = false cases, but for avg_pool1d and avg_pool3d only the true case (which is simpler) is supported.

The count_include_pad = false support for avg_pool2d was added by @AmosLewis in this change (reviewed by @rsuderman and @nirvedhmeshram): #3235

In this change I generalized the logic added there. I also did some refactoring of the original code to reduce the size of the functions and to avoid redundancy where possible.
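
As an illustrative aside, here is a minimal scalar sketch (plain C++, not the actual torch-mlir lowering; names and values are illustrative only) of how count_include_pad changes the divisor of a padded 1D average pool:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Scalar reference for a 1D average pool with zero padding.
// count_include_pad only changes the divisor: the full kernel size
// versus the number of elements that actually overlap the input.
std::vector<float> avgPool1d(const std::vector<float> &in, int kernel,
                             int stride, int pad, bool countIncludePad) {
  const int n = static_cast<int>(in.size());
  const int outLen = (n + 2 * pad - kernel) / stride + 1;
  std::vector<float> out(outLen);
  for (int o = 0; o < outLen; ++o) {
    const int start = o * stride - pad;    // window start in input coordinates
    const int lo = std::max(start, 0);     // clamp to the valid input range
    const int hi = std::min(start + kernel, n);
    float sum = 0.0f;
    for (int i = lo; i < hi; ++i)
      sum += in[i];                        // padded positions contribute zero
    const int divisor = countIncludePad ? kernel : (hi - lo);
    out[o] = sum / divisor;
  }
  return out;
}

int main() {
  std::vector<float> x = {1, 2, 3, 4};
  for (float v : avgPool1d(x, /*kernel=*/3, /*stride=*/1, /*pad=*/1, false))
    std::printf("%g ", v); // 1.5 2 3 3.5
  std::printf("\n");
  for (float v : avgPool1d(x, 3, 1, 1, true))
    std::printf("%g ", v); // 1 2 3 2.33333
  std::printf("\n");
}
```

With count_include_pad = false the border outputs divide by the number of real input elements in the window (2 here) instead of the kernel size (3), which is the divisor logic this change generalizes to the 1D and 3D lowerings.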

@sahas3 @dixinzhou @rafaelubal

@ivangarcia44
Contributor Author

Hi @AmosLewis , @rsuderman ,

Could you please review this change when you get a chance?

The change is just to generalize the average pooling divisor computation when count_include_pad = false. This was implemented for 2 dimensions. This change makes it work for N dimensions, and does some refactoring to make the methods smaller.
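
For illustration, here is a minimal scalar sketch (plain C++, not the MLIR builder code; names are illustrative only) of the generalized divisor: for each spatial dimension the pooling window is clamped to the input extent, and the divisor is the product of the clamped window sizes.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Clamped window size for one spatial dimension, mirroring the start/end
// computation in the PyTorch CPU kernel (outIdx is the output index along
// that dimension).
int64_t clampedWindowSize(int64_t outIdx, int64_t inputSize, int64_t kernel,
                          int64_t stride, int64_t pad) {
  int64_t start = outIdx * stride - pad;
  int64_t end = std::min(start + kernel, inputSize + pad);
  start = std::max<int64_t>(start, 0);
  end = std::min(end, inputSize);
  return end - start;
}

int main() {
  // count_include_pad = false divisor for an N-D window: the product of the
  // per-dimension clamped sizes (here a 3D example at output index (0, 0, 2)).
  std::vector<int64_t> outIdx = {0, 0, 2}, inSize = {4, 5, 6},
                       kernel = {3, 3, 3}, stride = {1, 1, 1}, pad = {1, 1, 1};
  int64_t divisor = 1;
  for (size_t d = 0; d < outIdx.size(); ++d)
    divisor *= clampedWindowSize(outIdx[d], inSize[d], kernel[d], stride[d], pad[d]);
  std::printf("divisor = %lld\n", static_cast<long long>(divisor)); // 2 * 2 * 3 = 12
}
```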

Thanks,
Ivan

@ivangarcia44
Contributor Author

Link to old PR: #4010

The review of @sahas3 can be found there.

@ivangarcia44
Contributor Author

ivangarcia44 commented Feb 22, 2025

Including additional reviewers familiar with the Pooling.cpp file: @vivekkhandelwal1, @lingzhiz1998

In addition to @AmosLewis , @rsuderman, @sahas3, @dixinzhou, @rafaelubal

@ivangarcia44
Contributor Author

Including @rafaelubalmw in addition to the other reviewers: @vivekkhandelwal1, @lingzhiz1998, @AmosLewis , @rsuderman, @sahas3, @dixinzhou, @rafaelubal

@ivangarcia44 ivangarcia44 changed the title from "Extend avg pool count include pad false to1d3d" to "Extend avg pool count include pad false to 1d 3d cases" on Feb 27, 2025
@ivangarcia44 ivangarcia44 changed the title from "Extend avg pool count include pad false to 1d 3d cases" to "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count include pad equal to false case" on Feb 27, 2025
@ivangarcia44 ivangarcia44 changed the title from "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count include pad equal to false case" to "Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count_include_pad = false" on Feb 27, 2025
@dixinzhou
Contributor

Thanks for making the changes @ivangarcia44. The changes look good to me.

@sahas3 (Member) left a comment

LGTM

@sahas3 sahas3 merged commit 7b23a1f into llvm:main Mar 5, 2025
3 checks passed
@ivangarcia44 ivangarcia44 deleted the extendAvgPoolCountIncludePadFalseTo1d3d branch March 5, 2025 00:22
@amd-vivekag

amd-vivekag commented Mar 7, 2025

Hi @ivangarcia44 ,

This change is causing a test case failure: iree-test-suites/onnx_ops/onnx/node/generated/test_averagepool_2d_ceil/
Can you please fix it?

I've created an issue for this: #4079

Thanks Vivek

@ivangarcia44
Contributor Author

Hi @amd-vivekag ,

I will take a look at this.

Thanks,
Ivan

```cpp
    b, kernelSizeIntValues, strideInts, paddingInts);
// AtenAvgPool2/3dOp has an optional divisor_override
// attribute while AtenAvgPool1dOp does not.
if constexpr (avgPoolDims > 1) {
```
Member

FYI this led to some downstream build failures on Windows: #4085

@vivekkhandelwal1
Collaborator

Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

@ivangarcia44
Contributor Author

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

Next time I will. I apologize for the inconvenience. I waited for the review for almost a month and thought that was enough time before merging; I sent reminders several times during that month. Can you please point out where the incorrect code flow and LLVM guideline violations are? The fixes in the latest change cover various numerical correctness issues that were present prior to any of my changes and were uncovered by a new set of E2E tests I added.

@vivekkhandelwal1
Collaborator

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. For future reference, I would suggest that for such a PR making significant changes, we should wait for the original authors of the code to respond within a definite time frame.

> Next time I will. I apologize for the inconvenience. I waited for the review for almost a month and thought that was enough time before merging; I sent reminders several times during that month. Can you please point out where the incorrect code flow and LLVM guideline violations are? The fixes in the latest change cover various numerical correctness issues that were present prior to any of my changes and were uncovered by a new set of E2E tests I added.

Even without going into much depth, you can see that at least in this part of the code:

```cpp
Value PoolSizeCalculator<NumOfDims>::getPoolSize(
    OpBuilder &b, SmallVectorImpl<Value> &kernelSizeIntValues,
    SmallVectorImpl<int64_t> &strideInts,
    SmallVectorImpl<int64_t> &paddingInts) {
  Value poolSize;
  Value cstZero =
      b.createOrFold<arith::ConstantOp>(location, b.getI64IntegerAttr(0));
  for (int i = 0; i < NumOfDims; ++i) {
    // See the link below for the PyTorch implementation where this is
    // derived from:
    // https://github.com/pytorch/pytorch/blob/4a6dfbe4806b361c43210dfd56db64c4097c66bb/aten/src/ATen/native/cpu/AvgPoolKernel.cpp#L78
    // Dim below stands for spatial dimension. Prior to the February 2025
    // change, these variables used "height" and "width" (or "h" and "w")
    // in these intermediate variables instead of "Dim".
    Value IndexODim =
        b.create<linalg::IndexOp>(location,
                                  /*value=*/DimSizeFromSumPoolType[i]);
    Value ODim = castIndexToInt64(b, location, IndexODim);
    Value DDim = b.createOrFold<arith::ConstantOp>(
        location, b.getI64IntegerAttr(strideInts[i]));
    Value PadDim = b.createOrFold<arith::ConstantOp>(
        location, b.getI64IntegerAttr(paddingInts[i]));
    Value ODimDDim = b.createOrFold<arith::MulIOp>(location, ODim, DDim);
    Value IDim0 = b.createOrFold<arith::SubIOp>(location, ODimDDim, PadDim);
    Value IDim = castIndexToInt64(b, location, InputSpatialDimValues[i]);
    Value IDim0KDim =
        b.createOrFold<arith::AddIOp>(location, IDim0, kernelSizeIntValues[i]);
    Value IDimPadDim = b.createOrFold<arith::AddIOp>(location, IDim, PadDim);
    Value IDim1 =
        b.createOrFold<arith::MinSIOp>(location, IDim0KDim, IDimPadDim);
    Value IDim0Clamped =
        b.createOrFold<arith::MaxSIOp>(location, IDim0, cstZero);
    Value IDim1Clamped = b.createOrFold<arith::MinSIOp>(location, IDim1, IDim);
    Value IDim1_IDim0_Clamped =
        b.createOrFold<arith::SubIOp>(location, IDim1Clamped, IDim0Clamped);
    if (i == 0) {
      poolSize = IDim1_IDim0_Clamped;
    } else {
      poolSize = b.createOrFold<arith::MulIOp>(location, poolSize,
                                               IDim1_IDim0_Clamped);
    }
  }
  return poolSize;
}
```
basic things like variable names do not adhere to the LLVM coding standards.

I am not able to understand this piece of the code. It says createAvgPoolValueCountIncludePadFalseCase but returns even when that is the case, just because no padding is done; then the next method, createAvgPoolValueCountIncludePadTrueCase, is called with a return, and it is followed by another return, which can never be reached. If that's intended, it should have been described there; I had to spend quite some time to understand how this kind of control flow works.

```cpp
auto divisorOpResult = createAvgPoolValueCountIncludePadFalseCase(
    countIncludePad, op, adaptor, rewriter, self, sumPool, outputTensor,
    resultType, kernelSizeIntValues, strideInts, paddingInts, indexingMapsAvg,
    iteratorTypesAvg);
if (divisorOpResult)
  return *divisorOpResult;
return createAvgPoolValueCountIncludePadTrueCase(
    op, adaptor, rewriter, self, sumPool, outputTensor, resultType,
    kernelSizeIntValues, indexingMapsAvg, iteratorTypesAvg);
return success();
```
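
(For context on the shape of that control flow, here is a generic sketch — hypothetical names, not the actual torch-mlir code — of a helper that returns an engaged optional only when it handled the case, with the caller falling through otherwise; any statement after the second return is indeed unreachable.)

```cpp
#include <cstdio>
#include <optional>

// Hypothetical stand-ins for the two lowering paths discussed above.
bool lowerPadExcludedDivisor() { return true; }
bool lowerFixedKernelDivisor() { return true; }

// Returns an engaged optional only if this path actually applies; otherwise
// the caller is expected to fall through to the other path.
std::optional<bool> tryPadExcludedPath(bool countIncludePad, bool hasPadding) {
  if (countIncludePad || !hasPadding)
    return std::nullopt; // not applicable: divide by the full kernel size instead
  return lowerPadExcludedDivisor();
}

bool lower(bool countIncludePad, bool hasPadding) {
  if (auto handled = tryPadExcludedPath(countIncludePad, hasPadding))
    return *handled;
  return lowerFixedKernelDivisor();
  // Any statement placed here (e.g. another "return") would be unreachable,
  // as noted above.
}

int main() {
  std::printf("%d\n", lower(/*countIncludePad=*/false, /*hasPadding=*/true));
}
```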

Also, you said in the PR description:

> I also did some refactoring of the original code to reduce the size of the functions and to avoid redundancy where possible.

But did it make the flow simpler? I don't think so. It's good that you did that, but the flow of the code should be such that any other contributor willing to contribute can understand it without putting in a lot of effort, which is clearly not the case here because of the complexity introduced in the code flow. Also, this might have enabled some new tests, but it should also error out clearly on the cases that are not supported and should not result in downstream failures.

In the end, I would like to say that I do not want to get into a debate on this and would rather focus on what can be done ahead. I had to write the things above since you asked for them. I could find more issues in the patch, but I neither like doing that nor want to. I believe in constructive feedback, and you should take it in that spirit only; the focus should be on how we can address these issues.

@ivangarcia44
Contributor Author

ivangarcia44 commented Apr 29, 2025

> Honestly, I think this PR should not have been merged. Apart from having some breaking changes, it has other issues, like incorrect code flow, and it does not adhere to standard LLVM contribution guidelines. […]
>
> […] I believe in constructive feedback, and you should take it in that spirit only; the focus should be on how we can address these issues.

The name of createAvgPoolValueCountIncludePadFalseCase could be confusing because, as you mentioned, there are more conditions under which it can exit. Renaming this method could help, as it has evolved over time while I discovered and fixed numerical correctness issues found with the new tests. I will update the name and outline the condition from this function to make the control flow easier to read. The "return success();" statement at the end of the matchAndRewrite method is dead code; I will get rid of it. That said, I would not consider any of these issues as making the control flow incorrect, since none of them produces an incorrect numerical result.

I still think the number of lines in a function should be small. Various companies have this C++ coding standard, including Google (see "Write Short Functions" here: https://google.github.io/styleguide/cppguide.html#Function_Declarations_and_Definitions). In addition, functions/methods with a significant amount of control flow tend to have a high cyclomatic complexity. A function with more than 100 lines of code most likely has a CC above 20, while Microsoft and other companies use a threshold of around 10 (see "The Magic Number" in https://learn.microsoft.com/en-us/visualstudio/code-quality/code-metrics-cyclomatic-complexity?view=vs-2022). A high CC can lead to unmaintainable code with potential numerical correctness bugs that are hard to debug and fix, like the ones fixed in this pull request: #4144.

The original code (prior to any of my changes) had well over 100 lines of code and was hard to understand, as you pointed out above (in my old change set I only generalized the old algorithm, I did not modify it at its core); the variables did not convey any meaning, and there were no comments explaining how the formula came about.

For the new PR (#4144) I had to rewrite it completely to fix an E2E numerical correctness bug I found as part of the new test suite. Now the code has more comments and the variables are better named. The function sizes still need more work, but I did not venture into further refactoring to avoid increasing the complexity of the change.

The IREE test failed with this change because of an E2E test gap in the torch-mlir project. The failing IREE test is one of the tests I added to the new E2E test suite to avoid a situation like this in the future. The new E2E test suite reduces the testing gap for these operators but does not eliminate it; I have found and fixed other bugs in the convolution operator that were due to lack of test coverage.

Please add any additional feedback in the PR below.

#4144
