Extend to 1D and 3D the torch to linalg lowering of the average pool operator with count_include_pad = false #4035
…ere count_include_pad = false
Hi @AmosLewis, @rsuderman, could you please review this change when you get a chance? It generalizes the average pooling divisor computation for count_include_pad = false, which was previously implemented only for 2 dimensions. This change makes it work for N dimensions and refactors the code to make the methods smaller. Thanks,
Including additional reviewers familiar with the Pooling.cpp file: @vivekkhandelwal1, @lingzhiz1998, in addition to @AmosLewis, @rsuderman, @sahas3, @dixinzhou, @rafaelubal
Including @rafaelubalmw in addition to the other reviewers: @vivekkhandelwal1, @lingzhiz1998, @AmosLewis, @rsuderman, @sahas3, @dixinzhou, @rafaelubal
Thanks for making the changes @ivangarcia44. The changes look good to me.
LGTM
Hi @ivangarcia44, this change is causing a test case failure: iree-test-suites/onnx_ops/onnx/node/generated/test_averagepool_2d_ceil/ I've created an issue for this: #4079. Thanks, Vivek
Hi @amd-vivekag, I will take a look at this. Thanks,
b, kernelSizeIntValues, strideInts, paddingInts);
// AtenAvgPool2/3dOp has an optional divisor_override
// attribute while AtenAvgPool1dOp does not.
if constexpr (avgPoolDims > 1) {
FYI this led to some downstream build failures on Windows: #4085
Honestly, I think this PR should not have been merged. Apart from introducing some breaking changes, it has other issues, such as incorrect code flow, and it does not adhere to the standard LLVM contribution guidelines. For future reference, I would suggest that for a PR making such significant changes, we wait for the original authors of the code to respond within a definite time frame.
Next time I will. I apologize for the inconvenience. I waited almost a month for the review, and I thought that was enough time to merge. I sent reminders several times during that month. Can you please point out where the incorrect code flow and the LLVM guideline violations are? The fixes in the latest change cover various numerical correctness issues that were present prior to any of my changes and were uncovered by a new set of E2E tests I added.
Even if I don't go into much depth, you can see that at least in this part of the code torch-mlir/lib/Conversion/TorchToLinalg/Pooling.cpp Lines 902 to 948 in 4fe75dd
I am not able to understand this piece of the code. It says torch-mlir/lib/Conversion/TorchToLinalg/Pooling.cpp Lines 1043 to 1054 in 4fe75dd
Also, you said in the PR description:
But did it make the flow simpler? I don't think so. It's good that you did that, but the code should flow in such a way that any other contributor can understand it without putting in a lot of effort, which is clearly not the case here because of the complexity that was introduced. Also, this might have enabled some new tests, but it should also error out clearly on the unsupported cases and should not result in downstream failures. In the end, I would not want to get into a debate on this and would instead focus on what can be done going forward. I had to write the things above since you asked for them. I could find more issues in the patch, but I neither like doing that nor want to. I believe in constructive feedback, and you should take this in that spirit only; the focus should be on how we can address these issues.
The name createAvgPoolValueCountIncludePadFalseCase could be confusing because, as you mentioned, there are more conditions under which it can exit. The method evolved as I discovered and fixed numerical correctness issues with the new tests, so renaming it could help. I will update the name and factor the condition out of this function to make the control flow easier to read. The "return success();" statement at the end of the matchAndRewrite method is dead code; I will remove it. That said, I would not consider either of these issues as making the control flow incorrect, since neither produces an incorrect numerical result.
I still think the number of lines in a function should be small. Various companies have this in their C++ coding standards, including Google (see "Write Short Functions" in https://google.github.io/styleguide/cppguide.html#Function_Declarations_and_Definitions). In addition, functions and methods with a significant amount of control flow tend to have a high cyclomatic complexity (CC). A function with more than 100 lines of code most likely has a CC above 20, while Microsoft and other companies use a threshold around 10 (see "The Magic Number" in https://learn.microsoft.com/en-us/visualstudio/code-quality/code-metrics-cyclomatic-complexity?view=vs-2022). A high CC can lead to unmaintainable code with potential numerical correctness bugs that are hard to debug and fix, like the ones fixed in this pull request: #4144.
The original code (prior to any of my changes) had far more than 100 lines, was hard to understand as you pointed out above (in my old change set I only generalized the old algorithm; I did not modify it at its core), had variables that did not convey any meaning, and had no comments explaining how the formula came about. For the new PR (#4144) I had to rewrite it completely to fix an E2E numerical correctness bug I found as part of the new test suite. Now the code has more comments and the variables are better named. The function sizes still need more work, but I did not venture into further refactoring to avoid increasing the complexity of the change.
The IREE test failed with this change because of an E2E test gap in the torch-mlir project. The failed IREE test is one of the tests I added in the new E2E test suite to avoid a situation like this in the future. The new E2E test suite reduces the testing gap of the operators but does not eliminate it. I have found and fixed other bugs in the convolution operator that were due to a lack of test coverage. Please add any additional feedback in the PR below.
Currently the avg_pool2d PyTorch operation supports both count_include_pad = true and count_include_pad = false, but avg_pool1d and avg_pool3d support only the true case (which is simpler).
The count_include_pad = false support for avg_pool2d was added by @AmosLewis in this change (reviewed by @rsuderman and @nirvedhmeshram) : #3235
In this change I generalized the logic added above. I also did some refactoring to the original code to reduce the size of the functions and to avoid redundancy when possible.
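To illustrate what the count_include_pad = false divisor looks like when generalized to N dimensions, here is a minimal sketch on plain integers. It is not the linalg lowering from Pooling.cpp: the helper names validCountAlongDim and avgPoolDivisor are hypothetical, and the windowing rule (clip the window to the padded extent first, then to the real input) is an assumption based on how PyTorch's reference average pooling is commonly described.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from Pooling.cpp): number of real, non-padding
// input elements covered by the pooling window along one dimension.
int64_t validCountAlongDim(int64_t outIdx, int64_t inputSize, int64_t kernel,
                           int64_t stride, int64_t pad) {
  // Window bounds in input coordinates; the padded region starts at -pad.
  int64_t start = outIdx * stride - pad;
  // Assumed rule: the window may not extend past the padded extent.
  int64_t end = std::min(start + kernel, inputSize + pad);
  // Drop the padded positions by clamping to the real input [0, inputSize).
  start = std::max<int64_t>(start, 0);
  end = std::min(end, inputSize);
  return std::max<int64_t>(end - start, 0);
}

// Hypothetical helper: the divisor for one output element is the product of
// the per-dimension valid counts, so the same code serves 1D, 2D, and 3D.
int64_t avgPoolDivisor(const std::vector<int64_t> &outIdx,
                       const std::vector<int64_t> &inputSize,
                       const std::vector<int64_t> &kernel,
                       const std::vector<int64_t> &stride,
                       const std::vector<int64_t> &pad) {
  const std::size_t rank = outIdx.size();
  assert(inputSize.size() == rank && kernel.size() == rank &&
         stride.size() == rank && pad.size() == rank);
  int64_t divisor = 1;
  for (std::size_t d = 0; d < rank; ++d)
    divisor *= validCountAlongDim(outIdx[d], inputSize[d], kernel[d],
                                  stride[d], pad[d]);
  return divisor;
}
```

For a 4x4 input with a 3x3 kernel, stride 1, and padding 1, avgPoolDivisor({0, 0}, {4, 4}, {3, 3}, {1, 1}, {1, 1}) gives 4 at the corner, while the interior index {1, 1} gives 9. With count_include_pad = true the divisor would instead be the clipped window size before the second clamp, which is why that case is simpler.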
@sahas3 @dixinzhou @rafaelubal