-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release fixes/fix nhwc support #3657
Open
BrianHarrisonAMD
wants to merge
11
commits into
release/rocm-rel-6.4
Choose a base branch
from
release-fixes/fix-NHWC-support
base: release/rocm-rel-6.4
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Release fixes/fix nhwc support #3657
BrianHarrisonAMD
wants to merge
11
commits into
release/rocm-rel-6.4
from
release-fixes/fix-NHWC-support
+3,502
−2,652
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* enable nhwc in ocl * disable nhwc for ocl kernel and only enable ocl kernel for nchw and variant = 2 * enable nhwc in bn infer, address review comments * fix clang format * fix hip tidy issue * hip cleanups * fix index for nchw and nhwc for MIOpenBatchNormFwdInferPerAct.cl * disable ck bn solver * bg/enable_nhwc_in_ocl : fix tensor initialization of driver. Make it similar to gtest * 3 test cases for backward spatial single and multiple * fix clang format * fix type issue * remove ncwh guard for fwd train multiple * Fix tolerance issue for bwd FP32 test * fix mix bwd var 2 * fixed clang format * place guard around fp32 * clang format * Enable fwd training variant 2 for NHWC * Enable support for variant 2 bwd (FP32 works, mixed precision fails) * Optimize memory accesses in fwd inference for NHWC * Fix timers in batch norm * Optimize variant 2 of bwd for NHWC * the kernels now support xlocalsize > 1, usually it's 64; * the "Final" kernels are launched with one workgroup in y dim; * variant 2 is enabled by default for NHWC and float because other variants are not optimized yet. * Optimize variant 2 of fwd for NHWC Same strategy as in bwd * Fix stash indices for bwd with fp16 and NHWC * Fix fwd variant 2 with fp16 * Use variant 2 as default for fwd with NHWC * Fix 3D case for NHWC * Fix fwd logic * Add restiction checking for bwd variant 2 * BnBwdTrainingSpatialSingle checks first if BnBwdTrainingSpatialMultiple is applicable (variant 2); * Change stash indices for NHWC and mixed precision; * Add restictions for c and h*w for stashing values correctly; * remove unused macro * clang format * Refactor fwd restriction checking Consistency with bwd and clearer logic * fix clang format * remove unused variable from fwd train ocl bn * [gTest] Integrate Per Activation (#3498) * Added miopenBNPerActivation for bn_fwd, bn_infer, bn_fwd_train * Update CPU Inference and Fwd Train * Unified interface of batchNormSpatialHostBwdTrain, and batchNormPerActHostBwdTrain * Update dx_out data type --------- Co-authored-by: Bibek Ghimire <[email protected]> * add comments * address review comments partially * Fix conditions for 2d reduction, gcn_reduce2 was never used enen when possible * Unify stash index calculation * Reuse stash indices fwd needs 0,1, bwd This is possible because the new Final kernels run with one workgroup in y dim, so there are no load/store conflicts between workgroups. * Use correct out tensor in the driver for bwd per act * fix bn driver lamda init * clang format * add more coverage and few cleanups * add bn edge test cases * use double as accumulator and fix minor issues * fix bn driver for actiation and cleanups * fix driver's activation issue --------- Co-authored-by: xinlipn <[email protected]> Co-authored-by: Enrico Degregori <[email protected]> Co-authored-by: Anton Gorenko <[email protected]> Co-authored-by: BrianHarrisonAMD <[email protected]>
* undo code change and fix issue from cmake * seperate large tensor test in batch norm to run serially
* undo code change and fix issue from cmake * seperate large tensor test in batch norm to run serially * follow up review comments
* Vectorization fwd variant 2 for NHWC and NCHW * Vectorization bwd variant 2 for NHWC and NCHW * Vectorization inference for NHWC and NCHW * Fix warning * Fwd and Bwd dispatch with vectorization (NHWC) * fwd train variant 2: remove barrier(GLOBAL_MEM) (not needed) * Remove some restrictions in variant 2 by using different stash methods Currently all tests with NHWC use variant 2 (both fwd and bwd) * Extract common for spatial multiple * Bug fix: vectorization not supported for inference activation * Add vectorization to inference per activation * clang format * Fix stashing for mixed NCHW when H*W is not even The address must be aligned by 32 bit for f32 mean/variance. * Use smaller workgroups when H*W is small * Fix clang format --------- Co-authored-by: Enrico Degregori <[email protected]>
* remove variant calculation in batch norm's network config * add more infor for network config
* fix bn tuning issue with CK batchnorm solvers * work around for CK's 1184 * make batch norm's driver default iteration same as conv driver * comment CK BN for now
* Add tuning for batch norm OpenCL kernels: - during tuning all applicable variants are run and the fastest is picked by the tunign system - spatial single and multiple are merged in a single solver * General improvements from review * Update submodule fin to point to develop branch * clang tidy clean ups * clang tidy stuff * fix fin interface test --------- Co-authored-by: Enrico Degregori <[email protected]>
* fix bn driver signed init * fix comment
bghimireamd
approved these changes
Mar 28, 2025
amd-jnovotny
approved these changes
Apr 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changelog OK.
BradPepersAMD
approved these changes
Apr 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to go!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Enable NHWC support for BN, and include performance fixes.