Release and allocate VGPR resources in tail loop. #1586
================== 9 passed, 112 skipped in 576.73s (0:09:36) ================== |
self.states.lastValuAB - self.states.a.startVgprValu, "ValuAB") # Add as available
module.addComment1("Tail: add ValuA/B vgpr buffer [%u...%u) to pool" % \
(self.states.a.startVgprValu, self.states.a.startVgprValu+self.states.lastValuAB))
# Check out VGPR for G2l
moduleMacroG2lVgpr, vgprG2L = self.tailLoopAllocG2LVgpr(kernel)
module.add(moduleMacroG2lVgpr)
if isOptNLL:
vbegin = self.states.startVgprMisc
vsize = self.states.lastVgprForReads - vbegin
if self.states.a.startVgprLocalReadAddr > self.states.lastVgprForReads:
In OptNLL, we can release all of the resources.
self.vgprPool.add(vbegin, vsize, "endSummation")
module.addComment0("endSummation: add vgpr [%u...%u) to pool" % \
(vbegin, vbegin+vsize))
vbegin = self.states.bias.startVgprValu
This addresses the BiasSum Valu VGPRs, which are needed at endSum but were released at the beginning of the tail loop.
# GlobalRead, LocalWrite, LocalRead, G2L can be reclaimed, extend the "lastVgprForReads" value
if kernel["PrefetchGlobalRead"]:
self.states.lastVgprForReads = vgprIdx
This addresses the BiasSum Valu VGPRs, which are needed at endSum but were released at the beginning of the tail loop. They are part of lastVgprForReads so that they can be temporarily added back to the pool and used before StoreToLds. @KKyang
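To illustrate why releasing a range too early is a problem, here is a minimal sketch using a toy register pool. It is not the actual vgprPool implementation; `ToyVgprPool`, its first-fit `checkout`, and the register numbers are all assumptions made for illustration. Only the `add(start, size, tag)` call shape is taken from the diff above.

```python
class ToyVgprPool:
    """Toy register pool (hypothetical): tracks which register indices are free."""

    def __init__(self, size):
        self.free = [False] * size  # every register starts as in use

    def add(self, start, size, tag=""):
        # Mark [start, start+size) as available for later checkouts.
        for i in range(start, start + size):
            self.free[i] = True

    def checkout(self, size):
        # First fit: return the start of the first contiguous free run.
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == size:
                start = i - size + 1
                for j in range(start, start + size):
                    self.free[j] = False
                return start
        raise RuntimeError("no contiguous free range of size %d" % size)


pool = ToyVgprPool(16)
bias_start, bias_size = 8, 4   # hypothetical range holding a value needed later
pool.add(bias_start, bias_size, "released too early")
tmp = pool.checkout(2)         # a temporary requested during the tail loop
# The temporary lands inside the range that still holds a live value.
overlaps = bias_start <= tmp < bias_start + bias_size
```

In this toy model the temporary is handed the same registers that still hold the value needed later, which is the kind of hazard the comment describes.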
Commit 8e1801b passed CI. Commit 5b85731 only removes an incorrect comment.
@@ -67,7 +67,11 @@ def _replaceActBranchLabel(module, labels):
for item in module.items():
if isinstance(item, Module):
if "InsertActFunctionCallAddrCalc" in item.name:
labelLeft = labels[1:]
labelFirst = labels[0]
This addresses _replaceActBranchLabel() always replacing the label without a postfix.
However, we should check the label we'd really like to replace with. @KKyang
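One way to read this comment is that a label should only be replaced when it matches exactly, not when it is merely a prefix of a postfixed label. The sketch below shows that idea on plain strings. It is not the code of _replaceActBranchLabel(); the helper `replace_label`, the label names, and the instruction text are hypothetical.

```python
import re


def replace_label(lines, old_label, new_label):
    """Replace old_label only where it appears as a whole token (hypothetical helper)."""
    # The lookarounds reject matches that continue into a postfix such as "_1".
    pattern = re.compile(r"(?<!\w)" + re.escape(old_label) + r"(?!\w)")
    # A function replacement avoids interpreting backslashes in new_label.
    return [pattern.sub(lambda m: new_label, line) for line in lines]


lines = ["s_branch act_label", "s_branch act_label_1"]
result = replace_label(lines, "act_label", "act_label_2")
```

Here only the first line is rewritten; the line that already refers to `act_label_1` is left alone.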
tpA = self.states.bpr if bpeMax * vwa < self.states.bpr else bpeMax * vwa
tpALocal = self.states.bpr if tensorParametersA["bpe"] * vwa < self.states.bpr else tensorParametersA["bpe"] * vwa
# This check is to reserve potential usage of VGPRs for gfx12 8-bit code gen
# We should optimize the usage for better performance.
Note that gfx12 may have a potential VGPR usage issue. @cmingch
In general, this condition should not include HasWMMA_V1.
localWriteCVTCode.add(VCvtF16toF32(dst=vgpr(vgprTmp+f16Tobf16Idx), src=vgpr(destVgprPrefix + "+%u"%(f16Tobf16Idx))))
localWriteCVTCode.add(VCvtF16toF32(dst=vgpr(vgprTmp+1+f16Tobf16Idx), src=vgpr(destVgprPrefix + "+%u"%(f16Tobf16Idx)),sdwa=sdwa))
localWriteCVTCode.add(VPackF16toB32(dst=vgpr(destVgprPrefix + "+%u"%(f16Tobf16Idx)), src0=vgpr(vgprTmp+f16Tobf16Idx), src1=vgpr(vgprTmp+1+f16Tobf16Idx),
localWriteCVTCode.add(VCvtF16toF32(dst=vgpr(vgprTmp), src=vgpr(destVgprPrefix + "+%u"%(f16Tobf16Idx))))
This fixes the LocalWrite VGPR checkout and index bug.
1. Add a VGPR base index definition.
2. Rearrange the VGPR index order for further optimization.
3. Re-allocate VGPRs for the tail loop.

Fix 6 potential bugs:
1. DTV will use the Valu VGPRs that are released at the beginning of the tail loop.
2. BiasSum Valu VGPRs should be used at endSum but are released at the beginning of the tail loop.
3. _replaceActBranchLabel() always replaces the label without a postfix. However, we should check the label we'd really like to replace with.
4. For DTL, numVgprG2LAllocated is not set, so it keeps the default of -1.
5. Fix G2LA VGPR allocation bug for navi3x.
6. Fix LocalWrite VGPR index bug.

This change is the first step toward optimizing VGPR usage in the unrolled loop. In general, the VGPR usage in the unrolled loop is dependent on the tail. In the tail, the VGPRs can be used more effectively.