Add environment variable to opt out of #10302 (forced disablement of … #10463
…isablement of cudnn for all AMD users)
Added a warning message about the state of torch-directml.
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

try:
    if is_amd():
I think we still need the is_amd() check here; the following nested logic applies only to AMD cards.
(quickly double-checks)... it is in there, oh wait... hmm... how did that happen? Fixed now.
That reminds me: the RDNA2 cut-off point is somewhat arbitrary, but that's not my code. RDNA2 VAE decoding certainly benefits just as much as RDNA3. I'm not sure how it behaves when you aren't using cobbled-together Windows drivers, though.
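To make the review point concrete, here is a minimal sketch of an arch check guarded by an outer AMD test, reusing the `AMD_RDNA2_AND_OLDER_ARCH` list from the diff. The helper names (`base_arch`, `is_rdna2_or_older`, `amd_arch_class`) are hypothetical, and the sketch only classifies the device; it makes no claim about what the real nested logic does with that classification.

```python
from typing import Optional

# Same list as in the diff; "RDNA2 and older" per the constant's name.
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011",
                            "gfx1012", "gfx906", "gfx900", "gfx803"]


def base_arch(gcn_arch_name: str) -> str:
    """Drop target-feature suffixes: 'gfx1030:sramecc-:xnack-' -> 'gfx1030'."""
    return gcn_arch_name.split(":", 1)[0].strip()


def is_rdna2_or_older(gcn_arch_name: str) -> bool:
    # Exact match on the base name, so e.g. 'gfx1100' never matches 'gfx110x'-like prefixes.
    return base_arch(gcn_arch_name) in AMD_RDNA2_AND_OLDER_ARCH


def amd_arch_class(is_amd: bool, gcn_arch_name: str) -> Optional[str]:
    """Classify the device; None means 'not AMD, skip all AMD-only logic'."""
    # The outer is_amd check must stay: arch strings are only meaningful on
    # AMD/ROCm devices, so other backends never reach the nested test.
    if not is_amd:
        return None
    return "rdna2_or_older" if is_rdna2_or_older(gcn_arch_name) else "newer"
```

On a ROCm build of PyTorch the arch string would typically come from `torch.cuda.get_device_properties(0).gcnArchName`; that source is an assumption here, and the sketch takes the string as a plain argument so it runs without a GPU.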
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build
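The commit list only names a "registry design" for mixed-precision ops. A minimal, framework-free sketch of what such a registry can look like, assuming hypothetical names throughout (`QUANT_OP_REGISTRY`, `register_quant_op`, `QuantizedWeight`, the `toy_int_scaled` layout); per the list above, the actual design uses PyTorch tensor subclasses rather than Python lists.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: maps (op name, quantized layout) to a handler.
QUANT_OP_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register_quant_op(op: str, layout: str):
    """Decorator that records a handler for one op/layout pair."""
    def decorator(fn: Callable) -> Callable:
        key = (op, layout)
        if key in QUANT_OP_REGISTRY:
            raise ValueError(f"handler already registered for {key}")
        QUANT_OP_REGISTRY[key] = fn
        return fn
    return decorator


class QuantizedWeight:
    """Toy stand-in for a quantized tensor: integer values plus one scale."""

    def __init__(self, values, scale, layout):
        self.values = values
        self.scale = scale
        self.layout = layout  # metadata used to pick the handler

    def dequantize(self):
        return [v * self.scale for v in self.values]


@register_quant_op("dot", "toy_int_scaled")
def dot_toy_int_scaled(weight: QuantizedWeight, x):
    # Dequantize to the compute dtype (plain floats here), then compute.
    return sum(w * xi for w, xi in zip(weight.dequantize(), x))


def dispatch(op: str, weight: QuantizedWeight, *args):
    handler = QUANT_OP_REGISTRY.get((op, weight.layout))
    if handler is None:
        raise NotImplementedError(f"no {op!r} handler for layout {weight.layout!r}")
    return handler(weight, *args)
```

The point of the registry is that adding a new quantized format means registering handlers keyed by its layout, without touching the dispatch code.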
…nymous#10499) In the case of --cache-none, lazy and subgraph execution can cause anything to be run multiple times per workflow. If a rerun node is itself a subgraph generator, this will crash for two reasons. First, pending_subgraph_results[] does not clean up entries after their use. So when a pending_subgraph_result is consumed, remove it from the list, so that if the corresponding node is fully re-executed, the lookup misses and execution falls through to run the node as it should. Second, there is an explicit enforcement against duplicates when adding subgraph nodes as ephemerals to the dynprompt. Remove this enforcement, as the use case is now valid.
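The two fixes described above can be modelled in a few lines. This is a toy sketch, not ComfyUI's executor: `pending_subgraph_results` is borrowed from the message but modelled here as a dict, and `SubgraphExecutor`, `execute`, and `add_ephemeral_node` are hypothetical names.

```python
from typing import Any, Callable, Dict


class SubgraphExecutor:
    """Toy model of node execution that must not reuse stale subgraph results."""

    def __init__(self):
        # node_id -> result produced by a previously expanded subgraph
        self.pending_subgraph_results: Dict[str, Any] = {}
        self.ephemeral_nodes: Dict[str, Any] = {}
        self.executions: Dict[str, int] = {}

    def execute(self, node_id: str, run_node: Callable[[], Any]) -> Any:
        # Fix 1: pop() consumes the entry, so a later full re-execution of the
        # same node misses this lookup and falls through to running the node.
        if node_id in self.pending_subgraph_results:
            return self.pending_subgraph_results.pop(node_id)
        self.executions[node_id] = self.executions.get(node_id, 0) + 1
        return run_node()

    def add_ephemeral_node(self, node_id: str, node: Any) -> None:
        # Fix 2: no duplicate check. Re-running a subgraph generator
        # legitimately re-adds the same ephemeral id; the newer definition wins.
        self.ephemeral_nodes[node_id] = node
```

With the entry left in place (a plain lookup instead of `pop()`), the second call below would wrongly return the stale subgraph result.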
To enable this feature use: --fast pinned_memory
lgtm, much nicer to have env control of this rather than maintaining patches 👍
* ops: don't take an offload stream if you don't need one
* ops: prioritize mem transfer

The async offload stream's reason for existence is to transfer from RAM to GPU. The post-processing compute steps are a bonus on the side stream, but if the compute stream is running a long kernel, it can stall the side stream, as the side stream waits to type-cast the bias before transferring the weight. So do a pure transfer of the weight straight up, then do everything for the bias, then go back to fix the weight type and apply the weight patches.
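The reordering can be summarized as a step plan. This toy sketch uses no real streams or tensors and assumes nothing about ComfyUI's ops code; `offload_transfer_plan` and its parameters are hypothetical. It only returns the order of side-stream work, with the weight copy issued first so no cast that might wait on a busy compute stream sits in front of it.

```python
from typing import List, Optional


def offload_transfer_plan(needs_offload: bool, has_bias: bool, needs_cast: bool,
                          patches: Optional[List[str]] = None) -> List[str]:
    """Return the side-stream step order (toy model, no real streams)."""
    if not needs_offload:
        return []  # don't take an offload stream if you don't need one
    steps = ["transfer weight"]  # pure RAM -> GPU copy first
    if has_bias:
        steps += ["transfer bias", "cast bias"]  # then everything for the bias
    if needs_cast:
        steps.append("cast weight")  # go back and fix the weight dtype
    steps += [f"apply patch {p}" for p in (patches or [])]  # weight patches last
    return steps
```

In real code each step would be an operation enqueued on a secondary CUDA stream; the sketch keeps only the ordering, which is the part the commit changes.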
Updated help text for the --fast argument to clarify potential risks.
…put of Rodin3D nodes (comfyanonymous#10556)
…us#10643) Restore Qwen behavior to what it was before the previous PR.
…nk-amd-cudnn-envvar
This commit has become too messy. Resubmitting as #10649.
…cudnn for all AMD users)
To offset the substantial effects of #10302, this PR provides (and informs the user of) an environment variable that can be set to nullify the unilateral decision made in #10302 to disable cuDNN for all AMD users.
It simply employs the standard pattern for such things:
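A minimal sketch of such an opt-out pattern, assuming a hypothetical variable name `EXAMPLE_AMD_KEEP_CUDNN` (the PR's actual variable name is not reproduced here) and hypothetical helpers `env_flag` and `should_disable_cudnn`. It reads a truthy environment flag, skips the blanket disable when the flag is set, and otherwise logs how to opt out.

```python
import logging
import os
from typing import Mapping, Optional

# Hypothetical name for illustration only.
OPT_OUT_ENV_VAR = "EXAMPLE_AMD_KEEP_CUDNN"
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Treat 1/true/yes/on (case-insensitive) as set; anything else as unset."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in _TRUTHY


def should_disable_cudnn(is_amd: bool,
                         environ: Optional[Mapping[str, str]] = None) -> bool:
    if not is_amd:
        return False  # the blanket disable only ever targeted AMD users
    if env_flag(OPT_OUT_ENV_VAR, environ):
        logging.info("%s is set; leaving cuDNN enabled.", OPT_OUT_ENV_VAR)
        return False
    # Inform the user how to opt out instead of disabling silently.
    logging.info("Disabling cuDNN on AMD; set %s=1 to keep it enabled.",
                 OPT_OUT_ENV_VAR)
    return True
```

In startup code the result would gate `torch.backends.cudnn.enabled = False`. Passing `environ` explicitly keeps the check testable without mutating the real process environment.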
Should #10302 later be removed, this would still be a useful addition that improves configurability for AMD users.