Add environment variable to opt out of #10302 (forced disablement of … #10463

comfy-ovum · 2025-10-24T07:49:36Z

…cudnn for all AMD users)

To offset the substantial effects of #10302, this PR provides (and informs the user of) an environment variable that can be set to override the unilateral decision made in #10302 to disable cuDNN for all AMD users.

It simply employs the standard pattern for such things:

        # cuDNN stays disabled on AMD unless TORCH_AMD_CUDNN_ENABLED is set to a value outside the "off" set
        torch.backends.cudnn.enabled = os.environ.get("TORCH_AMD_CUDNN_ENABLED", "0").strip().lower() not in {
            "0", "off", "false", "disable", "disabled", "no"}
        if not torch.backends.cudnn.enabled:
            logging.info(
                "ComfyUI has set torch.backends.cudnn.enabled to False for better AMD performance. Set environment var TORCH_AMD_CUDNN_ENABLED=1 to enable it again.")

Should #10302 later be removed, this is still a useful addition that enhances configurability for AMD users.
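
For anyone who wants to check how a given value will be interpreted, here is a minimal standalone sketch of the same parsing rule, with no torch dependency. The helper name `env_flag_enabled` and its `environ` parameter are hypothetical and exist only for illustration; they are not part of this PR.

```python
import os

# Values treated as "off"; same set as in the snippet above.
_OFF_VALUES = {"0", "off", "false", "disable", "disabled", "no"}


def env_flag_enabled(name, default="0", environ=None):
    """Return True unless the variable (or its default) is a recognised "off" value.

    `environ` is an optional mapping used instead of os.environ (hypothetical
    parameter, added here only to make the rule easy to try out).
    """
    env = os.environ if environ is None else environ
    return env.get(name, default).strip().lower() not in _OFF_VALUES


if __name__ == "__main__":
    for value in (None, "1", "true", " OFF ", "no", ""):
        env = {} if value is None else {"TORCH_AMD_CUDNN_ENABLED": value}
        print(repr(value), env_flag_enabled("TORCH_AMD_CUDNN_ENABLED", environ=env))
```

Note that under this rule any value outside the "off" set counts as enabled, including an empty string; only an unset variable falls back to the default of "0".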

alexheretic · 2025-10-26T16:53:45Z

comfy/model_management.py

 AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

 try:
-    if is_amd():


I think we still need the is_amd() check here; the nested logic that follows applies only to AMD cards.

sfinktah · 2025-10-29 (edited)

(quickly double checks)... it is in there, oh wait... hmm.... how did that happen! Fixed now.

That reminds me that the RDNA2 cut-off point is somewhat arbitrary, but not my code. RDNA2 VAE decoding certainly benefits just as much as RDNA3. Not sure how it is when you aren't using cobbled-together Windows drivers though.
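
To make the cut-off under discussion concrete, here is a rough sketch of how such a gate might look. The thread does not show the nested logic itself, so the rule below is an assumption: cuDNN is disabled by default only on AMD devices whose architecture name does not match the RDNA2-and-older list. The function name `cudnn_disabled_by_default` and the substring matching are illustrative, not the code in comfy/model_management.py.

```python
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]


def cudnn_disabled_by_default(is_amd_device, arch_name):
    """Assumed rule: only AMD devices newer than the RDNA2 cut-off lose cuDNN."""
    if not is_amd_device:
        # The guard discussed above: without it, non-AMD cards would reach AMD-only logic.
        return False
    # Substring match, so a name with feature suffixes still matches its base arch.
    return not any(arch in arch_name for arch in AMD_RDNA2_AND_OLDER_ARCH)
```

Under this assumed rule an RDNA2 card such as gfx1030 keeps cuDNN enabled, which is the behaviour the comment above questions.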

* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build

…nymous#10499) With --cache-none, lazy and subgraph execution can cause any node to be run multiple times per workflow. If the rerun node is itself a subgraph generator, this crashes for two reasons. First, pending_subgraph_results[] does not clean up entries after their use. So when a pending_subgraph_result is consumed, remove it from the list, so that if the corresponding node is fully re-executed the lookup misses and it falls through to execute the node as it should. Second, there is an explicit enforcement against duplicates when subgraph nodes are added as ephemerals to the dynprompt. Remove this enforcement, as the use case is now valid.
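
The first fix described above amounts to consume-and-remove bookkeeping. A minimal sketch of that idea follows, assuming the pending results are keyed by node id; the class `PendingSubgraphResults` and its methods are hypothetical stand-ins, not the executor's real data structure.

```python
class PendingSubgraphResults:
    """Illustrative store for subgraph results awaiting consumption."""

    def __init__(self):
        self._pending = {}

    def store(self, node_id, result):
        # Record the result produced by an expanded subgraph.
        self._pending[node_id] = result

    def consume(self, node_id):
        """Return the pending result and remove it, or None if there is none.

        Removing the entry means a later full re-execution of the same node
        misses this lookup and runs the node normally.
        """
        return self._pending.pop(node_id, None)
```

A caller would treat a `None` return as "no pending result" and fall through to executing the node.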

alexheretic · 2025-10-29T17:57:15Z

lgtm, much nicer to have env control of this rather than maintaining patches 👍

* ops: don't take an offload stream if you don't need one
* ops: prioritize mem transfer

The async offload stream's reason for existence is to transfer from RAM to GPU. The post-processing compute steps are a bonus on the side stream, but if the compute stream is running a long kernel, it can stall the side stream as it waits to type-cast the bias before transferring the weight. So do a pure transfer of the weight first, then do everything for the bias, then go back to fix the weight type and apply weight patches.

sfinktah · 2025-11-05T07:38:47Z

This commit has become too messy. Resubmitting as #10649.

sfinktah added 2 commits October 24, 2025 18:40

Add environment variable to opt out of comfyanonymous#10302 (forced d…

00184d1

…isablement of cudnn for all AMD users)

Replace TORCH_BACKENDS_CUDNN_ENABLED with TORCH_AMD_CUDNN_ENABLED (mo…

995c073

…re accurate)

comfy-ovum requested a review from Kosinkadink as a code owner October 24, 2025 07:49

sfinktah mentioned this pull request Oct 24, 2025

Improve AMD performance. #10302

Merged

bigcat88 and others added 6 commits October 24, 2025 15:48

convert Tripo API nodes to V3 schema (comfyanonymous#10469)

dd5af0c

Remove useless function (comfyanonymous#10472)

426cde3

convert Gemini API nodes to V3 schema (comfyanonymous#10476)

e86b79a

Add warning for torch-directml usage (comfyanonymous#10482)

098a352

Added a warning message about the state of torch-directml.

Fix mistake. (comfyanonymous#10484)

f6bbc1a

fix(api-nodes): random issues on Windows by capturing general OSError…

9d529e5

… for retries (comfyanonymous#10486)

alexheretic reviewed Oct 26, 2025

comfyanonymous and others added 17 commits October 26, 2025 20:23

Bump portable deps workflow to torch cu130 python 3.13.9 (comfyanonym…

c170fd2

…ous#10493)

Add a bat to run comfyui portable without api nodes. (comfyanonymous#…

601ee17

…10504)

Update template to 0.2.3 (comfyanonymous#10503)

c305dee

feat(api-nodes): add LTXV API nodes (comfyanonymous#10496)

55bad30

Update template to 0.2.4 (comfyanonymous#10505)

6abc30a

frontend bump to 1.28.8 (comfyanonymous#10506)

614b8d3

ComfyUI version v0.3.67

f2bb323

Bump stable portable to cu130 python 3.13.9 (comfyanonymous#10508)

b61a40c

Remove comfy api key from queue api. (comfyanonymous#10502)

8cf2ba4

Tell users to update nvidia drivers if problem with portable. (comfya…

3bea4ef

…nonymous#10510)

Tell users to update their nvidia drivers if portable doesn't start. (c…

22e40d2

…omfyanonymous#10518)

convert nodes_recraft.py to V3 schema (comfyanonymous#10507)

210f7a1

Speed up offloading using pinned memory. (comfyanonymous#10526)

3fa7a5c

To enable this feature use: --fast pinned_memory

Fix issue. (comfyanonymous#10527)

e525673

inserted missing is_amd() check

a4eb32a

use new API client in Luma and Minimax nodes (comfyanonymous#10528)

6c14f3a

comfyanonymous and others added 27 commits October 31, 2025 15:41

ScaleROPE now works on Lumina models. (comfyanonymous#10578)

7f374e4

Fix torch compile regression on fp8 ops. (comfyanonymous#10580)

c58c13b

added 12s-20s as available output durations for the LTXV API nodes (c…

5f109fe

…omfyanonymous#10570)

convert StabilityAI to use new API client (comfyanonymous#10582)

20182a3

Fix issue with pinned memory. (comfyanonymous#10597)

44869ff

Clarify help text for --fast argument (comfyanonymous#10609)

97ff9fa

Updated help text for the --fast argument to clarify potential risks.

fix(api-nodes-cloud): stop using sub-folder and absolute path for out…

6d6a18b

…put of Rodin3D nodes (comfyanonymous#10556)

fix(caching): treat bytes as hashable (comfyanonymous#10567)

88df172

convert nodes_hypernetwork.py to V3 schema (comfyanonymous#10583)

1f3f7a2

convert nodes_openai.py to V3 schema (comfyanonymous#10604)

e617cdd

feat(Pika-API-nodes): use new API client (comfyanonymous#10608)

4e2110c

chore: update embedded docs to v0.3.1 (comfyanonymous#10614)

e974e55

People should update their pytorch versions. (comfyanonymous#10618)

958a171

Speed up torch.compile (comfyanonymous#10620)

0652cb8

Fixes (comfyanonymous#10621)

e199c8c

Bring back fp8 torch compile performance to what it should be. (comfy…

6b88478

…anonymous#10622)

This seems to slow things down slightly on Linux. (comfyanonymous#10624)

0f4ef3a

More fp8 torch.compile regressions fixed. (comfyanonymous#10625)

af4b7b5

chore: update workflow templates to v0.2.11 (comfyanonymous#10634)

9c71a66

caching: Handle None outputs tuple case (comfyanonymous#10637)

a389ee0

Limit amount of pinned memory on windows to prevent issues. (comfyano…

7f3e4d4

…nymous#10638)

ComfyUI version v0.3.68

265adad

Use single apply_rope function across models (comfyanonymous#10547)

4cd8818

Lower ltxv mem usage to what it was before previous pr. (comfyanonymo…

c4a6b38

…us#10643) Bring back qwen behavior to what it was before previous pr.

Add env TORCH_AMD_CUDNN_ENABLED

58db886

Merge remote-tracking branch 'origin/sfink-amd-cudnn-envvar' into sfi…

175c22d

…nk-amd-cudnn-envvar

comfy-ovum mentioned this pull request Nov 5, 2025

Add env TORCH_AMD_CUDNN_ENABLED #10649

Open

comfy-ovum closed this Nov 5, 2025


12 participants
