
Conversation

@comfy-ovum

…cudnn for all AMD users)

To offset the substantial effects of #10302, this PR provides (and informs the user of) an environment variable that can be set to nullify the unilateral decision made in #10302 to disable cuDNN for all AMD users.

It simply employs the standard pattern for such things:

        # Treat the common "off" strings as disabled; anything else enables cuDNN.
        torch.backends.cudnn.enabled = os.environ.get("TORCH_AMD_CUDNN_ENABLED", "0").strip().lower() not in {
            "0", "off", "false", "disable", "disabled", "no"}
        if not torch.backends.cudnn.enabled:
            logging.info(
                "ComfyUI has set torch.backends.cudnn.enabled to False for better AMD performance. Set environment var TORCH_AMD_CUDNN_ENABLED=1 to enable it again.")

Should #10302 later be removed, this is still a useful addition that enhances configurability for AMD users.

AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

try:
    if is_amd():
Contributor

I think we still need the is_amd() check here; the following nested logic applies only to AMD cards.

@sfinktah Oct 29, 2025

(quickly double checks)... it is in there, oh wait... hmm.... how did that happen! Fixed now.

That reminds me that the RDNA2 cut-off point is somewhat arbitrary, but it's not my code. RDNA2 VAE decoding certainly benefits just as much as RDNA3. Not sure how it is when you aren't using cobbled-together Windows drivers, though.
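
For reference, a minimal sketch of the guarded shape being discussed (is_amd() and AMD_RDNA2_AND_OLDER_ARCH come from the diff above; reading gcnArchName from the CUDA/ROCm device properties is an assumption, not necessarily what the PR does):

    import torch

    def is_rdna2_or_older() -> bool:
        # Guard first: the nested arch logic applies only to AMD cards.
        if not is_amd():
            return False
        arch = torch.cuda.get_device_properties(0).gcnArchName
        return any(a in arch for a in AMD_RDNA2_AND_OLDER_ARCH)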

comfyanonymous and others added 17 commits October 26, 2025 20:23
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.

* Updated design using Tensor Subclasses

* Fix FP8 MM

* An actually functional POC

* Remove CK reference and ensure correct compute dtype

* Update unit tests

* ruff lint

* Fix missing keys

* Rename quant dtype parameter

* Fix unittests for CPU build
…nymous#10499)

In the case of --cache-none, lazy and subgraph execution can cause
anything to be run multiple times per workflow. If a rerun node is
itself a subgraph generator, this will crash for two reasons.

First, pending_subgraph_results[] does not clean up entries after use.
So when a pending_subgraph_result is consumed, remove it from the list
so that, if the corresponding node is fully re-executed, the lookup
misses and execution falls through to run the node as it should.

Second, there is an explicit check against duplicates when adding
subgraph nodes as ephemerals to the dynprompt. Remove this check, as
the use case is now valid.
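
For illustration, a minimal sketch of the consume-then-remove pattern the first fix describes (pending_subgraph_results comes from the commit message; node_id and execute_node are hypothetical stand-ins):

    # Consume the pending result exactly once, removing it as we go.
    cached = pending_subgraph_results.pop(node_id, None)
    if cached is not None:
        return cached
    # On a full re-execution the lookup now misses, so we fall through
    # and execute the node as we should.
    return execute_node(node_id)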
To enable this feature use: --fast pinned_memory
@alexheretic
Contributor

lgtm, much nicer to have env control of this rather than maintaining patches 👍

comfyanonymous and others added 27 commits October 31, 2025 15:41
* ops: don't take an offload stream if you don't need one

* ops: prioritize mem transfer

The async offload stream's reason for existence is to transfer from
RAM to GPU. The post-processing compute steps are a bonus on the side
stream, but if the compute stream is running a long kernel, it can
stall the side stream as it waits to type-cast the bias before
transferring the weight. So do a pure transfer of the weight straight
up, then do everything bias, then go back to fix the weight type and
apply the weight patches.
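
For illustration, a minimal sketch of that ordering on a side stream (offload_stream, compute_dtype, and apply_patches are hypothetical stand-ins, not the PR's actual names):

    import torch

    def load_weights(weight, bias, device, compute_dtype, offload_stream, apply_patches):
        with torch.cuda.stream(offload_stream):
            # 1. Pure transfer of the weight first, so the copy is never
            #    queued behind a type-cast kernel on the side stream.
            weight = weight.to(device, non_blocking=True)
            # 2. Then everything bias: transfer and type-cast.
            if bias is not None:
                bias = bias.to(device, non_blocking=True).to(compute_dtype)
            # 3. Finally fix the weight type and apply weight patches.
            weight = apply_patches(weight.to(compute_dtype))
        return weight, bias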
Updated help text for the --fast argument to clarify potential risks.
…us#10643)

Bring back qwen behavior to what it was before the previous PR.
@sfinktah

sfinktah commented Nov 5, 2025

This commit has become too messy. Resubmitting as #10649.

