Conversation

@yousef-rafat (Contributor) commented on Sep 5, 2025:

[Screenshot: 2025-09-09 230818]

@yousef-rafat changed the title from "Add support to Higgsv2 + Autoregressive Generation" to "Add support for Higgsv2 + Autoregressive Generation" on Sep 5, 2025
@Kosinkadink added the Good PR label ("This PR looks good to go, it needs comfy's final review.") on Sep 18, 2025
@Kosinkadink added the Core label ("Core team dependency") on Sep 30, 2025
@Kosinkadink (Collaborator) left a comment:

Thank you for the PR, and sorry it took so long to review! Comfy and I took a look today. I've added some comments inline, but here is a summary plus extras:

  1. CUDA Graph code should be removed if possible.
  2. Comfy would prefer that the caches from transformers.cache_utils not be used, as he wants as little dependency on transformers as possible.
  3. Check whether the llama tokenizer .json could be reused for the higgsv2 tokenizer, since they might be identical (a quick comparison sketch follows this list).
  4. Use torch over numpy wherever possible.
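
For point 3, one quick way to check is to compare the parsed tokenizer files rather than raw bytes, so formatting differences don't cause false negatives. A minimal sketch (file paths are hypothetical):

```python
import json
import pathlib

def tokenizers_identical(path_a: str, path_b: str) -> bool:
    # Compare parsed JSON so whitespace and key-order differences
    # in the files don't matter; dict equality ignores key order.
    a = json.loads(pathlib.Path(path_a).read_text(encoding="utf-8"))
    b = json.loads(pathlib.Path(path_b).read_text(encoding="utf-8"))
    return a == b
```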

While testing after creating the combined checkpoint file, I found a bug - if you try to run a workflow a second time by incrementing the seed, the Autoregressive Generation node does things for a bit but then ultimately throws this error:

```
!!! Exception during processing !!! 'StaticCache' object has no attribute 'layers'
Traceback (most recent call last):
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\nodes.py", line 1588, in generate
    return (auto_sample(self, model, input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed = seed),)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 678, in auto_sample
    samples = node._cached_autoregressive_sampler.generate(main_input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed=seed, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Kosinkadink\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 393, in generate
    result = self.model._sample(
             ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1115, in _sample
    past_key_values, self.current_past_key_values_bucket = self._prepare_kv_cache(
                                                           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1018, in _prepare_kv_cache
    self._copy_kv_cache(
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 983, in _copy_kv_cache
    from_layer = from_cache.layers[i]
                 ^^^^^^^^^^^^^^^^^
AttributeError: 'StaticCache' object has no attribute 'layers'
```
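
For what it's worth, this error pattern often points at a transformers version mismatch: newer releases restructured cache objects around a `.layers` list, while older ones store per-layer tensors in `key_cache` / `value_cache` lists. A defensive sketch of iterating either layout (the attribute names are assumptions about those transformers versions):

```python
def iter_kv_layers(cache):
    # Newer transformers caches (assumed): cache.layers[i].keys / .values
    # Older transformers caches (assumed): cache.key_cache[i] / cache.value_cache[i]
    if hasattr(cache, "layers"):
        for layer in cache.layers:
            yield layer.keys, layer.values
    else:
        yield from zip(cache.key_cache, cache.value_cache)
```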

Let me know if you have any questions/comments!


```python
_NUM_WARMUP_ITERS = 2

class CUDAGraphRunner(nn.Module):
```
@Kosinkadink (Collaborator):

Comfy wants all CUDA graph stuff removed from this PR - unless there is a clear performance benefit. If the torch.cuda.synchronize call is needed, something may be wrong.

@yousef-rafat (Contributor, author):

There's a noticeable and clear performance boost from CUDA graphs in my tests. You can see it by forcibly enabling/disabling them in the init of the AutoRegressiveGeneration class.

The torch.cuda.synchronize calls were in the original implementation: https://github.com/boson-ai/higgs-audio/blob/main/boson_multimodal/model/higgs_audio/cuda_graph_runner.py
I think I could remove them.
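
For reference, the capture/replay pattern behind a runner like this is roughly the following, using torch's documented CUDA graph API (a minimal sketch, not the PR's exact code; the warmup loop matches the _NUM_WARMUP_ITERS = 2 above):

```python
import torch

def make_graphed_step(model, static_input, warmup_iters=2):
    # Warm up on a side stream so lazy initialization isn't captured.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(warmup_iters):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Record one forward pass into the graph; it runs on fixed buffers.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    def step(new_input):
        static_input.copy_(new_input)  # replay reuses the captured buffers
        graph.replay()
        return static_output

    return step
```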

@Kosinkadink (Collaborator):

Gotcha. Comfy says we can eventually make CUDA graphs a general comfy feature, so we shouldn't implement this for a specific model right now.

```python
import warnings
from enum import Enum
from dataclasses import dataclass, fields
from transformers.cache_utils import StaticCache, DynamicCache, Cache
```
@Kosinkadink (Collaborator):

Comfy would prefer that cache classes not be imported from transformers, so these likely need to use an existing ComfyUI cache class or be rewritten.
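
As a rough illustration of how small a transformers-free replacement could be (a sketch only; the shapes and method names are assumptions, not ComfyUI API):

```python
import torch

class SimpleStaticCache:
    """Preallocated per-layer KV buffers with no transformers dependency."""

    def __init__(self, num_layers, batch, num_kv_heads, max_len, head_dim,
                 device=None, dtype=torch.float16):
        shape = (batch, num_kv_heads, max_len, head_dim)
        self.keys = [torch.zeros(shape, device=device, dtype=dtype)
                     for _ in range(num_layers)]
        self.values = [torch.zeros(shape, device=device, dtype=dtype)
                       for _ in range(num_layers)]
        self.pos = 0  # number of tokens already cached

    def update(self, layer_idx, k, v):
        # k, v: (batch, num_kv_heads, new_tokens, head_dim)
        end = self.pos + k.shape[2]
        self.keys[layer_idx][:, :, self.pos:end] = k
        self.values[layer_idx][:, :, self.pos:end] = v
        if layer_idx == len(self.keys) - 1:
            self.pos = end  # advance once per full forward pass
        return (self.keys[layer_idx][:, :, :end],
                self.values[layer_idx][:, :, :end])
```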

```python
        return data

    def apply_filter(self, data: torch.Tensor):
        if data.is_cuda or self.use_fir:
```
@Kosinkadink (Collaborator):

There shouldn't be separate code paths for CPU/GPU, if possible.

@yousef-rafat (Contributor, author):

The FIR filter does an FFT convolution, which benefits greatly from the GPU; a sequential algorithm like the IIR filter benefits more from the CPU.
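
To make the trade-off concrete: an FIR filter is one large parallel convolution, which the GPU can evaluate as an FFT multiply, while an IIR filter feeds each output sample back into the next (e.g. torchaudio.functional.lfilter), which is inherently sequential and often faster on CPU. A minimal sketch of the FFT path (a hypothetical helper, not the PR's code):

```python
import torch

def fir_filter_fft(x: torch.Tensor, taps: torch.Tensor) -> torch.Tensor:
    # Linear convolution via FFT: zero-pad both signals to the full
    # output length, multiply spectra, transform back.
    n = x.shape[-1] + taps.shape[-1] - 1
    y = torch.fft.irfft(torch.fft.rfft(x, n) * torch.fft.rfft(taps, n), n)
    return y[..., : x.shape[-1]]  # trim to the original length
```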

Owner:

How big is the difference?

```python
    def generate_coefficients(self):

        A = 10**(self.G/40.0)
        w0 = 2.0 * np.pi * (self.fc / self.rate)
```
@Kosinkadink (Collaborator):

The numpy code should be replaced with torch wherever possible.
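
For the snippet above, the scalar setup needs only the standard math module, and the coefficient arrays can be built with torch directly. A sketch assuming a peaking-EQ biquad (self.Q and the filter type are assumptions, not taken from the PR):

```python
import math
import torch

def generate_coefficients(self):
    # Same setup as the numpy version, without numpy.
    A = 10.0 ** (self.G / 40.0)
    w0 = 2.0 * math.pi * (self.fc / self.rate)
    alpha = math.sin(w0) / (2.0 * self.Q)  # self.Q assumed to exist
    # Audio EQ Cookbook peaking-EQ biquad, normalized by a0:
    b = torch.tensor([1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A])
    a = torch.tensor([1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]
```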

@Kosinkadink (Collaborator):

Another thing - when you create checkpoints for these PRs, could you upload them to Hugging Face to make testing simple?
