Add support for Higgsv2 + Autoregressive Generation #9736
base: master
Conversation
Thank you for the PR, and sorry it took so long to review! Comfy and I took a look today. There are some comments added inline; here is a summary plus a few extras:
- The CUDA Graph code should be removed if possible.
- Comfy would prefer that the caches from transformers.cache_utils not be used, since he wants as little dependency on transformers as possible.
- Check whether the llama tokenizer .json can be reused for the higgsv2 tokenizer, since they might be identical.
- Use torch instead of numpy wherever possible.
While testing after creating the combined checkpoint file, I found a bug: if you run a workflow a second time after incrementing the seed, the Autoregressive Generation node works for a bit but ultimately throws this error:
```
!!! Exception during processing !!! 'StaticCache' object has no attribute 'layers'
Traceback (most recent call last):
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 277, in process_inputs
    result = f(**inputs)
  File "C:\Users\Kosinkadink\ComfyUI\nodes.py", line 1588, in generate
    return (auto_sample(self, model, input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed = seed),)
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 678, in auto_sample
    samples = node._cached_autoregressive_sampler.generate(main_input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed=seed, **kwargs)
  File "c:\Users\Kosinkadink\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 393, in generate
    result = self.model._sample(
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1115, in _sample
    past_key_values, self.current_past_key_values_bucket = self._prepare_kv_cache(
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1018, in _prepare_kv_cache
    self._copy_kv_cache(
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 983, in _copy_kv_cache
    from_layer = from_cache.layers[i]
AttributeError: 'StaticCache' object has no attribute 'layers'
```
Let me know if you have any questions/comments!
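For what it's worth, the transformers cache internals have changed across releases: some versions expose per-layer objects on `cache.layers`, while older ones store parallel `key_cache` / `value_cache` tensor lists, which would explain the `AttributeError` above when the installed version differs from the one the PR was written against. A duck-typed accessor could paper over both layouts; this is only an illustrative sketch (the helper name and the `.keys`/`.values` layer attributes are assumptions, not code from the PR):

```python
def iter_cache_layers(cache):
    """Yield (key_tensor, value_tensor) per layer regardless of cache layout.

    Hypothetical compatibility helper: tries the newer `cache.layers`
    attribute first, then falls back to the older parallel-list layout.
    """
    layers = getattr(cache, "layers", None)
    if layers is not None:
        # Newer layout: each layer object carries its own key/value tensors.
        for layer in layers:
            yield layer.keys, layer.values
    else:
        # Older layout: two parallel lists indexed by layer number.
        for k, v in zip(cache.key_cache, cache.value_cache):
            yield k, v
```

Dropping the transformers caches entirely (as suggested above) would of course make this shim unnecessary.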
```python
_NUM_WARMUP_ITERS = 2


class CUDAGraphRunner(nn.Module):
```
Comfy wants all CUDA graph code removed from this PR, unless there is a clear performance benefit. If the torch.cuda.synchronize call is needed, something may be wrong.
In my tests there's a noticeable, clear performance boost from CUDA graphs. You can see it by forcibly enabling/disabling them in the init of the AutoRegressiveGeneration class.
The torch.cuda.synchronize calls were in the original implementation: https://github.com/boson-ai/higgs-audio/blob/main/boson_multimodal/model/higgs_audio/cuda_graph_runner.py
I think I could remove them.
Gotcha. Comfy says we can eventually make CUDA graphs a general ComfyUI feature, so we shouldn't implement this for a specific model right now.
```python
import warnings
from enum import Enum
from dataclasses import dataclass, fields
from transformers.cache_utils import StaticCache, DynamicCache, Cache
```
Comfy would prefer that cache classes not be imported from transformers, so these likely need to use an existing ComfyUI cache class or be rewritten.
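A transformers-free static KV cache doesn't need to be much code. Here's a minimal sketch of what a rewrite could look like in plain torch (all names are illustrative, not an existing ComfyUI API, and real code would also need to handle dtype/device moves and cache reset between runs):

```python
import torch


class SimpleStaticCache:
    """Minimal preallocated key/value cache sketch for autoregressive decoding."""

    def __init__(self, num_layers, batch, num_heads, max_len, head_dim,
                 dtype=torch.float32, device="cpu"):
        shape = (batch, num_heads, max_len, head_dim)
        # One preallocated buffer per transformer layer, filled in-place.
        self.keys = [torch.zeros(shape, dtype=dtype, device=device)
                     for _ in range(num_layers)]
        self.values = [torch.zeros(shape, dtype=dtype, device=device)
                       for _ in range(num_layers)]
        self.seq_len = 0  # tokens written so far (shared across layers)

    def update(self, layer_idx, k, v):
        """Write new K/V states at the current position; return the filled slice.

        k and v are (batch, num_heads, new_tokens, head_dim).
        """
        new_tokens = k.shape[2]
        start = self.seq_len
        end = start + new_tokens
        self.keys[layer_idx][:, :, start:end] = k
        self.values[layer_idx][:, :, start:end] = v
        if layer_idx == len(self.keys) - 1:
            # Advance the write position once per forward pass.
            self.seq_len = end
        return self.keys[layer_idx][:, :, :end], self.values[layer_idx][:, :, :end]
```

The preallocated buffers keep tensor shapes and addresses stable across decode steps, which is also the property a CUDA-graph path (if kept) would rely on.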
```python
        return data

    def apply_filter(self, data: torch.Tensor):
        if data.is_cuda or self.use_fir:
```
There shouldn't be separate CPU/GPU code paths, if possible.
The FIR filter does an FFT convolution, which benefits greatly from the GPU, whereas a sequential algorithm like the IIR filter runs better on the CPU.
How big is the difference?
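For context, the FFT-based FIR path being discussed boils down to frequency-domain multiplication, which is O(n log n) and embarrassingly parallel, versus the inherently sequential per-sample recursion of an IIR filter. A minimal torch sketch of such an FFT convolution (illustrative only, not the PR's implementation):

```python
import torch


def fft_fir_filter(signal: torch.Tensor, taps: torch.Tensor) -> torch.Tensor:
    """Apply an FIR filter via frequency-domain multiplication.

    Mathematically equivalent to direct ("full"-mode) linear convolution,
    but O(n log n), which is what makes the GPU path attractive for long
    signals.
    """
    # Zero-pad both inputs to the full linear-convolution length to avoid
    # circular-convolution wraparound.
    n = signal.shape[-1] + taps.shape[-1] - 1
    sig_f = torch.fft.rfft(signal, n=n)
    tap_f = torch.fft.rfft(taps, n=n)
    return torch.fft.irfft(sig_f * tap_f, n=n)
```

An IIR filter, by contrast, can't be expressed this way because each output sample depends on previous output samples, so the question of how large the measured gap actually is (CPU IIR vs. GPU FFT-FIR) seems worth answering with a benchmark.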
```python
    def generate_coefficients(self):

        A = 10**(self.G/40.0)
        w0 = 2.0 * np.pi * (self.fc / self.rate)
```
The numpy code should be replaced with torch wherever possible
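The snippet above looks like standard peaking-EQ biquad coefficient math, and it ports to torch almost mechanically. A hedged sketch of what a torch-only version could look like (the function name, `q` parameter, and the assumption that this is an RBJ-cookbook-style peaking filter are mine, not confirmed by the PR):

```python
import math
import torch


def peaking_coefficients(gain_db: float, fc: float, rate: float, q: float = 0.707):
    """Torch-only peaking-EQ biquad coefficients (illustrative sketch).

    Mirrors the numpy snippet above: A = 10**(G/40), w0 = 2*pi*fc/rate.
    Returns normalized (b, a) coefficient tensors.
    """
    A = 10.0 ** (torch.as_tensor(gain_db) / 40.0)
    w0 = 2.0 * math.pi * (fc / rate)
    cos_w0 = torch.as_tensor(math.cos(w0))
    alpha = math.sin(w0) / (2.0 * q)
    # Numerator (b) and denominator (a) of the biquad section.
    b = torch.stack([1 + alpha * A, -2.0 * cos_w0, 1 - alpha * A])
    a = torch.stack([1 + alpha / A, -2.0 * cos_w0, 1 - alpha / A])
    return b / a[0], a / a[0]  # normalize so a[0] == 1
```

Keeping the coefficients as tensors also means they inherit device/dtype handling for free when the filter itself runs in torch.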
Another thing: when you create a checkpoint for these PRs, could you upload it to Hugging Face to make it simple to test?