
IDGallagher (Contributor)

Some memory optimization ideas in this PR:

  • Image scheduling acts as a lookup table that returns the embedded image corresponding to each sub_idx when using AnimateDiff. This lets you input a small number of images and specify when they're used, rather than creating a large batch of repeated images.
  • There is a VRAM bottleneck when using CLIP Vision on a large batch of images. Iterating over the images removes the bottleneck at the expense of slightly longer processing when the image batch is large (see the sketch after this list).
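A minimal sketch of the iterative encoding idea, assuming a ComfyUI-style `clip_vision.encode_image()` that takes a `[B, H, W, C]` image tensor and returns an output carrying `image_embeds`; the function name and the `.cpu()` offload are illustrative, not taken from the PR:

```python
import torch

def encode_images_one_by_one(clip_vision, images):
    # images: [B, H, W, C] tensor; encode each image separately so only
    # one image's activations are ever resident on the GPU at a time.
    embeds = []
    for i in range(images.shape[0]):
        out = clip_vision.encode_image(images[i:i + 1])
        # Offload the embedding to system RAM right away to keep VRAM flat.
        embeds.append(out.image_embeds.cpu())
    return torch.cat(embeds, dim=0)
```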

cubiq (Owner) commented Apr 26, 2024

Thanks for your suggestions! This is something I'm experimenting with too, but I want to try a slightly different approach.

The expensive bit is the image encoding itself. That can be easily solved by batching the encoder; once encoded, I would keep all the embeds in regular RAM, ready to be used. That keeps the code pretty simple and doesn't touch the image encoder itself (which is important because we need to keep it aligned with ComfyUI's updates).
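A rough sketch of that approach, with the same assumed `encode_image()` interface as above and `batch_size` as a hypothetical knob, not a value from the PR:

```python
import torch

def encode_batched_to_ram(clip_vision, images, batch_size=8):
    # Encode in moderate batches to keep the encoder fast, then park every
    # embed in regular RAM until the sampler actually needs it.
    embeds = []
    for i in range(0, images.shape[0], batch_size):
        out = clip_vision.encode_image(images[i:i + batch_size])
        embeds.append(out.image_embeds.cpu())  # park in system RAM
    return torch.cat(embeds, dim=0)  # CPU tensor, ready to be used later
```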

Once we enter the diffusion process, everything is normally moved to VRAM. Instead of doing that, we keep the embeds in regular RAM and move them to VRAM 16 at a time (or whatever the context window is).
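A sketch of the windowed transfer, assuming the embeds are already a CPU tensor and using 16 to stand in for the AnimateDiff context length; the generator shape and usage are illustrative:

```python
import torch

def embeds_per_context_window(embeds_cpu, context_len=16, device="cuda"):
    # Stream context-window-sized slices of the CPU-resident embeds to
    # VRAM, one window at a time, instead of moving everything at once.
    for start in range(0, embeds_cpu.shape[0], context_len):
        yield embeds_cpu[start:start + context_len].to(device)

# Hypothetical usage inside the diffusion loop:
# for window_embeds in embeds_per_context_window(all_embeds):
#     ...condition this window of frames with window_embeds...
```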
