-
How can I display each generation step like in the InvokeAI UI? I've found one solution, but it seems to be old, because `callback` and `callback_steps` are deprecated. That code was working anyway, but it hung up Google Colab. Here is the code I'm talking about:

```python
def callback(iter, t, latents):
    ...
```

But I've found another snippet that doesn't hang Google Colab; the problem is that the generation preview is just ugly noise, not what I want. Can I make something like InvokeAI?
-
I haven't looked at how InvokeAI does it, but it's not that hard to do. First of all, you can't just convert the latents straight into an image; for starters, you'll need to use the `callback_on_step_end` argument:

```python
def decode_tensors(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
    # convert the latents to an image here (see the two methods below)
    return callback_kwargs

image = pipe(
    height=image_height,
    width=image_width,
    prompt=prompt,
    negative_prompt="",
    guidance_scale=7.5,
    num_inference_steps=20,
    generator=generator,
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```

Then, to convert the latents to images: what you're doing is one method, but it's very resource intensive. I don't know exactly what you're doing in your code since you use a … So what you're doing is not practical, but I know two other methods to achieve what you're looking for:

1.- You can use the function in this blog post: https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space. The author also explains the process and what each dimension and channel means in the latents, but if you just want the function, it's this one:

```python
import torch
from PIL import Image

def latents_to_rgb(latents):
    weights = (
        (60, -60, 25, -70),
        (60, -5, 15, -50),
        (60, 10, -5, -35),
    )

    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
    image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy()
    image_array = image_array.transpose(1, 2, 0)  # CHW -> HWC

    return Image.fromarray(image_array)
```

The only con of this method is that the latent space is a compressed space of 128x128, so these images are 128x128 as well: too small for decent previews in big windows, but good and fast for small previews.

2.- Use TAESD, which is a really small and fast autoencoder. The con of this method is that you'll need to download the weights and know how to load the model, but once you have that, it's as easy as this:

```python
with torch.no_grad():
    decoded = taesd(latents.float()).clamp(0, 1).mul_(255).round().byte()

image = Image.fromarray(decoded[0].permute(1, 2, 0).cpu().numpy())
```

The image would be the same as the final generation, just with lower quality in the details, but it's good enough for previews. I remember I saw somewhere they added TAESD to diffusers, but I really haven't used it that way. This would be a comparison:

RGB: (128x128 linear preview image)
TAESD: (decoded preview image)
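If you want to try the diffusers route mentioned above, loading TAESD there looks roughly like this. This is a minimal sketch, not from the original post: it assumes an SDXL pipeline, uses `madebyollin/taesdxl` as the Hub checkpoint (`madebyollin/taesd` would be the SD 1.x/2.x equivalent), and relies on `AutoencoderTiny.decode` returning images in diffusers' usual [-1, 1] range:

```python
import torch
from PIL import Image
from diffusers import AutoencoderTiny

# Sketch: load TAESD through diffusers' AutoencoderTiny.
# "madebyollin/taesdxl" is assumed here for SDXL latents.
taesd = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def taesd_preview(latents):
    # decode() follows the usual diffusers convention of [-1, 1] images
    decoded = taesd.decode(latents).sample
    decoded = (decoded / 2 + 0.5).clamp(0, 1).mul(255).round().byte()
    return Image.fromarray(decoded[0].permute(1, 2, 0).cpu().numpy())
```

You could also assign this model to `pipe.vae` to speed up the final decode as well, which is how the diffusers docs typically use it.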
-
@asomoza thanks so much. I am going to use this myself as well. @yiyixuxu @stevhliu what do you think about adding a guide to our docs crediting @asomoza and https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space?
-
yeah I would use this too! I want to collect more of these kinds of examples from the community (about how to use our callback loops to implement cool features like this one!) - I think the rest of us can benefit greatly from a "gallery" like this. I wonder what would be the best platform for this? Should we create a folder in the community folder, or maybe a dedicated section in the docs?
-
I agree that …
-
@sayakpaul My comment is about finding a "platform" for the community to share more short code snippets like this one, using the callback argument to implement cool features.
-
Hey @asomoza, great work with all the tips and tricks you've been supplying in the discussions! We would love to collaborate more with you over Slack to add these to the docs so everyone benefits from them. Is there an email we can invite you with? 🙂
For the first method, you just need to put the code I gave you together:
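Something along these lines — a minimal sketch combining the snippets above, assuming SDXL (the RGB weights are derived from the SDXL latent space); the model id and prompt are placeholders, and saving each preview to disk stands in for however you actually want to display it:

```python
import torch
from PIL import Image
from diffusers import AutoPipelineForText2Image

def latents_to_rgb(latents):
    # linear approximation of the SDXL VAE, from the blog post above
    weights = (
        (60, -60, 25, -70),
        (60, -5, 15, -50),
        (60, 10, -5, -35),
    )
    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
    image_array = rgb_tensor.clamp(0, 255)[0].byte().cpu().numpy()
    return Image.fromarray(image_array.transpose(1, 2, 0))  # CHW -> HWC

def decode_tensors(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
    # save a small preview every step (a UI would display it instead)
    latents_to_rgb(latents).save(f"step_{step}.png")
    return callback_kwargs

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a photo of an astronaut riding a horse",
    num_inference_steps=20,
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```

The previews will be 128x128 as mentioned above; for bigger windows you could simply resize them, or switch the callback body to the TAESD variant.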