IsaH57/contrastive-skip-layer-guidance

Contrastive Skip-Layer Guidance (CSLG)

This repository contains the official implementation of our paper:

Contrastive Skip-Layer Guidance for Controlling Semantic Coherence in Diffusion Models
Isabell Hans, Nikolai Röhrich — LMU Munich


Overview

Current diffusion models, including Stable Diffusion 3 and FLUX, often struggle with semantically complex prompts, such as those that require visible, legible text in the image, because of a trade-off between prompt adherence and image fidelity.

Contrastive Skip-Layer Guidance (CSLG) is a training-free, prompt-agnostic method for enhancing semantic coherence in such scenarios. It does so by:

  1. Automatically identifying task-relevant layers using contrastive prompt pairs,
  2. Selectively skipping these layers during inference,
  3. Combining the result with standard Classifier-Free Guidance (CFG) for improved output quality at lower guidance scales.
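
The third step can be sketched as follows. The README does not state the exact formula, so the additive form below (a CFG term plus a term that pushes away from the skipped-layer prediction) is an assumption modeled on common skip-layer guidance implementations. The names `combine_guidance`, `cfg_scale`, and `skip_scale` are hypothetical, and plain lists stand in for model output tensors.

```python
def combine_guidance(uncond, cond, cond_skip, cfg_scale, skip_scale):
    """Combine CFG with a skip-layer term, element by element.

    uncond:    prediction for the empty prompt
    cond:      prediction for the prompt with all layers active
    cond_skip: prediction for the prompt with the selected layers skipped
    (Assumed formulation; the repository's weighting may differ.)
    """
    return [
        u + cfg_scale * (c - u) + skip_scale * (c - s)
        for u, c, s in zip(uncond, cond, cond_skip)
    ]


# Toy values only, chosen to make the arithmetic easy to follow.
guided = combine_guidance(
    uncond=[0.0, 1.0],
    cond=[1.0, 1.0],
    cond_skip=[0.5, 1.0],
    cfg_scale=2.0,
    skip_scale=1.0,
)
print(guided)
```

With `skip_scale=0.0` the expression reduces to plain CFG, which is one way to check that a skip-layer term was wired in without changing the baseline.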


Setup

Requirements

Install dependencies:

pip install -r requirements.txt

Usage

This section provides a step-by-step guide to using CSLG with diffusion models like FLUX or Stable Diffusion 3.

1. Run the Diffusion Model on Prompt Pairs

Use pairs of prompts that differ only in a specific semantic feature (e.g. the presence of visible text, as in prompt_datasets/prompts_text_notext.json) to generate images. Hook into the model's forward pass to capture the intermediate activations of each layer and save them as tensors.

# FLUX
python FLUX_LayerExperiment_pairwise.py

# Stable Diffusion 3
python SD3_LayerExperiment_pairwise.py
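
For readers unfamiliar with forward hooks, here is a minimal sketch of capturing per-layer activations in PyTorch. It is not taken from the scripts above: the helper `capture_activations`, the toy `nn.Sequential` model, and the commented-out file name are hypothetical stand-ins, and a real FLUX or SD3 transformer would use its own module names.

```python
import torch
from torch import nn


def capture_activations(model, layer_names):
    """Register forward hooks that store the output of each named layer."""
    activations = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        # Bind `name` as a default argument so each hook keeps its own key.
        def hook(module, inputs, output, name=name):
            # Some blocks return tuples; keep the first tensor in that case.
            tensor = output[0] if isinstance(output, tuple) else output
            activations[name] = tensor.detach().cpu()

        handles.append(modules[name].register_forward_hook(hook))
    return activations, handles


# Toy stand-in for a diffusion transformer (illustration only).
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
activations, handles = capture_activations(toy_model, ["0", "2"])

with torch.no_grad():
    toy_model(torch.randn(3, 4))

# Remove the hooks so later forward passes are not recorded.
for handle in handles:
    handle.remove()

# torch.save(activations, "activations_prompt_a.pt")  # hypothetical file name
```

Running the same capture once per prompt in a pair gives two activation dictionaries with matching keys, which is the input the contrastive analysis needs.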

2. Identify Task-Relevant Layers

Run the contrastive analysis, which computes cosine similarity differences between the activations of the two prompts in each pair, layer by layer. The result indicates which layers are most relevant to the task at hand.

python visualize.py # set MODEL to 'FLUX' or 'SD3' and specify the path to the saved activations

This creates a plot of the relative cosine similarity for each layer, from which you can choose the layers to skip during inference.
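
The core of such an analysis can be sketched in a few lines. This is an illustration under stated assumptions, not the logic of visualize.py: activations are flattened to plain lists of floats, the "relative" normalization is not specified in this README and is omitted, and lower similarity is read as a sign that a layer reacts more strongly to the contrasted feature. The names `cosine_similarity` and `rank_layers` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_layers(acts_a, acts_b):
    """Return (layer, similarity) pairs, least similar layer first."""
    sims = {
        layer: cosine_similarity(acts_a[layer], acts_b[layer])
        for layer in acts_a
    }
    return sorted(sims.items(), key=lambda item: item[1])


# Toy activations for three layers (made-up numbers).
acts_text = {0: [1.0, 0.0], 1: [1.0, 1.0], 2: [1.0, 0.0]}
acts_notext = {0: [1.0, 0.0], 1: [1.0, -1.0], 2: [1.0, 1.0]}

ranking = rank_layers(acts_text, acts_notext)
for layer, similarity in ranking:
    print(f"layer {layer}: cosine similarity {similarity:.3f}")
```

In practice the similarities would be averaged over many prompt pairs before ranking, so that a single unusual pair does not decide which layers are skipped.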

3. Apply CSLG during Inference

Using the identified layers, you can now apply CSLG during inference. For comparison, the scripts generate images without any guidance, with standard Classifier-Free Guidance (CFG), and with CSLG, skipping the specified layers in all possible combinations.

# FLUX
python FLUX_final_experiments.py

# Stable Diffusion 3
python SD3_final_experiments.py # specify the layers to skip in the script
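
If "all possible combinations" is read as every non-empty subset of the identified layers (an interpretation, not something this README confirms), the configurations can be enumerated as in the sketch below. The function name `skip_layer_configs` and the layer indices are hypothetical.

```python
from itertools import combinations


def skip_layer_configs(candidate_layers):
    """Yield every non-empty subset of the candidate layers, smallest first."""
    for size in range(1, len(candidate_layers) + 1):
        for subset in combinations(candidate_layers, size):
            yield list(subset)


configs = list(skip_layer_configs([7, 8, 9]))  # hypothetical layer indices
for config in configs:
    print(config)
```

The number of subsets grows as 2^n - 1, so it is worth keeping the candidate list short before generating images for each configuration.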

OCR-based Evaluation

To evaluate the effectiveness of CSLG in generating visible, legible text, we use an OCR-based approach. This involves running an OCR model on the generated images and comparing the results with the expected text extracted from the given prompts.

python eval.py 
# set MODEL to 'FLUX' or 'SD3' and specify the path to the generated images, as well as the IMAGE_TYPES to be considered
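
The comparison step might look like the following sketch. It makes several assumptions that this README does not confirm: the expected text is taken to be the first double-quoted span in the prompt, the OCR output is passed in as a plain string (no OCR library is called here), and similarity is measured with `difflib.SequenceMatcher` rather than whatever metric eval.py uses. All function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def expected_text(prompt):
    """Return the first double-quoted span in the prompt, or None."""
    match = re.search(r'["\u201c]([^"\u201d]+)["\u201d]', prompt)
    return match.group(1) if match else None


def normalize(text):
    """Lowercase, collapse whitespace, and drop punctuation."""
    text = re.sub(r"\s+", " ", text.lower())
    return re.sub(r"[^a-z0-9 ]", "", text).strip()


def text_match_score(prompt, ocr_output):
    """Similarity in [0, 1] between expected and recognized text."""
    target = expected_text(prompt)
    if target is None:
        return None  # prompt carries no quoted text to compare against
    matcher = SequenceMatcher(None, normalize(target), normalize(ocr_output))
    return matcher.ratio()


prompt = 'A wooden sign that says "Hello World"'
print(text_match_score(prompt, "HELLO WORLD!"))
print(text_match_score(prompt, "Helo Wrld"))
```

Normalizing case and punctuation before scoring keeps the metric focused on whether the right characters were rendered, rather than on OCR quirks.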
