[NPU] npu attention enable ulysses #12919

TmacAaron · 2026-01-07T07:43:26Z

What does this PR do?

The NPU attention backend in diffusers does not yet support Ulysses parallelism. This PR implements Ulysses parallel attention for the NPU attention backend.

Note: Only the forward op is implemented for now; the backward op is not yet supported.
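For readers unfamiliar with the scheme, here is a minimal single-process sketch of the data movement behind Ulysses attention. It is not the code in this PR: the two all-to-all exchanges are simulated with plain Python lists instead of torch.distributed collectives, and every function name below is hypothetical. It assumes the sequence length and the number of heads are both divisible by the Ulysses degree.

```python
import math
import random


def attention_one_head(q, k, v):
    """Softmax attention for one head; q, k, v are lists of per-token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def full_attention(q, k, v):
    """Reference attention; tensors are indexed as x[token][head] -> vector."""
    num_tokens, num_heads = len(q), len(q[0])
    out = [[None] * num_heads for _ in range(num_tokens)]
    for h in range(num_heads):
        head_out = attention_one_head([t[h] for t in q], [t[h] for t in k], [t[h] for t in v])
        for s in range(num_tokens):
            out[s][h] = head_out[s]
    return out


def seq_to_head_shards(seq_shards, world_size):
    """Simulated all-to-all: sequence shards (all heads) -> head shards (full sequence)."""
    heads_per_rank = len(seq_shards[0][0]) // world_size
    return [
        [tok[r * heads_per_rank:(r + 1) * heads_per_rank] for shard in seq_shards for tok in shard]
        for r in range(world_size)
    ]


def head_to_seq_shards(head_shards, world_size):
    """Simulated inverse all-to-all: head shards (full sequence) -> sequence shards (all heads)."""
    chunk = len(head_shards[0]) // world_size
    return [
        [[vec for shard in head_shards for vec in shard[s]] for s in range(r * chunk, (r + 1) * chunk)]
        for r in range(world_size)
    ]


def ulysses_attention(q, k, v, world_size):
    """Shard by sequence, exchange to shard by head, attend locally, exchange back."""
    num_tokens, num_heads = len(q), len(q[0])
    assert num_tokens % world_size == 0 and num_heads % world_size == 0
    chunk = num_tokens // world_size

    def split(x):
        return [x[r * chunk:(r + 1) * chunk] for r in range(world_size)]

    q_h, k_h, v_h = (seq_to_head_shards(split(t), world_size) for t in (q, k, v))
    # Each simulated rank sees the full sequence, but only its own subset of heads.
    out_h = [full_attention(q_h[r], k_h[r], v_h[r]) for r in range(world_size)]
    out_s = head_to_seq_shards(out_h, world_size)
    return [tok for shard in out_s for tok in shard]


def max_abs_diff(a, b):
    """Largest element-wise difference between two x[token][head] -> vector tensors."""
    return max(abs(x - y) for ta, tb in zip(a, b) for va, vb in zip(ta, tb) for x, y in zip(va, vb))


if __name__ == "__main__":
    random.seed(0)
    S, H, D = 8, 4, 2

    def rand_tensor():
        return [[[random.gauss(0, 1) for _ in range(D)] for _ in range(H)] for _ in range(S)]

    q, k, v = rand_tensor(), rand_tensor(), rand_tensor()
    reference = full_attention(q, k, v)
    for degree in (2, 4):
        print(degree, max_abs_diff(reference, ulysses_attention(q, k, v, degree)) < 1e-9)
```

Because attention is independent across heads, each rank can run an ordinary attention kernel on its head subset once it holds the full sequence. This is why an existing backend kernel can be reused between the two exchanges.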

Test

Hardware

Atlas 800T A2

Repro Code

import os
import time

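import torch
import torch_npu
# transfer_to_npu patches torch so that the CUDA-style calls below (torch.cuda.*, the "cuda"
# device string, and the "nccl" backend name) are redirected to their NPU equivalents.
from torch_npu.contrib import transfer_to_npu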

from diffusers import FluxPipeline, ContextParallelConfig


def launched_with_torchrun() -> bool:
    return (
        "RANK" in os.environ
        and "WORLD_SIZE" in os.environ
        and "LOCAL_RANK" in os.environ
    )

warm_up = True

model_id = "black-forest-labs/FLUX.1-dev"
height = 1024
width = 1024
steps = 50
prompt = "A cat holding a sign that says hello world"


try:
    if launched_with_torchrun():
        torch.distributed.init_process_group("nccl")
        rank = torch.distributed.get_rank()
        device = torch.device("cuda", rank % torch.cuda.device_count())
        world_size = torch.distributed.get_world_size()
    else:
        rank = 0
        device = torch.device("cuda")
        world_size = 1
    torch.cuda.set_device(device)

    pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)
    pipe.transformer.set_attention_backend("_native_npu")

    if launched_with_torchrun():
        print(f"{world_size=}")
        pipe.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=world_size))
    
    # Warm up with a short run so one-time setup cost is excluded from the timing below
    if warm_up:
        _ = pipe(
            prompt,
            height=height,
            width=width,
            guidance_scale=3.5,
            num_inference_steps=2,
            max_sequence_length=512,
            generator=torch.Generator("cpu").manual_seed(0)
        ).images[0]

    # Inference
    start_time = time.time()
    image = pipe(
        prompt,
        height=height,
        width=width,
        guidance_scale=3.5,
        num_inference_steps=steps,
        max_sequence_length=512,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    end_time = time.time()

    if rank == 0:
        print(f"Inference Time: {end_time - start_time} s")
        image.save(f"sp{world_size}-flux-dev.png")

except Exception as e:
    print(f"An error occurred: {e}")
    # Bare raise re-raises the original exception with its traceback intact
    raise

finally:
    if launched_with_torchrun():
        if torch.distributed.is_initialized():
            torch.distributed.destroy_process_group()

Result

1. no ulysses attention

Command:

python ./flux_infer.py

Inference Time:
24.44s

Result Image:

2. ulysses attention with degree 2

Command:

torchrun --nproc_per_node=2 ./flux_infer.py

Inference Time:
18.83s

Result Image:

3. ulysses attention with degree 4

Command:

torchrun --nproc_per_node=4 ./flux_infer.py

Inference Time:
12.69s

Result Image:
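As a quick reading of the three timings above, the following sketch computes speedup and parallel efficiency relative to the single-device run. The numbers are copied from this description; the `scaling_report` helper is a hypothetical name used only for illustration.

```python
# Inference times reported above (seconds), keyed by Ulysses degree.
# Degree 1 is the single-device run without Ulysses attention.
timings = {1: 24.44, 2: 18.83, 4: 12.69}


def scaling_report(timings):
    """Return {degree: (speedup, efficiency)} relative to the degree-1 run."""
    baseline = timings[1]
    report = {}
    for degree, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        report[degree] = (speedup, speedup / degree)
    return report


for degree, (speedup, efficiency) in scaling_report(timings).items():
    print(f"degree={degree}: speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
```

This works out to roughly 1.30x at degree 2 and 1.93x at degree 4, i.e. about 65% and 48% parallel efficiency for end-to-end pipeline time on this setup.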

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

TmacAaron · 2026-01-07T07:50:20Z

@yiyixuxu @sayakpaul Hello, could you please review this PR? Thanks.

sayakpaul · 2026-01-08T09:59:34Z

@bot /style

github-actions · 2026-01-08T10:00:01Z

Style fix is beginning .... View the workflow run here.

HuggingFaceDocBuilderDev · 2026-01-08T10:05:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

TmacAaron · 2026-01-09T01:30:01Z

@bot /style

can you take a look again?

sayakpaul · 2026-01-09T04:41:58Z

Thanks for your contribution!

TmacAaron added 2 commits November 1, 2025 08:57

npu attention enable ulysses

17e2a42

clean the format

a033e7f

TmacAaron added 3 commits January 8, 2026 15:53

Merge branch 'main' into npu_ulysses

79c1107

register _native_npu_attention to _supports_context_parallel

002e7ef

Signed-off-by: yyt <[email protected]>

change npu_fusion_attention's input_layout to BSND to eliminate redundant transpose

9a5e827

Signed-off-by: yyt <[email protected]>

sayakpaul approved these changes Jan 8, 2026

Merge branch 'main' into npu_ulysses

51ba43c

Update format

8780c4a

Merge branch 'main' into npu_ulysses

10dec67

sayakpaul merged commit be38f41 into huggingface:main Jan 9, 2026
10 of 11 checks passed
