Request: DFlash draft checkpoint for Qwen3.5-2B/0.8B #100

@AloneGu

Description

Currently, the only available DFlash draft model is Qwen3.5-4B-DFlash, which is paired with the Qwen3.5-4B target model. On consumer-grade GPUs (e.g., 2× 16 GB), the Mamba cache required by the DFlash draft model consumes significant VRAM, which makes it difficult to use the recommended block_size=16 without running into OOM errors.

A smaller DFlash draft checkpoint (e.g., trained from Qwen3.5-2B or Qwen3.5-0.8B) would be highly beneficial for memory-constrained deployments while still enabling speculative decoding acceleration.
