
[Question] Any examples for 200B+ MoE training with FSDP? #1437

@cailun01

Hi XTuner Team,

I noticed in the official documentation that one of the key highlights of XTuner V1 is its ability to train 200B+ models using FSDP instead of expert parallelism (EP).

Breakthrough Performance Bottleneck: First time achieving FSDP training throughput surpassing traditional 3D parallel solutions on MoE models above 200B scale (https://xtuner.readthedocs.io/en/latest/#core-features)

I am currently looking to train a 200B+ MoE model such as Qwen3-235B-A22B and would love to leverage this capability. Could you please provide a reference example or a configuration template for a model of this scale?

Thank you for the great work!
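
For context, here is a minimal, generic PyTorch FSDP sketch of the idea described in the docs (fully sharding an MoE model across all ranks, with no expert parallelism). This is not XTuner V1's own config format or API; the Hugging Face model id `Qwen/Qwen3-235B-A22B` and the `Qwen3MoeDecoderLayer` import from `transformers` are assumptions about the environment.

```python
import functools
import os

import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
# Assumed class name for the Qwen3 MoE decoder layer in recent transformers releases.
from transformers.models.qwen3_moe.modeling_qwen3_moe import Qwen3MoeDecoderLayer

torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# NOTE: at 235B scale a real run needs meta-device init and sharded checkpoint
# loading instead of materializing the full model on every rank; this call is
# only illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-235B-A22B",
    torch_dtype=torch.bfloat16,
)

model = FSDP(
    model,
    # Wrap each decoder layer (attention + MoE experts) as its own FSDP unit.
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Qwen3MoeDecoderLayer},
    ),
    # Fully shard parameters, gradients, and optimizer state across all ranks;
    # the experts are sharded like any other weights, no EP groups involved.
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.float32,
    ),
    device_id=torch.cuda.current_device(),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

Launched with something like `torchrun --nnodes=<N> --nproc_per_node=8 train.py` across the cluster. An official XTuner V1 example would presumably add its own checkpointing, data pipeline, and the optimizations the docs mention on top of this basic sharding scheme.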
