Hi XTuner team,
I noticed in the official documentation that one of the key highlights of XTuner V1 is its ability to train 200B+ MoE models using FSDP instead of expert parallelism (EP):
"Breakthrough Performance Bottleneck: First time achieving FSDP training throughput surpassing traditional 3D parallel solutions on MoE models above 200B scale" (https://xtuner.readthedocs.io/en/latest/#core-features)
I am currently looking to train a 200B+ MoE model such as Qwen3-235B-A22B and would love to leverage this capability. Could you please provide a reference example or a configuration template for a model of this scale?
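For context, below is a rough sketch of how I would otherwise wire this up by hand with plain PyTorch FSDP. This is only my own guess at the setup, not XTuner's API, and the Qwen3 MoE decoder-layer class name I shard on is an assumption on my part; a pointer to the equivalent XTuner V1 config or launch command would be much appreciated.

```python
# Hypothetical sketch with vanilla PyTorch FSDP (not XTuner's API).
# Qwen3MoeDecoderLayer is my assumption for the transformer block class
# to use as the FSDP wrapping unit.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.qwen3_moe.modeling_qwen3_moe import Qwen3MoeDecoderLayer

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# In practice a 235B model would need meta-device / deferred init and
# sharded checkpoint loading; this naive from_pretrained is only meant to
# show the kind of setup I am trying to reproduce with XTuner V1.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-235B-A22B", torch_dtype=torch.bfloat16
)

model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Qwen3MoeDecoderLayer},
    ),
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    device_id=torch.cuda.current_device(),
)
```

Even a minimal V1 config that demonstrates the FSDP-only recipe referenced in the docs would be enough to get started.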
Thank you for the great work!