Hi XTuner team,
I noticed in the official documentation that one of the key highlights of XTuner V1 is its ability to train 200B+ MoE models using FSDP instead of expert parallelism (EP):
"Breakthrough Performance Bottleneck: First time achieving FSDP training throughput surpassing traditional 3D parallel solutions on MoE models above 200B scale" (https://xtuner.readthedocs.io/en/latest/#core-features)
I am currently looking to train a 200B+ MoE model such as Qwen3-235B-A22B and would love to leverage this capability. Could you please provide a reference example or a configuration template for a model of this scale?
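For context, below is a rough sketch of how I would otherwise wire this up by hand with plain PyTorch FSDP. This is only my own guess at the setup, not XTuner's API, and the Qwen3 MoE decoder-layer class name I shard on is an assumption on my part; a pointer to the equivalent XTuner V1 config or launch command would be much appreciated.

```python
# Hypothetical sketch with vanilla PyTorch FSDP (not XTuner's API).
# Qwen3MoeDecoderLayer is my assumption for the transformer block class
# to use as the FSDP wrapping unit.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.qwen3_moe.modeling_qwen3_moe import Qwen3MoeDecoderLayer

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# In practice a 235B model would need meta-device / deferred init and
# sharded checkpoint loading; this naive from_pretrained is only meant to
# show the kind of setup I am trying to reproduce with XTuner V1.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-235B-A22B", torch_dtype=torch.bfloat16
)

model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Qwen3MoeDecoderLayer},
    ),
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    device_id=torch.cuda.current_device(),
)
```

Even a minimal V1 config that demonstrates the FSDP-only recipe referenced in the docs would be enough to get started.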
Thank you for the great work!