Log files are huge with this warning firing on every gradient accumulation step. Right now I'm just using the good old-fashioned:
import warnings
warnings.filterwarnings("ignore")
apex/transformer/pipeline_parallel/p2p_communication.py:228: ExperimentalWarning: The combination of async_comm and sequence_parallel_enabled is not well tested.
It's still in the latest master: https://github.com/NVIDIA/apex/blob/master/apex/transformer/pipeline_parallel/p2p_communication.py#L235
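
If ignoring every warning feels too broad, a narrower filter might be enough. This is only a sketch: it assumes the warning message contains both option names in the order shown in the log line above, and `silence_async_comm_warning` is a made-up helper name, not anything from apex. It uses only the standard library `warnings.filterwarnings` with a `message` regex, so other warnings still get through.

```python
import warnings

# Assumption: the message text mentions async_comm and then
# sequence_parallel_enabled, as in the log output quoted above.
# [\s\S]* is used instead of .* so the pattern still matches if the
# message happens to contain line breaks.
ASYNC_SP_PATTERN = r"[\s\S]*async_comm[\s\S]*sequence_parallel_enabled"


def silence_async_comm_warning():
    """Ignore only the async_comm / sequence_parallel_enabled warning.

    The default category (Warning) covers any warning subclass, so the
    ExperimentalWarning class does not need to be imported here.
    """
    warnings.filterwarnings("ignore", message=ASYNC_SP_PATTERN)


# Call once, early in the training script, before the pipeline starts.
silence_async_comm_warning()
```

The filter has to be installed in every process that emits the warning, so in a multi-process launch it would go in the script each rank runs.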