bug in flash attn backward with context parallel #1414

Closed · sallyjunjun opened this issue Jan 17, 2025 · 1 comment

@sallyjunjun

When I tested llama2 with context parallelism, I got the following error:

File "/code/Megatron/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 812, in _inner_fwd_bwd_function_with_profiling
_ret = fwd_bwd_function(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 1249, in forward_backward_pipelining_with_interleaving
input_tensor_grad = backward_step_helper(backward_k)
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 951, in backward_step_helper
input_tensor_grad = backward_step(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 366, in backward_step
custom_backward(output_tensor[0], output_tensor_grad[0])
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 150, in custom_backward
Variable._execution_engine.run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/code/Megatron/Megatron-LM/megatron/core/tensor_parallel/random.py", line 306, in backward
torch.autograd.backward(outputs, args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/init.py", line 347, in backward
_engine_run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 2990, in backward
flash_attn_bwd(
TypeError: _flash_attn_varlen_backward() missing 1 required positional argument: 'softcap'

I am using flash-attn 2.6.3. Is this problem related to the flash-attn version, or is there some other cause?
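
In case it helps triage, here is a minimal sketch (standard library only, not from Transformer Engine) for printing the versions of the packages that appear in this traceback, so the mismatch is easy to spot:

```python
# Minimal sketch: print the versions of the packages involved in the traceback.
# Package names are tried with both dash and underscore spellings, since the
# distribution name can differ from the import name.
from importlib.metadata import version, PackageNotFoundError

for names in (("flash-attn", "flash_attn"),
              ("transformer-engine", "transformer_engine"),
              ("torch",)):
    for name in names:
        try:
            print(f"{name}: {version(name)}")
            break
        except PackageNotFoundError:
            continue
    else:
        print(f"{names[0]}: not installed")
```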

@sallyjunjun (Author)

This problem is solved by rolling back the flash-attn version from 2.6.3 to 2.3.0.
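
Pinning with `pip install flash-attn==2.3.0` worked here. If you want to fail fast instead of hitting the TypeError deep in the backward pass, a small startup guard along these lines might help (a sketch only; the upper bound below is an assumption based on this thread, not an official compatibility range):

```python
# Sketch of a startup guard: raise a readable error if the installed flash-attn
# falls outside the range this thread suggests works (2.3.0 known good, 2.6.3
# known bad). Adjust the bounds to what your Transformer Engine build supports.
from packaging.version import Version
import flash_attn

fa_version = Version(flash_attn.__version__)
if not (Version("2.3.0") <= fa_version < Version("2.6.0")):
    raise RuntimeError(
        f"flash-attn {fa_version} may be incompatible with this Transformer Engine "
        "build; the workaround in this issue was `pip install flash-attn==2.3.0`."
    )
```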
