bug in flash attn backward with context parallel #1414

Closed · sallyjunjun opened this issue Jan 17, 2025 · 1 comment

@sallyjunjun

When I tested llama2 with context parallelism, I got the following error:

File "/code/Megatron/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 812, in _inner_fwd_bwd_function_with_profiling
_ret = fwd_bwd_function(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 1249, in forward_backward_pipelining_with_interleaving
input_tensor_grad = backward_step_helper(backward_k)
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 951, in backward_step_helper
input_tensor_grad = backward_step(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 366, in backward_step
custom_backward(output_tensor[0], output_tensor_grad[0])
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 150, in custom_backward
Variable._execution_engine.run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/code/Megatron/Megatron-LM/megatron/core/tensor_parallel/random.py", line 306, in backward
torch.autograd.backward(outputs, args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/init.py", line 347, in backward
_engine_run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 2990, in backward
flash_attn_bwd(
TypeError: _flash_attn_varlen_backward() missing 1 required positional argument: 'softcap'

I am using flash-attn 2.6.3. Is this problem related to the flash-attn version, or is there some other cause?
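
In case it helps triage, here is a minimal sketch (standard library only, not from Transformer Engine) for printing the versions of the packages that appear in this traceback, so the mismatch is easy to spot:

```python
# Minimal sketch: print the versions of the packages involved in the traceback.
# Package names are tried with both dash and underscore spellings, since the
# distribution name can differ from the import name.
from importlib.metadata import version, PackageNotFoundError

for names in (("flash-attn", "flash_attn"),
              ("transformer-engine", "transformer_engine"),
              ("torch",)):
    for name in names:
        try:
            print(f"{name}: {version(name)}")
            break
        except PackageNotFoundError:
            continue
    else:
        print(f"{names[0]}: not installed")
```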

@sallyjunjun (Author)

This problem is solved by rolling back the flash-attn version from 2.6.3 to 2.3.0.
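
Pinning with `pip install flash-attn==2.3.0` worked here. If you want to fail fast instead of hitting the TypeError deep in the backward pass, a small startup guard along these lines might help (a sketch only; the upper bound below is an assumption based on this thread, not an official compatibility range):

```python
# Sketch of a startup guard: raise a readable error if the installed flash-attn
# falls outside the range this thread suggests works (2.3.0 known good, 2.6.3
# known bad). Adjust the bounds to what your Transformer Engine build supports.
from packaging.version import Version
import flash_attn

fa_version = Version(flash_attn.__version__)
if not (Version("2.3.0") <= fa_version < Version("2.6.0")):
    raise RuntimeError(
        f"flash-attn {fa_version} may be incompatible with this Transformer Engine "
        "build; the workaround in this issue was `pip install flash-attn==2.3.0`."
    )
```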
