When I test Llama2 with context parallelism, I hit the following error:
File "/code/Megatron/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 812, in _inner_fwd_bwd_function_with_profiling
_ret = fwd_bwd_function(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 1249, in forward_backward_pipelining_with_interleaving
input_tensor_grad = backward_step_helper(backward_k)
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 951, in backward_step_helper
input_tensor_grad = backward_step(
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 366, in backward_step
custom_backward(output_tensor[0], output_tensor_grad[0])
File "/code/Megatron/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 150, in custom_backward
Variable._execution_engine.run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/code/Megatron/Megatron-LM/megatron/core/tensor_parallel/random.py", line 306, in backward
torch.autograd.backward(outputs, args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/init.py", line 347, in backward
_engine_run_backward(
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
File "/miniconda3-new/envs/llm-cuda12.4-nemo/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 2990, in backward
flash_attn_bwd(
TypeError: _flash_attn_varlen_backward() missing 1 required positional argument: 'softcap'
I am using flash-attn 2.6.3. Is this problem caused by the flash-attn version, or by something else?
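As a quick sanity check on my side (a sketch, assuming the function named in the traceback is importable from flash_attn.flash_attn_interface as in flash-attn 2.x), I printed the signature of the installed backward wrapper to see whether 'softcap' really is a required positional argument in this build:

# Diagnostic sketch (assumes flash-attn 2.x layout): does the installed
# _flash_attn_varlen_backward require a 'softcap' positional argument?
import inspect

import flash_attn
from flash_attn.flash_attn_interface import _flash_attn_varlen_backward

print("flash-attn version:", flash_attn.__version__)
print(inspect.signature(_flash_attn_varlen_backward))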