Medusa Training Loss #95
Comments
I am also facing the same issue with the Mistral example listed in the repo.
Same issue.
Have you solved this problem?
Unfortunately, no.
I found some problems with the data; you may want to check it.
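The comment above does not say what was wrong with the data, so the following is only one possible check, not a confirmed cause. With `train_on_inputs: false`, prompt tokens are masked out of the loss, and a sample whose labels are entirely masked contributes no supervised tokens; a mean-reduced cross-entropy over zero target tokens evaluates to NaN in PyTorch. The sketch below assumes tokenized samples are dicts with a `labels` list and that masked positions use the conventional ignore index `-100`. The helper names are hypothetical and are not part of Axolotl or Medusa.

```python
# Minimal sketch: flag samples whose labels are fully masked.
# Assumption: masked positions are marked with -100, the conventional ignore index.
IGNORE_INDEX = -100


def count_supervised_tokens(labels):
    """Return the number of label positions that contribute to the loss."""
    return sum(1 for token in labels if token != IGNORE_INDEX)


def find_fully_masked(samples):
    """Return the indices of samples that have no supervised tokens at all."""
    return [
        index
        for index, sample in enumerate(samples)
        if count_supervised_tokens(sample["labels"]) == 0
    ]


# Toy data standing in for a tokenized dataset (hypothetical values).
samples = [
    {"labels": [-100, -100, 42, 7]},  # two supervised tokens
    {"labels": [-100, -100, -100]},   # fully masked
]

print(find_fully_masked(samples))
```

If this reports any indices on the real dataset, those samples are worth inspecting before looking at the optimizer or precision settings.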
When training with Axolotl, the training loss drops to 0 after the gradient accumulation steps. Is this expected behaviour?
With torchrun, the training loss is consistently NaN.
Thanks for the help! Here is the training configuration:
base_model: teknium/OpenHermes-2.5-Mistral-7B
base_model_config: teknium/OpenHermes-2.5-Mistral-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: false
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - type: sharegpt
dataset_prepared_path:
val_set_size: 0.1
output_dir: ./openhermes7B_medusa_stage1
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
use_reentrant: True
warmup_steps: 40
eval_steps: 0.01
evaluation_strategy: steps
save_strategy: steps
save_steps:
save_total_limit: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "<|im_end|>"
  unk_token: "<unk>"
medusa_num_heads: 5
medusa_num_layers: 1
medusa_heads_coefficient: 0.2
medusa_decay_coefficient: 0.8
medusa_logging: true
medusa_scheduler: constant
medusa_lr_multiplier: 4.0
medusa_only_heads: true
ddp_find_unused_parameters: true
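For reference when reading the loss values, here is a small sketch of the quantities this configuration implies. It rests on assumptions that the thread does not confirm: that Medusa head `k` (counting from 1) is weighted by `medusa_heads_coefficient * medusa_decay_coefficient ** k`, and that `medusa_lr_multiplier` scales the base learning rate for the head parameters. The function name is hypothetical.

```python
# Minimal sketch of values derived from the configuration above.
# Assumption: head k (1-based) is weighted by heads_coefficient * decay_coefficient ** k.


def medusa_head_weights(num_heads, heads_coefficient, decay_coefficient):
    """Return the assumed per-head loss weights for heads 1..num_heads."""
    return [
        heads_coefficient * decay_coefficient ** k
        for k in range(1, num_heads + 1)
    ]


# Values copied from the configuration in this issue.
weights = medusa_head_weights(num_heads=5, heads_coefficient=0.2, decay_coefficient=0.8)

# Sequences consumed per optimizer step on each device:
# micro_batch_size * gradient_accumulation_steps.
samples_per_step = 2 * 4

# Assumed learning rate for the Medusa heads: learning_rate * medusa_lr_multiplier.
head_lr = 0.0005 * 4.0

for k, weight in enumerate(weights, start=1):
    print(f"head {k}: weight {weight:.6f}")
print("samples per optimizer step per device:", samples_per_step)
print("assumed head learning rate:", head_lr)
```

Under these assumptions every head weight is below 0.2. With `logging_steps: 1`, it can help to compare the logged loss against the per-head values that `medusa_logging: true` reports before concluding that the model has actually converged to zero loss.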