Hi fairchem team!
I'm doing research on fine-tuning uma-s-1p1 with all heads pretrained. I saw PR #1766 and deleted the head part in my finetune config.
In `uma_sm_finetune_template.yaml`, I changed the `model` section from:

```yaml
model:
  _target_: fairchem.core.units.mlip_unit.mlip_unit.initialize_finetuning_model
  checkpoint_location:
    _target_: fairchem.core.calculate.pretrained_mlip.pretrained_checkpoint_path_from_name
    model_name: ${base_model_name}
  overrides:
    backbone:
      otf_graph: true
      max_neighbors: ${max_neighbors}
      regress_stress: ${data.regress_stress}
      always_use_pbc: false
      pass_through_head_outputs: ${data.pass_through_head_outputs}
    heads: ${data.heads}
```
to:

```yaml
model:
  _target_: fairchem.core.units.mlip_unit.mlip_unit.initialize_finetuning_model
  checkpoint_location:
    _target_: fairchem.core.calculate.pretrained_mlip.pretrained_checkpoint_path_from_name
    model_name: ${base_model_name}
  overrides:
    backbone:
      otf_graph: true
      max_neighbors: ${max_neighbors}
      regress_stress: ${data.regress_stress}
      always_use_pbc: false
      pass_through_head_outputs: ${data.pass_through_head_outputs}
```
I did some basic debugging to make the program run, but the loss at step 0 is still abnormally high:
```
INFO:root:{'train/loss': 8990.620638182878, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.1664387292354395, 'train/atoms_per_second(approx)': 197.7114417561113, 'train/num_atoms_on_rank': 1026, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 11867.852604975855, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.168348108451524, 'train/atoms_per_second(approx)': 197.5799003488379, 'train/num_atoms_on_rank': 1025, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 10284.808734129449, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.165504733162675, 'train/atoms_per_second(approx)': 197.68149550702827, 'train/num_atoms_on_rank': 1026, 'train/num_samples_on_rank': 32}
/data/sunxuetin/anaconda3/envs/UMA/lib/python3.12/site-packages/torch/optim/lr_scheduler.py:332: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  _warn_get_lr_called_within_step(self)
INFO:root:{'train/loss': 10899.674122548213, 'train/lr': 1e-05, 'train/step': 0, 'train/epoch': 0.0, 'train/samples_per_second(approx)': 6.131760968243467, 'train/atoms_per_second(approx)': 195.83311592327573, 'train/num_atoms_on_rank': 1022, 'train/num_samples_on_rank': 32}
```
This is the same as the loss I get with re-initialized heads.
Do I need to do anything else when editing the config to make sure the pretrained heads are successfully loaded?
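One way I could check whether the pretrained head weights were actually loaded (rather than re-initialized) is to compare the head tensors in the checkpoint against the fine-tuning model's initial state dict. This is only a sketch; the `heads.` key prefix and the commented-out paths are hypothetical and would need to match the real checkpoint layout:

```python
# Sketch: find head parameters that differ between two state dicts.
# The "heads." prefix is an assumed naming convention, not confirmed
# against the actual uma-s-1p1 checkpoint.
import torch


def compare_head_weights(pretrained_sd, finetune_sd, prefix="heads."):
    """Return the keys under `prefix` whose tensors differ (or are missing)."""
    mismatched = []
    for key, tensor in pretrained_sd.items():
        if not key.startswith(prefix):
            continue
        other = finetune_sd.get(key)
        if other is None or not torch.equal(tensor, other):
            mismatched.append(key)
    return mismatched


# Usage (hypothetical paths/objects):
# pretrained_sd = torch.load("uma-s-1p1.pt", map_location="cpu")["state_dict"]
# finetune_sd = model.state_dict()
# print(compare_head_weights(pretrained_sd, finetune_sd))  # [] if heads loaded
```

If the returned list is non-empty at step 0, the heads were re-initialized rather than restored from the checkpoint.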
Thanks for your reply!