Dear Fairchem developer team,
Thanks so much for developing these useful models. I have a couple of questions about fine-tuning the esen_30m_oam model, and I would greatly appreciate any suggestions and feedback you can give me.
Fairchem version: 1.10.0
DFT dataset: MOF structures calculated with CP2K at the PBE + D3(BJ) level of theory; there are around 1800 snapshots.
To prepare the dataset for fine-tuning, I converted the energies, forces, and stresses to eV, eV/Å, and eV/Å³, then used create_finetune_dataset.py to build the train and validation datasets in aselmdb format.
I ran the reference and normalizer scripts to get the mean and rmsd of the energy, forces, and stress, and then entered these values into the config.yml file:
```
python -m fairchem.core.scripts.fit_references --config config.yml --out-path .
python -m fairchem.core.scripts.fit_normalizers --config config.yml --out-path . --linref-path energy_linref.pt
```
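My understanding from reading the trainer code (please correct me if this is wrong) is that the fitted statistics below are applied as a simple shift-and-scale of each referenced target before the loss is computed, and inverted at inference time:

```python
def normalize(x, mean, rmsd):
    # Standardize a referenced target (e.g. linref-subtracted energy)
    # using the statistics fitted by fit_normalizers.
    return (x - mean) / rmsd

def denormalize(x_norm, mean, rmsd):
    # Invert the standardization to recover physical units.
    return x_norm * rmsd + mean
```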
The config.yml file is shown below:

```yaml
amp: false
checkpoint: ./esen_30m_oam.pt
cmd:
  seed: 42
dataset:
  train:
    a2g_args:
      r_energy: true
      r_forces: true
      r_stress: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: finetune_esen/train/
    transforms:
      element_references:
        energy:
          file: energy_linref.pt
        fit:
          batch_size: 64
          targets:
            - energy
      normalizer:
        fit:
          batch_size: 64
          targets:
            energy:
              mean: -0.0334
              rmsd: 4.6779
            forces:
              mean: 9.9184e-06
              rmsd: 0.0792
            stress:
              mean: 5.7518e-05
              rmsd: 0.0015
  val:
    a2g_args:
      r_energy: true
      r_forces: true
      r_stress: true
    format: ase_db
    src: finetune_esen/val/
evaluation_metrics:
  metrics:
    energy:
      - mae
      - per_atom_mae
    forces:
      - mae
      - cosine_similarity
    stress:
      - mae
  primary_metric: forces_mae
gp_gpus: null
gpus: 1
logger:
  entity: xx-university-of-toronto
  group: ft-cp2k_smalldataset_efs
  name: wandb
  project: esen_finetune
loss_functions:
  - energy:
      coefficient: 20
      fn: per_atom_mae
  - forces:
      coefficient: 20
      fn: l2mae
  - stress:
      coefficient: 5
      fn: mae
model:
  backbone:
    act_type: gate
    cutoff: 6.0
    direct_forces: false
    distance_function: gaussian
    edge_channels: 128
    hidden_channels: 128
    lmax: 3
    max_neighbors: 300
    max_num_elements: 100
    mlp_type: spectral
    mmax: 2
    model: esen_backbone
    norm_type: rms_norm_sh
    num_distance_basis: 64
    num_layers: 10
    otf_graph: true
    regress_forces: true
    regress_stress: true
    sphere_channels: 128
    use_envelope: true
    use_pbc: true
    use_pbc_single: true
  heads:
    mptrj:
      module: esen_mlp_efs_head
  name: hydra
  otf_graph: true
  pass_through_head_outputs: true
optim:
  batch_size: 1
  clip_grad_norm: 100
  ema_decay: 0.999
  eval_batch_size: 1
  accumulate_grad_batches: 16
  eval_every: 500
  lr_initial: 0.0004
  max_epochs: 50
  num_workers: 16
  optimizer: AdamW
  optimizer_params:
    weight_decay: 0.001
  scheduler: LambdaLR
  scheduler_params:
    epochs: 50
    lambda_type: cosine
    lr: 0.0004
    lr_min_factor: 0.1
    warmup_epochs: 2
    warmup_factor: 0.2
outputs:
  energy:
    level: system
    property: energy
  forces:
    eval_on_free_atoms: true
    level: atom
    property: forces
    train_on_free_atoms: true
  stress:
    level: system
    property: stress
relax_dataset: {}
task:
  regress_stress: true
test_dataset: {}
trainer: mlip_trainer
```
Summary of metrics after finishing 50 epochs:

```
num_params: 30,161,153
train/energy_mae: 62.6619140625
train/energy_per_atom_mae: 0.41674168035387993
train/epoch: 50
train/forces_cosine_similarity: 0.17787708616217723
train/forces_mae: 0.06906108922482707
train/grad_norm: 8,160.07373046875
train/loss: 18.42776176929474
train/lr: 0.00004000000016177294
train/step: 74,100
train/stress_mae: 0.016690184600237343
val/energy_mae: 11.361074746621622
val/energy_per_atom_mae: 0.07505887113241172
val/epoch: 49.932523616734144
val/forces_cosine_similarity: 0.22308901235162257
val/forces_mae: 0.035628032157099025
val/loss: 4.876973407977336
val/stress_mae: 0.008955139347719862
```
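As a quick sanity check on the numbers above, the ratio of the total-energy MAE to the per-atom MAE gives the average validation system size, which suggests my cells average roughly 150 atoms, so the 0.075 eV/atom error compounds to more than 11 eV per cell:

```python
val_energy_mae = 11.361074746621622            # eV
val_energy_per_atom_mae = 0.07505887113241172  # eV/atom

# Average number of atoms per validation structure.
avg_atoms = val_energy_mae / val_energy_per_atom_mae
print(round(avg_atoms))  # → 151
```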
With the model performing as above, when I use best_checkpoint.pt to run structure optimizations on the same structures that were in the fine-tuning dataset, all of the structures explode. So it seems the model is not fine-tuned well (or not fine-tuned enough).
Question 1: I would like suggestions on how to improve the model's performance. After 50 epochs and ~74K steps, the energy MAE is still around 0.1 eV/atom, and structure optimization on these same systems is very poor: the fmax grows with each optimization step and the structures explode.
Question 2: Did I do the right thing when fine-tuning the esen_30m_oam model on a CP2K dataset? From reading the code and the UMA paper, it seems that once the energies, forces, and stresses are passed through the references and normalizer fitted for esen_30m_oam, a dataset from a different level of theory should be fine for fine-tuning. Am I right?
I would greatly appreciate any feedback and help. Thanks in advance.
Best,
Ju