Dear Fairchem developer team,
Thanks so much for developing these useful models. I have a couple of questions about fine-tuning the esen_30m_oam model, and I would greatly appreciate any suggestions and feedback you can give me.
Fairchem version: 1.10.0
DFT dataset: MOF structures calculated with CP2K at the PBE + D3(BJ) level of theory; there are around 1800 snapshots.
To prepare the dataset for fine-tuning, I converted the energies, forces, and stresses to eV, eV/Å, and eV/Å³, then used create_finetune_dataset.py to build the train and validation datasets in aselmdb format.
I ran the reference and normalizer scripts to get the mean and rmsd of the energy, forces, and stress, and then entered these values into the config.yml file:
```
python -m fairchem.core.scripts.fit_references --config config.yml --out-path .
python -m fairchem.core.scripts.fit_normalizers --config config.yml --out-path . --linref-path energy_linref.pt
```
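My understanding from reading the trainer code (please correct me if this is wrong) is that the fitted statistics below are applied as a simple shift-and-scale of each referenced target before the loss is computed, and inverted at inference time:

```python
def normalize(x, mean, rmsd):
    # Standardize a referenced target (e.g. linref-subtracted energy)
    # using the statistics fitted by fit_normalizers.
    return (x - mean) / rmsd

def denormalize(x_norm, mean, rmsd):
    # Invert the standardization to recover physical units.
    return x_norm * rmsd + mean
```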
The config.yml file is shown below:

```yaml
amp: false
checkpoint: ./esen_30m_oam.pt
cmd:
  seed: 42
dataset:
  train:
    a2g_args:
      r_energy: true
      r_forces: true
      r_stress: true
    format: ase_db
    key_mapping:
      force: forces
      y: energy
    src: finetune_esen/train/
    transforms:
      element_references:
        energy:
          file: energy_linref.pt
        fit:
          batch_size: 64
          targets:
            - energy
      normalizer:
        fit:
          batch_size: 64
          targets:
            energy:
              mean: -0.0334
              rmsd: 4.6779
            forces:
              mean: 9.9184e-06
              rmsd: 0.0792
            stress:
              mean: 5.7518e-05
              rmsd: 0.0015
  val:
    a2g_args:
      r_energy: true
      r_forces: true
      r_stress: true
    format: ase_db
    src: finetune_esen/val/
evaluation_metrics:
  metrics:
    energy:
      - mae
      - per_atom_mae
    forces:
      - mae
      - cosine_similarity
    stress:
      - mae
  primary_metric: forces_mae
gp_gpus: null
gpus: 1
logger:
  entity: xx-university-of-toronto
  group: ft-cp2k_smalldataset_efs
  name: wandb
  project: esen_finetune
loss_functions:
  - energy:
      coefficient: 20
      fn: per_atom_mae
  - forces:
      coefficient: 20
      fn: l2mae
  - stress:
      coefficient: 5
      fn: mae
model:
  backbone:
    act_type: gate
    cutoff: 6.0
    direct_forces: false
    distance_function: gaussian
    edge_channels: 128
    hidden_channels: 128
    lmax: 3
    max_neighbors: 300
    max_num_elements: 100
    mlp_type: spectral
    mmax: 2
    model: esen_backbone
    norm_type: rms_norm_sh
    num_distance_basis: 64
    num_layers: 10
    otf_graph: true
    regress_forces: true
    regress_stress: true
    sphere_channels: 128
    use_envelope: true
    use_pbc: true
    use_pbc_single: true
  heads:
    mptrj:
      module: esen_mlp_efs_head
  name: hydra
  otf_graph: true
  pass_through_head_outputs: true
optim:
  batch_size: 1
  clip_grad_norm: 100
  ema_decay: 0.999
  eval_batch_size: 1
  accumulate_grad_batches: 16
  eval_every: 500
  lr_initial: 0.0004
  max_epochs: 50
  num_workers: 16
  optimizer: AdamW
  optimizer_params:
    weight_decay: 0.001
  scheduler: LambdaLR
  scheduler_params:
    epochs: 50
    lambda_type: cosine
    lr: 0.0004
    lr_min_factor: 0.1
    warmup_epochs: 2
    warmup_factor: 0.2
outputs:
  energy:
    level: system
    property: energy
  forces:
    eval_on_free_atoms: true
    level: atom
    property: forces
    train_on_free_atoms: true
  stress:
    level: system
    property: stress
relax_dataset: {}
task:
  regress_stress: true
test_dataset: {}
trainer: mlip_trainer
```
Summary of metrics after finishing 50 epochs:

```
num_params: 30,161,153
train/energy_mae: 62.6619140625
train/energy_per_atom_mae: 0.41674168035387993
train/epoch: 50
train/forces_cosine_similarity: 0.17787708616217723
train/forces_mae: 0.06906108922482707
train/grad_norm: 8,160.07373046875
train/loss: 18.42776176929474
train/lr: 0.00004000000016177294
train/step: 74,100
train/stress_mae: 0.016690184600237343
val/energy_mae: 11.361074746621622
val/energy_per_atom_mae: 0.07505887113241172
val/epoch: 49.932523616734144
val/forces_cosine_similarity: 0.22308901235162257
val/forces_mae: 0.035628032157099025
val/loss: 4.876973407977336
val/stress_mae: 0.008955139347719862
```
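As a quick sanity check on the numbers above, the ratio of the total-energy MAE to the per-atom MAE gives the average validation system size, which suggests my cells average roughly 150 atoms, so the 0.075 eV/atom error compounds to more than 11 eV per cell:

```python
val_energy_mae = 11.361074746621622            # eV
val_energy_per_atom_mae = 0.07505887113241172  # eV/atom

# Average number of atoms per validation structure.
avg_atoms = val_energy_mae / val_energy_per_atom_mae
print(round(avg_atoms))  # → 151
```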
With the model performing as above, when I use best_checkpoint.pt to run structure optimizations on the same structures that were in the fine-tuning dataset, all of the structures explode. So it seems the model is not fine-tuned well (or not fine-tuned enough).
Question 1: I would like suggestions on how to improve the model's performance. After 50 epochs and ~74K steps, the energy MAE is still around 0.1 eV/atom, and structure optimization on these same systems is very poor: the fmax grows with each optimization step and the structures explode.
Question 2: Did I do the right thing when fine-tuning the esen_30m_oam model on a CP2K dataset? From reading the code and the UMA paper, it seems that once the energies, forces, and stresses are passed through the references and normalizer fitted for esen_30m_oam, a dataset from a different level of theory should be fine for fine-tuning. Am I right?
I would greatly appreciate any feedback and help. Thanks in advance.
Best,
Ju