This project evaluates two state-of-the-art machine learning models, SchNet and sGDML, on their ability to predict molecular forces and potential energy surfaces. It aims to replicate and extend the findings of the study by Vassilev-Galindo, Valentin, et al., titled "Challenges for machine learning force fields in reproducing potential energy surfaces of flexible molecules" (The Journal of Chemical Physics, 2021). The focus is on using the same dataset and preparing the models to make predictions on various other organic molecules. The project report explores the implementation challenges, computational requirements, and model performance, highlighting key observations and insights related to molecular dynamics predictions.
To get started with SchNet, first install SchNetPack by running the following command:
    pip install schnetpack
Make sure to install all the dependencies required for SchNet. You can refer to the official SchNetPack documentation for detailed installation instructions.
You can download the dataset from the following link: Dataset Link. One file is provided here for reference.
- Convert XYZ Dataset Files to NPZ: Before you start training, you need to convert the XYZ files to NPZ files. This can be done by running the script xyz_npz.py:
    python xyz_npz.py
- Create the Database File: After converting the dataset, create a .db file, which will act as the database file for feeding into the model. This can be done using the script create_db.py:
    python create_db.py
- Training the Model: After preparing the data, you can now train the model.
  - If you have a GPU, run the following command to train the model for predicting forces:
      python main_gpu.py
  - If you don't have a GPU, you can run the following command:
      python main.py
  - If you want to train a model for predicting energies, use this command:
      python main_energy.py
- Tuning Hyperparameters: You can tune hyperparameters such as the atom-wise feature dimension (n_atom_basis in SchNetPack), the number of interaction blocks (n_interactions), and the cutoff radius to see how different configurations affect model performance.
- Model Saved: Once the training is complete, the model will be saved in a .pth file.
- Making Predictions: After training, you can use the saved model to make predictions. Run the following command to get the Mean Squared Error (MSE) and Mean Absolute Error (MAE) using test.py:
    python test.py
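The hyperparameter tuning step can be organized as a small grid search. The sketch below only enumerates candidate configurations; it does not train anything. The keys follow SchNetPack's parameter naming, but the candidate values and the names `SEARCH_SPACE` and `iter_configs` are illustrative assumptions, not settings taken from this project's training scripts.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not tuned results.
SEARCH_SPACE = {
    "n_atom_basis": [64, 128],
    "n_interactions": [3, 6],
    "cutoff": [5.0, 6.0],  # assumed to be in Angstrom
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    for config in iter_configs(SEARCH_SPACE):
        # Each configuration would be passed to a training run.
        print(config)
```

Because the grid grows multiplicatively with each added parameter, it is usually worth varying one or two parameters at a time when training is expensive.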
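For reference, the two error metrics reported by test.py are defined as follows. This is a minimal sketch in plain Python, assuming the predictions and reference values are flat lists of equal length; the function names are illustrative and are not taken from test.py.

```python
def mean_absolute_error(predicted, reference):
    """Average of |prediction - reference| over all values."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def mean_squared_error(predicted, reference):
    """Average of (prediction - reference)^2 over all values."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0]
    reference = [1.0, 3.0, 2.0]
    print("MAE:", mean_absolute_error(predicted, reference))
    print("MSE:", mean_squared_error(predicted, reference))
```

MSE squares each error, so a few large deviations weigh more heavily in it than in MAE; reporting both helps distinguish uniformly small errors from occasional outliers.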
To get started with sGDML, make sure sGDML is installed with all the required dependencies. You can find the installation instructions on the official sGDML GitHub page.
- Load Dataset: First, you need to convert the dataset from an XYZ file to an NPZ file. This can be done using the provided script xyz_to_npz.py:
    python xyz_to_npz.py
- Training the Model: Once you have the dataset in NPZ format, you can start training the model. Run the following command to start training:
    python train.py
  This will create a .pth file containing the trained model.
- Prediction: To use the trained model for prediction, run the following command to create an output file output.txt with the predictions:
    python predict.py
- Test Model: Alternatively, you can use test.py to directly compute the MSE and MAE for the trained model:
    python test.py
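Both workflows begin by converting XYZ files to NPZ (xyz_npz.py for SchNet, xyz_to_npz.py for sGDML). The following is a minimal sketch of what such a conversion might look like; it is not the code of either script. It assumes an extended XYZ layout in which the comment line starts with the energy and each atom line carries three force components after the coordinates. The function names are hypothetical. The array names `R`, `E`, and `F` mirror the convention used by sGDML datasets, while `symbols` is a placeholder (sGDML itself stores atomic numbers under `z`).

```python
def parse_xyz_frames(text):
    """Parse multi-frame XYZ text into a list of frame dicts.

    Assumed layout per frame (adjust to your files):
      line 1: number of atoms
      line 2: comment line whose first token is the energy
      then one line per atom: symbol x y z [fx fy fz]
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i]:  # skip blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i])
        energy = float(lines[i + 1].split()[0])
        symbols, positions, forces = [], [], []
        for row in lines[i + 2 : i + 2 + n_atoms]:
            parts = row.split()
            symbols.append(parts[0])
            positions.append([float(v) for v in parts[1:4]])
            if len(parts) >= 7:  # forces are present on this line
                forces.append([float(v) for v in parts[4:7]])
        frames.append(
            {"symbols": symbols, "energy": energy,
             "positions": positions, "forces": forces}
        )
        i += 2 + n_atoms
    return frames


def save_npz(frames, path):
    """Stack the parsed frames and write them to a compressed NPZ file."""
    import numpy as np  # imported here so the parser itself needs only the stdlib

    np.savez_compressed(
        path,
        symbols=np.array(frames[0]["symbols"]),  # assumes one molecule throughout
        R=np.array([f["positions"] for f in frames]),
        E=np.array([f["energy"] for f in frames]),
        F=np.array([f["forces"] for f in frames]),
    )
```

A typical call would read the XYZ file into a string, pass it to `parse_xyz_frames`, and hand the result to `save_npz` together with the output path.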
Make sure that the paths for the input files and the model are correctly set in the Python scripts before running them.
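A quick way to catch a wrong path before starting a long run is to check that every required file exists. The sketch below is one possible approach; the function name `find_missing` and the file names in the example are hypothetical and should be replaced with the paths actually set in the scripts.

```python
from pathlib import Path


def find_missing(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Hypothetical file names; replace with the paths used in your scripts.
    required = ["data/dataset.npz", "data/dataset.db", "model.pth"]
    missing = find_missing(required)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All input files found.")
```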