This section of the repository contains all the scripts and details needed to fine-tune ChemFM for molecular property prediction tasks.
We provide Hugging Face demos where you can run property prediction on the benchmark datasets.
Molecular property prediction is one of the key tasks in computational chemistry, where the goal is to predict various properties (e.g., solubility, bioactivity, toxicity) of a molecule based on its SMILES representation.
- We include comparisons on the benchmark datasets (MoleculeNet and ADMET), along with details to replicate the results reported in the paper. Model checkpoints and configurations are also provided for each dataset.
- We also provide code for fine-tuning on custom datasets, allowing flexibility for a variety of tasks.
1. Prepare the Dataset
For MoleculeNet datasets, we use ChemBench, which extracts the exact same datasets (including splitting methods, random seed, and number of folds) as described in the MoleculeNet paper.
To install ChemBench, use the following commands:
git clone https://github.com/shenwanxiang/ChemBench.git
cd ChemBench
pip install -e .
For the ADMET benchmark, we use the TDC (Therapeutics Data Commons) library, which is included in the environment.yml file provided on the main page. Alternatively, you can install it using the following command:
conda install -c conda-forge pytdc
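For reference, the following is a minimal sketch (not the repository's own data-loading code) showing how an ADMET benchmark dataset and its official splits can be loaded through TDC's benchmark-group interface; Caco2_Wang is just one example dataset name and the local cache path is arbitrary.

```python
# Minimal sketch: load one ADMET benchmark dataset and its official splits via TDC.
# "Caco2_Wang" is an example name; any dataset from the ADMET table below works.
from tdc.benchmark_group import admet_group

group = admet_group(path="data/")            # downloads and caches the benchmark data
benchmark = group.get("Caco2_Wang")          # dict with "name", "train_val", and "test"
train, valid = group.get_train_valid_split(  # official train/validation split for a given seed
    benchmark=benchmark["name"], split_type="default", seed=1
)
test = benchmark["test"]
print(train.shape, valid.shape, test.shape)  # DataFrames with "Drug" (SMILES) and "Y" columns
```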
To fine-tune ChemFM on a custom dataset, you need to prepare your dataset in a folder with three CSV files: train.csv, val.csv, and test.csv. Each CSV file should include:
- A "smiles" column for the molecular SMILES strings.
- One or more label columns for the target property values.
For reference, example files are provided in the custom_data_example folder.
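As a minimal illustration of the expected layout (only the "smiles" column name is required; the label column name used here is arbitrary), the sketch below writes a toy dataset in the three-file format:

```python
# Minimal sketch: write a toy custom dataset in the required train/val/test CSV layout.
# Only the "smiles" column name is required; "solubility" is an arbitrary example label column.
import os
import pandas as pd

os.makedirs("my_custom_data", exist_ok=True)

toy = pd.DataFrame({
    "smiles": ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC"],
    "solubility": [0.81, -1.23, 0.45, 0.10],  # one or more label columns
})

# Naive slicing for illustration only; use proper (e.g., scaffold) splits in practice.
toy.iloc[:2].to_csv("my_custom_data/train.csv", index=False)
toy.iloc[2:3].to_csv("my_custom_data/val.csv", index=False)
toy.iloc[3:].to_csv("my_custom_data/test.csv", index=False)
```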
You can refer to the Supported Features section to learn more about the types of tasks we support.
2. Configure the Parameters for Training
You can configure the parameters for training in two ways:
- Feed arguments directly to the Python file: Pass the arguments as command-line parameters when running the training script.
- Specify the parameters in a YAML file: Define all configurations in a .yml file and pass the file path to the Python script.
We provide an example YAML file along with explanations of the configuration options.
For the MoleculeNet and ADMET benchmark datasets, you can directly use the configuration files stored in configs/admet and configs/moleculenet.
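Purely as an illustration of the YAML approach (the option names below are hypothetical placeholders, not the script's actual schema; consult the provided example YAML file and the files under configs/admet and configs/moleculenet for the real keys), a configuration could look like:

```yaml
# Illustrative only: hypothetical option names, not the actual configuration schema.
# See the provided example YAML file and the files under configs/ for the real keys.
model_name_or_path: <pretrained ChemFM checkpoint>   # hypothetical key
data_dir: ./custom_data_example                      # folder with train.csv / val.csv / test.csv
task_type: classification                            # or regression
num_train_epochs: 20
learning_rate: 1.0e-4
use_lora: true                                       # LoRA vs. full-parameter fine-tuning
```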
3. Fine-tuning Script
To fine-tune ChemFM, you can use the following command:
python -m accelerate.commands.launch --config_file accelerate_config.yaml main.py --training_args_file <config_yml_file>
Our code is built on the Hugging Face accelerate package; the accelerate_config.yaml file configures the distributed-training settings when training across multiple devices.
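For reference, a single-node, multi-GPU accelerate configuration typically looks like the sketch below (illustrative values, not necessarily identical to the repository's accelerate_config.yaml; you can also generate such a file interactively with `accelerate config`):

```yaml
# Illustrative accelerate configuration for single-node, multi-GPU training.
# Adjust the values for your hardware, or regenerate the file with `accelerate config`.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
machine_rank: 0
num_processes: 2        # number of GPUs/processes to launch
gpu_ids: all
mixed_precision: bf16
```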
The evaluation of molecular property prediction models can be quite noisy in the community. We have observed several papers and methods that use certain "tricks" to improve their reported performance. Our evaluations, however, follow the strict standards below, which are common in machine learning evaluation but are often ignored:
- Exact dataset usage: We use the exact same datasets (including the splitting methods, random seeds, and number of folds). This ensures that the training, validation, and test datasets are consistent across evaluations. Different data splits can lead to significant differences in evaluation results, especially when using scaffold splitting.
- Training only on the training set: We strictly use only the training data to train the model. Including validation data in the training set can considerably boost performance, especially for datasets with fewer than 1,000 samples.
- Hyperparameter tuning on the validation set: We tune hyperparameters based only on the validation dataset, never the test dataset. Public test sets are often leaked, and we have observed cases where hyperparameters are optimized directly on test performance, which is not a valid evaluation. For example, when working with the CYP2C9_Substrate_CarbonMangels dataset in the ADMET benchmark, we select hyperparameters and report performance based on the validation metric (the dot in the green circle) rather than the test metric (the dot in the red circle).
As previously mentioned, different data splits can lead to significant variations in evaluation results, particularly when using scaffold splitting.
This comparison is based on the exact same data splits from the original MoleculeNet benchmark, as extracted by ChemBench.
We also compared our results with other methods using different data splits on the MoleculeNet benchmark in our paper.
Important
We exclude methods that are not open-sourced, do not adhere to the standard training and evaluation rules discussed above, or cannot be replicated based on our extensive replication experiments.
Click to expand the MoleculeNet results
| Category | Dataset | Task Metric | MoleculeNet (Model) | Chemprop | MMNB | ChemFM-3B |
|---|---|---|---|---|---|---|
| Pharmacokinetic | BBBP | ROC-AUC ↑ | 0.690 (Weave) | 0.738 | 0.739 | 0.751 |
| Bioactivity | BACE | ROC-AUC ↑ | 0.806 (Weave) | - | 0.835 | 0.869 |
| | HIV | ROC-AUC ↑ | 0.763 (GC) | 0.776 | 0.777 | 0.807 |
| | MUV | PRC-AUC ↑ | 0.109 (Weave) | 0.041 | 0.096 | 0.135 |
| | PCBA | PRC-AUC ↑ | 0.136 (GC) | 0.335 | 0.276 | 0.346 |
| Toxicity | Tox21 | ROC-AUC ↑ | 0.829 (GC) | 0.851 | 0.845 | 0.869 |
| | SIDER | ROC-AUC ↑ | 0.638 (GC) | 0.676 | 0.680 | 0.709 |
| | ClinTox | ROC-AUC ↑ | 0.832 (Weave) | 0.864 | 0.888 | 0.918 |
| Physicochemical | ESOL | RMSE ↓ | 0.580 (MPNN) | 0.555 | 0.575 | 0.516 |
| | FreeSolv | RMSE ↓ | 1.150 (MPNN) | 1.075 | 1.155 | 0.830 |
| | Lipophilicity | RMSE ↓ | 0.655 (GC) | 0.555 | 0.625 | 0.545 |
| Molecular Binding | PDBbind-Full | RMSE ↓ | 1.440 (GC) | 1.391 | 0.721 | 0.697 |
If you want to evaluate or submit the results, you can download the trained model checkpoints for each dataset. Then, simply run the following command:
python ./submit_admet.py --model_path <path to the checkpoints> --dataset <dataset_name> --task_type <regression or classification>
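For example, assuming the Caco2_Wang checkpoint has been downloaded to a local folder (the path below is only a placeholder), the command would be:

```bash
# Example invocation; Caco2_Wang is a regression dataset in the ADMET benchmark.
python ./submit_admet.py \
    --model_path ./checkpoints/Caco2_Wang \
    --dataset Caco2_Wang \
    --task_type regression
```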
Important
We carefully reviewed the published code and excluded certain methods from comparison in the ADMET benchmark because they did not follow the evaluation rules outlined above. The reasons for excluding these methods are detailed in Table S2.5 of the paper.
Click to expand the ADMET results
| Category | Dataset | Task Metric | Previous Best | ChemFM |
|---|---|---|---|---|
| Absorption | Caco2_Wang | MAE ↓ | 0.330 ± 0.024 @Chemprop-RDKit | 0.322 ± 0.026 |
| | Bioavailability_Ma | ROC-AUC ↑ | 0.672 ± 0.021 @DeepPurpose | 0.715 ± 0.011 |
| | Lipophilicity_AstraZeneca | MAE ↓ | 0.467 ± 0.006 @Chemprop-RDKit | 0.460 ± 0.006 |
| | Solubility_AqSolDB | MAE ↓ | 0.761 ± 0.025 @Chemprop-RDKit | 0.725 ± 0.011 |
| | HIA_Hou | ROC-AUC ↑ | 0.981 ± 0.002 @Chemprop-RDKit | 0.984 ± 0.004 |
| | Pgp_Broccatelli | ROC-AUC ↑ | 0.929 ± 0.006 @AttrMasking | 0.931 ± 0.003 |
| Distribution | BBB_Martins | ROC-AUC ↑ | 0.897 ± 0.004 @ContextPred | 0.908 ± 0.010 |
| | PPBR_AZ | MAE ↓ | 7.788 ± 0.210 @Chemprop | 7.505 ± 0.073 |
| | VDss_Lombardo | Spearman ↑ | 0.561 ± 0.025 @DeepPurpose | 0.662 ± 0.013 |
| Metabolism | CYP2C9_Veith | PRC-AUC ↑ | 0.777 ± 0.003 @Chemprop-RDKit | 0.788 ± 0.005 |
| | CYP2D6_Veith | PRC-AUC ↑ | 0.673 ± 0.007 @Chemprop-RDKit | 0.704 ± 0.003 |
| | CYP3A4_Veith | PRC-AUC ↑ | 0.876 ± 0.003 @Chemprop-RDKit | 0.878 ± 0.003 |
| | CYP2C9_Substrate_CarbonMangels | PRC-AUC ↑ | 0.400 ± 0.008 @Chemprop-RDKit | 0.414 ± 0.027 |
| | CYP2D6_Substrate_CarbonMangels | PRC-AUC ↑ | 0.686 ± 0.031 @Chemprop-RDKit | 0.739 ± 0.024 |
| | CYP3A4_Substrate_CarbonMangels | ROC-AUC ↑ | 0.619 ± 0.030 @Chemprop-RDKit | 0.654 ± 0.022 |
| Excretion | Half_Life_Obach | Spearman ↑ | 0.329 ± 0.083 @DeepPurpose | 0.551 ± 0.020 |
| | Clearance_Hepatocyte_AZ | Spearman ↑ | 0.439 ± 0.026 @ContextPred | 0.495 ± 0.030 |
| | Clearance_Microsome_AZ | Spearman ↑ | 0.599 ± 0.025 @Chemprop-RDKit | 0.611 ± 0.016 |
| Toxicity | LD50_Zhu | MAE ↓ | 0.606 ± 0.024 @Chemprop | 0.541 ± 0.015 |
| | hERG | ROC-AUC ↑ | 0.841 ± 0.020 @DeepPurpose | 0.848 ± 0.009 |
| | AMES | ROC-AUC ↑ | 0.850 ± 0.004 @Chemprop-RDKit | 0.854 ± 0.007 |
| | DILI | ROC-AUC ↑ | 0.919 ± 0.008 @ContextPred | 0.920 ± 0.012 |
The provided code examples demonstrate how to fine-tune ChemFM. The code currently supports only the following fine-tuning techniques and tasks, with more options to be tested and added.
You are welcome to customize the output head and loss function to adapt to other types of tasks.
Fine-tuning Techniques
- LoRA (Low-Rank Adaptation; see the conceptual sketch after this list)
- Full-parameter fine-tuning (not yet fully tested)
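To illustrate what LoRA-based fine-tuning involves conceptually, here is a generic PEFT sketch; it is not the repository's training code, and the checkpoint name and target modules are assumptions for illustration only.

```python
# Conceptual sketch: attach LoRA adapters to a causal-LM backbone with the PEFT library.
# The checkpoint name and target modules below are illustrative assumptions; the repo's
# training script handles model construction and the prediction head internally.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("ChemFM/ChemFM-3B")  # example checkpoint name
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (illustrative)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the LoRA parameters remain trainable
```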
Supported Tasks
- Single-label regression and classification (binary labels)
- Multi-label classification (each instance can be assigned to multiple binary labels)
- Multi-class classification (each instance is assigned exactly one label, and the number of classes is greater than 2)
- Hybrid classification and regression labels