Molecule Generation with Fragment Retrieval Augmentation

This is the official code repository for the paper titled Molecule Generation with Fragment Retrieval Augmentation (NeurIPS 2024).

Contribution

We introduce $f$-RAG, a novel molecular generative framework that combines fragment-based drug discovery (FBDD) and retrieval-augmented generation (RAG).
We propose a retrieval augmentation strategy that operates at the fragment level with two types of fragments, hard fragments and soft fragments. This provides fine-grained guidance that improves the exploration-exploitation trade-off and yields high-quality drug candidates.
Through extensive experiments, we demonstrate the effectiveness of $f$-RAG on various drug discovery tasks that simulate real-world scenarios.
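To make the hard/soft distinction concrete, here is a toy sketch of fragment-level retrieval. It is not the $f$-RAG implementation: the function `split_hard_soft`, the plain-string fragments, and the rule of sampling hard fragments from the top-scoring candidates while treating the next-best ones as soft fragments are all illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple


def split_hard_soft(
    vocab: Dict[str, float],
    num_candidates: int = 10,
    num_hard: int = 2,
    num_soft: int = 3,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Illustrative split of retrieved fragments into hard and soft sets.

    Assumptions (not taken from the f-RAG code): fragments are strings scored
    by a single float where higher is better; hard fragments are sampled from
    the top candidates, soft fragments are the best of the remaining ones.
    """
    # Retrieve the highest-scoring fragments from the vocabulary.
    ranked = sorted(vocab, key=vocab.get, reverse=True)[:num_candidates]
    rng = random.Random(seed)
    # Hard fragments: meant to appear explicitly in the generated molecule.
    hard = rng.sample(ranked, k=min(num_hard, len(ranked)))
    # Soft fragments: the other retrieved fragments, used only as guidance.
    soft = [frag for frag in ranked if frag not in hard][:num_soft]
    return hard, soft
```

For example, `split_hard_soft({"c1ccccc1": 0.9, "C(=O)O": 0.8, "CCN": 0.7, "CO": 0.6, "Cl": 0.5, "CC": 0.1}, num_candidates=5)` returns two hard and three soft fragments, none of which is the low-scoring `CC`.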

Installation

Clone this repository:

git clone https://github.com/NVlabs/f-RAG.git
cd f-RAG

Run the following commands to install the dependencies:

conda create -n f-rag python=3.10
conda activate f-rag
pip install safe-mol transformers==4.38.2 pandas==2.0 scikit-learn==1.0.2 numpy==1.25 PyTDC==0.4.1 easydict
conda install -c conda-forge openbabel  # required to run the docking experiments

Training Fragment Injection Module

The lightweight fragment injection module is the only part of $f$-RAG that requires training.
We provide the data for training the module and evaluating the results. Download the data folder and place it in the root of this repository.

To train the module from scratch, first run the following command to preprocess the data:

python preprocess.py

We provide a partially preprocessed data file data/zinc250k_train.csv for ease of use. To preprocess the data from scratch, delete this file before running the preprocessing.
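The from-scratch route described above (delete the cached CSV, then rerun the script) can be wrapped in a small helper. This is a sketch assuming it is called with the repository root as `repo_root`; `preprocess` is a hypothetical wrapper name, and `dry_run=True` only returns the command without running it.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def preprocess(repo_root: str = ".", from_scratch: bool = False, dry_run: bool = False) -> List[str]:
    """Run preprocess.py, optionally deleting the provided CSV first."""
    root = Path(repo_root)
    cached = root / "data" / "zinc250k_train.csv"
    # Removing the partially preprocessed file forces full preprocessing.
    if from_scratch and cached.exists():
        cached.unlink()
    cmd = [sys.executable, "preprocess.py"]
    if not dry_run:
        subprocess.run(cmd, cwd=root, check=True)
    return cmd
```

Calling `preprocess(from_scratch=True)` from the repository root is equivalent to deleting `data/zinc250k_train.csv` by hand and then running `python preprocess.py`.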

Then, run the following command to train the module:

python fusion/trainer/train.py \
    --dataset data/zinc250k \
    --output_dir ${output_dir} \
    --per_device_train_batch_size 128 \
    --save_strategy epoch \
    --num_train_epochs 8 \
    --learning_rate 1e-4

We used a single NVIDIA GeForce RTX 3090 GPU to train the module.
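If you prefer to launch training from Python (for example, to sweep learning rates), the command above can be assembled programmatically. This is a sketch: `build_train_cmd` is a hypothetical helper, and the `output_dir` value is a placeholder for your own `${output_dir}`.

```python
import sys
from typing import Dict, List, Union

# Values from the training command above; output_dir is a hypothetical placeholder.
TRAIN_ARGS: Dict[str, Union[str, int, float]] = {
    "dataset": "data/zinc250k",
    "output_dir": "checkpoints/fusion",
    "per_device_train_batch_size": 128,
    "save_strategy": "epoch",
    "num_train_epochs": 8,
    "learning_rate": 1e-4,
}


def build_train_cmd(args: Dict[str, Union[str, int, float]]) -> List[str]:
    """Turn {"flag": value} into [python, "fusion/trainer/train.py", "--flag", "value", ...]."""
    cmd = [sys.executable, "fusion/trainer/train.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd
```

`print(shlex.join(build_train_cmd(TRAIN_ARGS)))` shows the resulting command; passing the list to `subprocess.run(..., check=True)` from the repository root starts training. The flag names match Hugging Face `TrainingArguments`; if `train.py` uses it (we assume so, but this README does not say), the effective batch size on several GPUs is `per_device_train_batch_size` times the number of GPUs.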

Running PMO Experiments (Section 4.1)

The folder exps/pmo contains the code to run the experiments on the practical molecular optimization (PMO) benchmark and is based on the official PMO benchmark codebase (mol_opt).
First, run the following command to construct an initial fragment vocabulary:

python get_vocab.py pmo
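This README does not describe how get_vocab.py scores fragments. As an illustration only, one simple scheme is to score each fragment by the mean property value of the molecules that contain it and keep the top N. The sketch below assumes molecules are already decomposed into fragment strings (the real code presumably does this with a chemistry toolkit); `build_vocab` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_vocab(
    scored_mols: Iterable[Tuple[List[str], float]],
    top_n: int = 3,
    min_count: int = 1,
) -> Dict[str, float]:
    """Score fragments by the mean score of the molecules containing them.

    Hypothetical scheme for illustration; get_vocab.py may differ.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for fragments, score in scored_mols:
        for frag in set(fragments):  # count each fragment once per molecule
            totals[frag] += score
            counts[frag] += 1
    means = {f: totals[f] / counts[f] for f in totals if counts[f] >= min_count}
    # Keep the top_n fragments, best first.
    ranked = sorted(means, key=means.get, reverse=True)[:top_n]
    return {f: means[f] for f in ranked}
```

For example, with molecules `(["A", "B"], 1.0)`, `(["A", "C"], 0.5)` and `(["C"], 0.0)`, fragment `B` scores 1.0, `A` scores 0.75 and `C` scores 0.25, so `top_n=2` keeps `B` and `A`.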

Then, run the following command to run the experiments:

cd exps/pmo
python run.py -o ${oracle_name} -s ${seed}
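Benchmarks are usually reported over several oracles and seeds, so a small driver can loop over the command above. This is a sketch: `sweep_commands` and `run_sweep` are hypothetical helpers, and oracle names such as `"qed"` or `"drd2"` are assumed to be valid `-o` values, so check `run.py` for the accepted names.

```python
import itertools
import subprocess
import sys
from typing import Iterable, List


def sweep_commands(oracles: Iterable[str], seeds: Iterable[int]) -> List[List[str]]:
    """One `python run.py -o <oracle> -s <seed>` command per (oracle, seed) pair."""
    return [
        [sys.executable, "run.py", "-o", oracle, "-s", str(seed)]
        for oracle, seed in itertools.product(oracles, seeds)
    ]


def run_sweep(oracles: Iterable[str], seeds: Iterable[int], workdir: str = "exps/pmo") -> None:
    """Run every command sequentially from `workdir`, stopping at the first failure."""
    for cmd in sweep_commands(oracles, seeds):
        subprocess.run(cmd, cwd=workdir, check=True)
```

From the repository root, `run_sweep(["qed", "drd2"], seeds=[0, 1, 2])` would launch six runs one after another.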

You can adjust hyperparameters in exps/pmo/main/f_rag/hparams.yaml.

Run the following command to evaluate the generated molecules:

python eval.py ${file}
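The PMO benchmark ranks methods by the area under the curve (AUC) of the running top-10 average score versus the number of oracle calls, normalized by the 10,000-call budget. The following is a simplified re-implementation of that metric as we understand it, for intuition only; `top_k_auc` is a hypothetical helper, and eval.py may differ in details.

```python
from typing import Sequence


def top_k_auc(
    scores: Sequence[float],
    k: int = 10,
    max_calls: int = 10000,
    freq: int = 100,
    finish: bool = True,
) -> float:
    """scores[i] is the oracle value of the i-th queried molecule, in query order."""
    if not scores:
        return 0.0

    def top_k_mean(n: int) -> float:
        # Average of the k best scores among the first n oracle calls.
        best = sorted(scores[:n], reverse=True)[:k]
        return sum(best) / len(best)

    n_total = min(len(scores), max_calls)
    area, prev, called = 0.0, 0.0, 0
    for n in range(freq, n_total, freq):
        now = top_k_mean(n)
        area += freq * (now + prev) / 2  # trapezoid between checkpoints
        prev, called = now, n
    now = top_k_mean(n_total)
    area += (n_total - called) * (now + prev) / 2
    if finish and n_total < max_calls:
        # The run stopped early: extend the final top-k value to the budget.
        area += (max_calls - n_total) * now
    return area / max_calls
```

Because the curve starts from zero, even a run whose scores are all 1.0 gets an AUC slightly below 1 (0.95 with `max_calls=100, freq=10`).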

Running Docking Experiments (Section 4.2)

The folder exps/dock contains the code to run the experiments on the docking score optimization tasks.
Before running the experiments, place the trained fragment injection module (model.safetensors) in the exps/dock folder.
First, run the following command to construct an initial fragment vocabulary:

python get_vocab.py dock
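Because the docking runs need both the trained module and Open Babel, a quick preflight check can save a failed launch. The sketch below assumes the dock folder is `exps/dock` (the directory the run command changes into) and that the conda-forge openbabel package puts an `obabel` executable on `PATH`; `preflight` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(dock_dir: str = "exps/dock", check_obabel: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    # Trained fragment injection module (assumed location, see lead-in).
    weights = Path(dock_dir) / "model.safetensors"
    if not weights.is_file():
        problems.append(f"missing trained module: {weights}")
    # Assumed name of the Open Babel command-line tool.
    if check_obabel and shutil.which("obabel") is None:
        problems.append("obabel not found on PATH (conda install -c conda-forge openbabel)")
    return problems
```

Running `preflight()` from the repository root before `cd exps/dock` should return an empty list.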

Then, run the following command to run the experiments:

cd exps/dock
python run.py -o ${oracle_name} -s ${seed}

You can adjust hyperparameters in exps/dock/hparams.yaml.

Run the following command to evaluate the generated molecules:

python eval.py ${file}
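Docking scores are typically reported in kcal/mol, where more negative means stronger predicted binding. One common way to summarize a run is the mean of the best 5% of scores among molecules that pass some filter. The sketch below assumes that convention; the filter is passed in as a precomputed boolean mask because its exact criteria are not stated here, and `top_percent_mean` is a hypothetical helper that may not match what eval.py reports.

```python
from typing import Optional, Sequence


def top_percent_mean(
    dock_scores: Sequence[float],
    keep: Optional[Sequence[bool]] = None,
    percent: float = 5.0,
) -> float:
    """Mean of the best `percent`% docking scores (more negative is better)."""
    # Apply the optional filter mask (e.g., drug-likeness or novelty checks).
    vals = [s for i, s in enumerate(dock_scores) if keep is None or keep[i]]
    if not vals:
        raise ValueError("no molecules pass the filter")
    n = max(1, int(len(vals) * percent / 100))
    best = sorted(vals)[:n]  # ascending order puts the most negative scores first
    return sum(best) / n
```

With scores -1 to -100, the best 5% are -100 to -96, giving a mean of -98.0.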

License

Copyright © 2025, NVIDIA Corporation. All rights reserved.
This work is made available under the Nvidia Source Code License-NC.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Citation

If you find this repository or our paper useful, please consider citing our work:

@article{lee2024frag,
  title     = {Molecule generation with fragment retrieval augmentation},
  author    = {Lee, Seul and Kreis, Karsten and Veccham, Srimukh and Liu, Meng and Reidenbach, Danny and Paliwal, Saee and Vahdat, Arash and Nie, Weili},
  journal   = {Advances in Neural Information Processing Systems},
  year      = {2024}
}
