Official code repository for the paper Molecule Generation with Fragment Retrieval Augmentation (NeurIPS 2024)

License

Notifications You must be signed in to change notification settings

NVlabs/f-RAG

Folders and files

NameName
Last commit message
Last commit date

Molecule Generation with Fragment Retrieval Augmentation

This is the official code repository for the paper titled Molecule Generation with Fragment Retrieval Augmentation (NeurIPS 2024).

Contribution

  • We introduce f-RAG, a novel molecular generative framework that combines fragment-based drug discovery (FBDD) and retrieval-augmented generation (RAG).
  • We propose a retrieval augmentation strategy that operates at the fragment level with two types of fragments: hard fragments and soft fragments, allowing fine-grained guidance to achieve an improved exploration-exploitation trade-off and generate high-quality drug candidates.
  • Through extensive experiments, we demonstrate the effectiveness of f-RAG in various drug discovery tasks that simulate real-world scenarios.
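As a rough illustration of fragment-level retrieval, the toy sketch below samples two groups of fragments from the top-scoring entries of a small vocabulary. It is not the repository's implementation: the function name `retrieve_fragments`, the counts `n_hard`, `n_soft`, and `top_n`, and the fragment strings and scores are all placeholders chosen for this example.

```python
import random


def retrieve_fragments(vocab, n_hard=2, n_soft=3, top_n=5, seed=0):
    """Sample hard and soft fragments from the top-scoring vocabulary entries.

    `vocab` maps a fragment string to a property score (higher is better).
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Keep only the top_n fragments by score as retrieval candidates.
    candidates = sorted(vocab, key=vocab.get, reverse=True)[:top_n]
    if n_hard + n_soft > len(candidates):
        raise ValueError("not enough candidate fragments to sample from")
    # Sample without replacement so the two groups never overlap.
    picked = rng.sample(candidates, n_hard + n_soft)
    hard = picked[:n_hard]   # to be used as explicit building blocks
    soft = picked[n_hard:]   # to be used only as guidance for generation
    return hard, soft


# Placeholder vocabulary: fragment string -> made-up score.
toy_vocab = {
    "c1ccccc1": 0.9,
    "C(=O)N": 0.8,
    "C1CCNCC1": 0.7,
    "CCN": 0.6,
    "CO": 0.4,
    "CF": 0.1,
}
hard, soft = retrieve_fragments(toy_vocab)
print("hard fragments:", hard)
print("soft fragments:", soft)
```

Restricting sampling to the best-scoring candidates favors exploitation, while random sampling within that pool keeps some exploration; the actual trade-off in f-RAG is governed by the method described in the paper.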

Installation

Clone this repository:

git clone https://github.com/NVlabs/f-RAG.git
cd f-RAG

Run the following commands to install the dependencies:

conda create -n f-rag python=3.10
conda activate f-rag
pip install safe-mol transformers==4.38.2 pandas==2.0 scikit-learn==1.0.2 numpy==1.25 PyTDC==0.4.1 easydict
conda install -c conda-forge openbabel  # required to run the docking experiments

Training Fragment Injection Module

The lightweight fragment injection module is the only part of f-RAG that requires training.
We provide the data to train the model and evaluate the results. Download the data folder and place it in this directory.

To train the module from scratch, first run the following command to preprocess the data:

python preprocess.py

We provide a partially preprocessed data file data/zinc250k_train.csv for ease of use. To preprocess the data from scratch, delete this file before running the preprocessing.
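If you prefer to script the "delete this file" step, a small helper such as the one below could be used. It is a sketch, not part of the repository: the function name `clear_cached_preprocessing` is hypothetical, and it assumes the data folder sits in the repository root as described above.

```python
from pathlib import Path


def clear_cached_preprocessing(data_dir="data", filename="zinc250k_train.csv"):
    """Delete the provided preprocessed file if present.

    Returns True if a file was removed and False if there was nothing to remove.
    """
    target = Path(data_dir) / filename
    if target.is_file():
        target.unlink()
        return True
    return False
```

Calling `clear_cached_preprocessing()` from the repository root and then running `python preprocess.py` would redo the preprocessing from scratch.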

Then, run the following command to train the module:

python fusion/trainer/train.py \
    --dataset data/zinc250k \
    --output_dir ${output_dir} \
    --per_device_train_batch_size 128 \
    --save_strategy epoch \
    --num_train_epochs 8 \
    --learning_rate 1e-4

We used a single NVIDIA GeForce RTX 3090 GPU to train the module.
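Note that `${output_dir}` in the training command is a shell variable you need to set yourself. As an alternative, the sketch below assembles the same command in Python with an explicit output directory. The helper `build_train_command` and the path `checkpoints/f_rag` are assumptions made for this example; the flags and default values are copied from the command above.

```python
import shlex


def build_train_command(output_dir, dataset="data/zinc250k", batch_size=128,
                        epochs=8, learning_rate="1e-4"):
    """Assemble the training command from this README as an argument list."""
    return [
        "python", "fusion/trainer/train.py",
        "--dataset", dataset,
        "--output_dir", str(output_dir),
        "--per_device_train_batch_size", str(batch_size),
        "--save_strategy", "epoch",
        "--num_train_epochs", str(epochs),
        "--learning_rate", learning_rate,
    ]


# "checkpoints/f_rag" is a placeholder; use any writable directory.
cmd = build_train_command("checkpoints/f_rag")
print(shlex.join(cmd))
```

To launch training from the repository root, the list can be passed to `subprocess.run(cmd, check=True)`.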

Running PMO Experiments (Section 4.1)

The folder mol_opt contains the code to run the experiments on the practical molecular optimization (PMO) benchmark and is based on the official benchmark codebase.
First run the following command to construct an initial fragment vocabulary:

python get_vocab.py pmo

Then, run the following command to run the experiments:

cd exps/pmo
python run.py -o ${oracle_name} -s ${seed}

You can adjust hyperparameters in exps/pmo/main/f_rag/hparams.yaml.
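Benchmark runs are usually repeated over several oracles and seeds. The sketch below generates one `run.py` invocation per combination; it is not part of the repository. The helper `build_pmo_commands` is hypothetical, and `oracle_a` and `oracle_b` are placeholders to be replaced with oracle names that `run.py` accepts.

```python
from itertools import product


def build_pmo_commands(oracles, seeds):
    """Return one run.py argument list per (oracle, seed) pair."""
    return [
        ["python", "run.py", "-o", oracle, "-s", str(seed)]
        for oracle, seed in product(oracles, seeds)
    ]


# Placeholder oracle names and seeds, for illustration only.
commands = build_pmo_commands(["oracle_a", "oracle_b"], [0, 1, 2])
for command in commands:
    print(" ".join(command))
```

Each argument list can be run with `subprocess.run(command, check=True)` from the same working directory as the `python run.py` command above.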

Run the following command to evaluate the generated molecules:

python eval.py ${file}

Running Docking Experiments (Section 4.2)

The folder dock contains the code to run the experiments on the docking score optimization tasks.
Before running the experiments, place the trained fragment injection module checkpoint, model.safetensors, in the dock folder. First, run the following command to construct an initial fragment vocabulary:

python get_vocab.py dock

Then, run the following command to run the experiments:

cd exps/dock
python run.py -o ${oracle_name} -s ${seed}
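Before starting a docking run, it may help to confirm that the prerequisites mentioned above are in place. The sketch below is a suggestion, not part of the repository: the function name `missing_dock_prerequisites` is made up, it assumes `dock` sits directly under the repository root, and it assumes the conda Open Babel package exposes the `obabel` executable on `PATH`.

```python
import shutil
from pathlib import Path


def missing_dock_prerequisites(repo_root="."):
    """Return human-readable descriptions of missing docking prerequisites."""
    problems = []
    # Trained fragment injection module, expected under the dock folder.
    checkpoint = Path(repo_root) / "dock" / "model.safetensors"
    if not checkpoint.is_file():
        problems.append(f"missing checkpoint: {checkpoint}")
    # Open Babel command-line tool (assumed executable name: obabel).
    if shutil.which("obabel") is None:
        problems.append("obabel executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_dock_prerequisites():
        print(problem)
```

An empty list means both checks passed; it does not guarantee that the checkpoint itself is valid.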

You can adjust hyperparameters in exps/dock/hparams.yaml.

Run the following command to evaluate the generated molecules:

python eval.py ${file}

License

Copyright © 2025, NVIDIA Corporation. All rights reserved.
This work is made available under the Nvidia Source Code License-NC.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Citation

If you find this repository and our paper useful, please cite our work.

@article{lee2024frag,
  title     = {Molecule generation with fragment retrieval augmentation},
  author    = {Lee, Seul and Kreis, Karsten and Veccham, Srimukh and Liu, Meng and Reidenbach, Danny and Paliwal, Saee and Vahdat, Arash and Nie, Weili},
  journal   = {Advances in Neural Information Processing Systems},
  year      = {2024}
}
