MMSite: A Multi-modal Framework for the Identification of Active Sites in Proteins

Update 2025-1-4: We have updated the pretrained fusion model on Zenodo at https://zenodo.org/records/14599105 to resolve the issue with the incorrect module import path during inference.

Update 2024-10-29: The pretrained models of MMSite are available at https://zenodo.org/records/14004698.

1. Preparation

1.1 Environment

You can manage the environment with Anaconda. The environment configuration file environment.yml is provided for reference; create the environment with the following command:

conda env create -f environment.yml
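Once the environment is created and activated, a quick check can confirm that key packages are visible to Python. This is a minimal sketch. The package list is a hypothetical example, not taken from environment.yml, and the check only locates each package without importing it.

```python
import importlib.util

# Hypothetical list of packages to check; adjust it to match environment.yml.
EXPECTED_PACKAGES = ["torch", "transformers", "yaml"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec locates a package without importing it and returns None if it is absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages were found.")
```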

1.2 Data

Follow the instructions in dataprocess/README.md to prepare the data. That file describes how to split the data at a clustering threshold of 10%; to use a different threshold, change it when you run the MMseqs2 command.
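The point of a cluster-based split is that all members of one MMseqs2 cluster land in the same subset, so proteins from the same cluster never appear in both training and test data. A minimal sketch, assuming the two-column "representative, member" TSV that `mmseqs easy-cluster` writes; the function name, the 8:1:1 ratios, and the seed are illustrative assumptions, not the procedure in dataprocess/README.md:

```python
import random
from collections import defaultdict


def split_by_cluster(tsv_lines, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/valid/test.

    `tsv_lines` are lines of the form "representative<TAB>member", as in the
    *_cluster.tsv file written by `mmseqs easy-cluster`. Ratios are hypothetical.
    """
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")
        clusters[rep].append(member)

    reps = sorted(clusters)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(reps)

    total = sum(len(members) for members in clusters.values())
    train_cap = ratios[0] * total
    valid_cap = (ratios[0] + ratios[1]) * total

    splits = {"train": [], "valid": [], "test": []}
    assigned = 0
    for rep in reps:
        # Fill train first, then valid, then test, always moving a whole cluster.
        if assigned < train_cap:
            name = "train"
        elif assigned < valid_cap:
            name = "valid"
        else:
            name = "test"
        splits[name].extend(clusters[rep])
        assigned += len(clusters[rep])
    return splits
```

Because clusters are moved as units, the final subset sizes only approximate the requested ratios.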

2. Training

2.1 Download the Pre-trained Model

MMSite uses a pre-trained protein language model (PLM) and a biomedical language model (BLM) to initialize the features. To reproduce the main results in our paper, download these pre-trained models from Hugging Face and put them all in the pretrained_weights folder.
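The download can also be scripted with the huggingface_hub library. A minimal sketch: the repository IDs are hypothetical placeholders, not the checkpoints named in the paper, and storing each model in a subfolder named after its repository is an assumption, so point config.yaml at wherever the models end up.

```python
import os

# Hypothetical placeholder repository IDs; replace them with the PLM and BLM
# checkpoints used in the paper.
MODEL_REPOS = ["org/plm-model", "org/blm-model"]


def local_dir_for(repo_id, root="pretrained_weights"):
    """Map a Hugging Face repo ID such as "org/name" to root/name (assumed layout)."""
    return os.path.join(root, repo_id.split("/")[-1])


def download_all(repo_ids, root="pretrained_weights"):
    """Download each repository snapshot into its own folder under root."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in repo_ids:
        snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id, root))
```

After replacing the placeholders, call download_all(MODEL_REPOS).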

2.2 Configuration

Specify the configuration in config.yaml, including the paths to the pre-trained models and the data, the training parameters, and other settings.
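Before launching a long training run, it can help to confirm that every path in the configuration exists. A minimal sketch, assuming config.yaml is ordinary YAML and that path entries use keys ending in _path or _dir; that naming convention and the helper names are hypothetical, so adjust them to the actual keys:

```python
import os


def load_config(path):
    """Load a YAML config file; PyYAML is imported lazily."""
    import yaml  # PyYAML

    with open(path) as f:
        return yaml.safe_load(f)


def find_missing_paths(cfg, prefix=""):
    """Recursively collect keys ending in "_path" or "_dir" whose value does not exist.

    The "_path"/"_dir" naming convention is an assumption about config.yaml.
    Nested keys are reported in dotted form, e.g. "data.train_path".
    """
    missing = []
    for key, value in cfg.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            missing.extend(find_missing_paths(value, prefix=full_key + "."))
        elif key.endswith(("_path", "_dir")) and isinstance(value, str):
            if not os.path.exists(value):
                missing.append(full_key)
    return missing
```

For example, find_missing_paths(load_config("/path/to/config.yaml")) lists any entries that point to missing files or folders; load_config requires PyYAML.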

2.3 Training

Train the model with the following command (training takes about 7 hours on a single NVIDIA GeForce RTX 4090 GPU):

python train.py --config /path/to/config.yaml

Training saves the final model as best_model_fuse_xxx.pt in the runs/timestamp folder.
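If several runs have accumulated under runs/, the newest checkpoint can be located programmatically, for example to fill in the path in inference.py. A minimal sketch, assuming each run writes its checkpoint directly inside its own runs/timestamp folder; it uses file modification time rather than parsing folder names, since the timestamp format is not documented here. The helper name is hypothetical.

```python
import glob
import os


def latest_best_model(runs_dir="runs"):
    """Return the most recently modified best_model_fuse_*.pt under runs_dir/<run>/, or None."""
    candidates = glob.glob(os.path.join(runs_dir, "*", "best_model_fuse_*.pt"))
    if not candidates:
        return None
    # Pick the checkpoint with the newest modification time.
    return max(candidates, key=os.path.getmtime)
```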

3. Inference

Put your data in dataset/infer.tsv, following the format of dataset/infer_samples.tsv, and set the path to best_model_fuse_xxx.pt in inference.py. Next, generate the textual descriptions with Prot2Text and, in config/config.yaml, replace the corresponding entry with the path to the resulting generated_desc.json. Finally, run the following command to get the prediction results:

python inference.py
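Because inference expects dataset/infer.tsv to follow the layout of dataset/infer_samples.tsv, comparing the two files' column headers before running can catch formatting mistakes early. A minimal sketch, assuming both files are tab-separated and start with a header row; the helper names are hypothetical:

```python
import csv


def read_header(path):
    """Return the column names from the first row of a TSV file (empty list if the file is empty)."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter="\t"), [])


def check_infer_file(sample_path="dataset/infer_samples.tsv", infer_path="dataset/infer.tsv"):
    """Raise ValueError if infer_path does not use the same columns as sample_path."""
    expected = read_header(sample_path)
    actual = read_header(infer_path)
    if actual != expected:
        raise ValueError(
            f"{infer_path} columns {actual} differ from {sample_path} columns {expected}"
        )
```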
