Request for the negative datasets #1

zw-SIMM · 2024-10-19T11:06:33Z

Great work!
However, I’m a bit confused about the negative samples in your work,
even with the negative-sample preparation code provided.
Is the ratio of positive to negative samples set at 1:1000 for both training and testing?
Could you also provide the negative datasets as a benchmark, for reproduction and fair comparison?

Negative Sample. A common method involves designating all enzymes within a training set that are not annotated for catalyzing a specific reaction as negative samples [51]. Nevertheless, given the extensive size of our dataset, we opt for a strategy centered on enzyme and reaction similarity to construct negative samples. Specifically, for each verified positive enzyme-reaction pair, we identify the top-k enzymes that closely resemble the positive enzyme but do not have annotations for catalyzing the reaction, using them as negative samples. Similarly, we select the top-k reactions that are similar to the positive reaction but are not catalyzed by the positive enzyme, to serve as additional negative samples (k=1000). This method effectively narrows down the size of negative samples while retaining those of significance for both training and testing purposes. Despite our approach, the construction of negative samples still presents an unresolved challenge, remaining as an open question for future development.
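For readers trying to reproduce this, the top-k strategy described in the passage above could be sketched roughly as follows. This is only an illustration, not the authors' `prepare_negative.py`: the function names, the integer indexing of enzymes and reactions, and the precomputed similarity matrices `enzyme_sim` and `reaction_sim` are all assumptions, and how similarity is actually measured is not stated in this thread.

```python
import numpy as np


def top_k_unannotated(similarity_row, excluded, k):
    """Return up to k indices with the highest similarity, skipping `excluded`."""
    # Sort candidates from most to least similar (stable, so ties keep index order).
    order = np.argsort(-similarity_row, kind="stable")
    picked = [int(i) for i in order if int(i) not in excluded]
    return picked[:k]


def build_negatives(pairs, enzyme_sim, reaction_sim, k=1000):
    """Hypothetical helper: similarity-based negatives for each positive pair.

    pairs        -- iterable of (enzyme_index, reaction_index) positive pairs
    enzyme_sim   -- (n_enzymes, n_enzymes) similarity matrix (assumed precomputed)
    reaction_sim -- (n_reactions, n_reactions) similarity matrix (assumed precomputed)
    """
    pairs = set(pairs)
    enzymes_for = {}    # reaction -> enzymes annotated as catalyzing it
    reactions_for = {}  # enzyme -> reactions it is annotated as catalyzing
    for e, r in pairs:
        enzymes_for.setdefault(r, set()).add(e)
        reactions_for.setdefault(e, set()).add(r)

    negatives = {}
    for e, r in sorted(pairs):
        # Enzymes most similar to e that are NOT annotated for reaction r.
        neg_enzymes = top_k_unannotated(enzyme_sim[e], enzymes_for[r], k)
        # Reactions most similar to r that are NOT annotated for enzyme e.
        neg_reactions = top_k_unannotated(reaction_sim[r], reactions_for[e], k)
        negatives[(e, r)] = {"enzymes": neg_enzymes, "reactions": neg_reactions}
    return negatives
```

With real data, `k=1000` would match the value quoted above; dense similarity matrices may not fit in memory for a large dataset, so a nearest-neighbour index would probably be needed in practice.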

WillHua127 · 2024-10-19T20:20:37Z

Thanks for your interest. The negative dataset is more than 10GB, which is why we chose not to upload it; it is simply too large. You can create your own negative samples using mutations, by treating unseen enzyme-reaction pairs as negative samples, or by using homology alignments.

zw-SIMM · 2024-10-20T04:05:57Z

Thanks for your reply. I understand that the negative dataset is large (>10GB), and uploading it may not be feasible. However, to better replicate your results and ensure alignment with your experimental settings, I would like to confirm a few points:

Positive-to-Negative Sample Ratio:
Could you confirm whether the ratio of positive to negative samples is 1:1000 or 1:2000? Specifically, does each sequence or molecule have 1000 negative samples?

Negative Sample Generation Script:
While I see that prepare_negative.py generates dictionaries similar to sequence or molecule data, it isn't clear how to directly generate the complete negative samples used in your experiments. Could you provide the full script or detailed instructions for this step?

Thank you again for your excellent work and support. I look forward to your further guidance!

WillHua127 · 2024-10-22T22:48:13Z

You don't need the exact negative samples to reproduce our results, because our results are retrieval-based, i.e., evaluation uses only positive samples. If you want to duplicate the ratio, it is 1:1000 for both sequence and molecule.
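As a rough illustration of what retrieval-based evaluation on positive pairs can look like, the sketch below ranks the true candidate for each query against all candidates and reports top-k hit rates and mean reciprocal rank. The function name, the score matrix layout, and the choice of metrics are assumptions made for the example; the thread does not say which metrics or candidate pool the repository's evaluation uses.

```python
import numpy as np


def retrieval_metrics(scores, true_idx, ks=(1, 5)):
    """Hypothetical helper: rank each query's true candidate among all candidates.

    scores   -- (n_queries, n_candidates) model scores, higher means better match
    true_idx -- for each query, the column index of its known positive candidate
    """
    scores = np.asarray(scores, dtype=float)
    true_idx = np.asarray(true_idx)
    true_scores = scores[np.arange(len(true_idx)), true_idx]
    # Rank = 1 + number of candidates scored strictly higher than the true one.
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    out = {f"top{k}": float((ranks <= k).mean()) for k in ks}
    out["mrr"] = float((1.0 / ranks).mean())
    return out
```

No sampled negatives are needed here: every other candidate in the pool acts as a distractor for the query, so only the list of positive pairs has to match the original setup.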

zw-SIMM · 2024-10-25T09:11:19Z

Thanks for your reply again!
