WGLab/CoT-RAG-LLM-Gene-Prioritization-Disease-Diagnosis

CoT-RAG-LLM-Gene-Prioritization-Disease-Diagnosis

This repository contains the datasets and code for the paper "Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes".

Datasets:

All publicly available datasets used for evaluation are in the dataset folder, including all the synthetic clinical notes generated from Phenopacket-Store and the selected cohort of 5,980 clinical notes used for testing. Additionally, for this paper we include pubmed_free_text, a set of 255 literature-derived clinical notes originally compiled in LLM-Gene-Prioritization.

We synthesized the Phenopacket-derived clinical notes with ChatGPT using a specific prompting strategy. If you use the Phenopacket-derived clinical notes in your studies, please cite both the Phenopacket paper and our paper.
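To get a quick overview of what a local clone provides, the contents of the dataset folder can be tallied with a few lines of Python. This is a minimal sketch, not part of the repository's code: the function name `summarize_dataset` is hypothetical, and it makes no assumption about file formats or subfolder names beyond the `dataset` folder mentioned above.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root):
    """Count files under `root`, grouped by (top-level subfolder, file extension).

    Hypothetical helper for illustration; it only inspects the directory
    tree and does not parse any of the files.
    """
    root = Path(root)
    counts = Counter()
    if not root.is_dir():
        # Nothing to count if the folder is missing.
        return counts
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            # Files sitting directly under `root` are grouped under ".".
            group = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[(group, path.suffix.lower() or "<none>")] += 1
    return counts


if __name__ == "__main__":
    # "dataset" is assumed to be the path of the folder in your local clone.
    for (group, ext), n in sorted(summarize_dataset("dataset").items()):
        print(f"{group}\t{ext}\t{n}")
```

Run from the root of a clone, this prints one line per subfolder and file extension, which makes it easy to check the number of notes present against the counts stated above.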

Reference

Wu D, et al. Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes. arXiv, arXiv:2503.12286 [cs.CL], 2025
