The integrity of the scientific literature depends on citations that are supported by the referenced source material. Inaccurate citations contribute to the spread of unverified claims. AI4citations provides an easy-to-use solution for automated citation verification, leveraging machine learning models trained on domain-specific datasets.
- Academic Researchers: Verify citations in literature reviews and research papers
- Journal Editors: Automated fact-checking during peer review process
- Students: Learn proper citation practices and evidence evaluation
- Science Communicators: Verify claims in popular science writing
- Fact-checkers: Quick verification of scientific claims in media
No installation required! Use AI4citations directly in your browser:
🚀 Launch AI4citations on Hugging Face Spaces
1. Clone the repository

   ```bash
   git clone https://github.com/jedick/AI4citations.git
   cd AI4citations
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up OpenAI API key (optional, for GPT retrieval)

   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

4. Launch the application

   ```bash
   gradio app.py
   ```

5. Access the app
   - Open your browser and navigate to the displayed URL (typically `http://127.0.0.1:7860`)
   - Upload a PDF or input text directly to start verifying citations
- Input a claim (hypothesis) you want to verify
- Provide evidence in one of two ways:
  - Upload a PDF and use automatic evidence retrieval
  - Manually input evidence text
- Get predictions with confidence scores (see the sketch below) for:
  - Support: Evidence supports the claim
  - Refute: Evidence contradicts the claim
  - NEI (Not Enough Information): Evidence is insufficient
- Provide feedback to help improve the model
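Under the hood, each prediction is sequence-pair classification over an (evidence, claim) pair. Here is a minimal sketch using Hugging Face Transformers, with the base NLI model credited in the acknowledgments standing in for the fine-tuned production model (its entailment/neutral/contradiction labels correspond roughly to Support/NEI/Refute); the claim and evidence strings are invented:

```python
# Minimal sketch of claim verification as sequence-pair classification.
# The model ID below is the base NLI model credited in the acknowledgments,
# not the fine-tuned production model; see the AI4citations model card for that.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

claim = "Vitamin D supplementation reduces fracture risk."
evidence = "In this randomized trial, vitamin D did not reduce fracture risk."

# NLI convention: premise (evidence) first, hypothesis (claim) second
inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

for i, label in model.config.id2label.items():
    print(f"{label}: {probs[i]:.3f}")
```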
This app is part of a comprehensive ML engineering ecosystem:
- 🏗️ MLE Capstone Project - Complete ML pipeline with baselines, evaluation, and deployment
- 📦 pyvers Package - Python package for training claim verification models
- 🤗 Fine-tuned Model - Production model on Hugging Face
- Fine-tuned DeBERTa (default): Trained on SciFact and Citation-Integrity datasets for scientific claim verification
- Base DeBERTa: Pre-trained on multiple natural language inference (NLI) datasets
- Interactive model switching: Compare results between different models
- Detailed predictions: Get instant results with confidence scores
Choose from three complementary approaches to extract relevant evidence from PDFs:
- 🔍 BM25S (traditional keyword matching with BM25 ranking)
- 🧠 DeBERTa (AI-based question answering with context extraction)
- 🤖 OpenAI GPT (large language model with document understanding)
For BM25S and DeBERTa, you can adjust the number of evidence sentences retrieved (top-k sentences).
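For the keyword-based option, retrieval looks roughly like the following sketch with the bm25s package; the example corpus, query, and k value are placeholders:

```python
# Minimal sketch of top-k evidence retrieval with the bm25s package.
# The corpus, query, and k are placeholders for illustration.
import bm25s

sentences = [
    "Vitamin D supplementation did not reduce fracture risk in this trial.",
    "Participants received 2000 IU of vitamin D daily for five years.",
    "Fracture incidence was similar in the treatment and placebo groups.",
]

retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(sentences))  # index the tokenized corpus

query = "Does vitamin D reduce fracture risk?"
# retrieve() returns document indices and scores, shaped (n_queries, k)
results, scores = retriever.retrieve(bm25s.tokenize(query), k=2)
for rank in range(results.shape[1]):
    print(f"{scores[0, rank]:.2f}  {sentences[results[0, rank]]}")
```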
- Interactive examples: Pre-loaded examples for each prediction class
- PDF upload: Drag-and-drop PDF processing
- Responsive design: Works on desktop and mobile devices
- GPU acceleration: Optimized for fast inference on Hugging Face Spaces
- Token usage tracking: Monitor OpenAI API usage
- Real-time feedback collection: Help improve the model with your corrections
Click here to see the collected feedback dataset!
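The feedback itself lives in the Hugging Face dataset linked above; as a generic illustration of the pattern, here is a minimal sketch that appends corrections to a local CSV file (the file name and column schema are assumptions, not the app's actual storage code):

```python
# Minimal sketch of feedback logging to a local CSV file.
# The real app saves feedback to a Hugging Face dataset instead.
import csv
from datetime import datetime, timezone
from pathlib import Path

FEEDBACK_FILE = Path("feedback.csv")  # hypothetical path

def log_feedback(claim: str, evidence: str, predicted: str, corrected: str) -> None:
    """Append one user correction for later retraining."""
    is_new = not FEEDBACK_FILE.exists()
    with FEEDBACK_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "claim", "evidence", "predicted", "corrected"])
        writer.writerow(
            [datetime.now(timezone.utc).isoformat(), claim, evidence, predicted, corrected]
        )

log_feedback("Example claim", "Example evidence", "SUPPORT", "NEI")
```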
Benchmarked on the SciFact test set with gold evidence as baseline:
| Retrieval Method | Macro F1 | Speed (avg.) | Best Use Case |
|---|---|---|---|
| Gold evidence | 0.834 | - | Baseline (human-selected) |
| BM25S | 0.649 | 0.36s | Fast keyword matching |
| DeBERTa | 0.610 | 7.00s | Semantic understanding |
| GPT | 0.615 | 19.84s | Complex reasoning |
The fine-tuned model achieves a 7 percentage point improvement over single-dataset baselines through multi-dataset training.
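Macro F1 is the unweighted mean of the per-class F1 scores, so Support, Refute, and NEI count equally regardless of how often each occurs. A quick illustration with scikit-learn, using invented labels:

```python
# Macro F1 weights each class equally, so a rare class like Refute
# affects the score as much as a common one. Labels are invented.
from sklearn.metrics import f1_score

y_true = ["SUPPORT", "REFUTE", "NEI", "SUPPORT", "NEI", "REFUTE"]
y_pred = ["SUPPORT", "NEI",    "NEI", "SUPPORT", "NEI", "REFUTE"]

print(f1_score(y_true, y_pred, average="macro"))  # mean of per-class F1 scores
```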
- Frontend: Gradio interface with custom styling and Font Awesome icons
- Backend: PyTorch Lightning with Hugging Face Transformers
- PDF Processing: PyMuPDF (fitz) with text cleaning and normalization
- Retrieval: Multiple engines (BM25S, DeBERTa QA, OpenAI GPT)
- Deployment: Hugging Face Spaces with GPU acceleration
- CI Testing: GitHub Actions workflow for integration and unit tests
- PDF Text Extraction: Multi-page processing with layout preservation
- Text Normalization: Unicode conversion, hyphen removal, sentence tokenization (see the sketch after this list)
- Evidence Retrieval: Method-specific processing (keyword, QA, or LLM-based)
- Claim Verification: Transformer-based classification with confidence scores
- Feedback Loop: User corrections saved for continuous improvement
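A minimal sketch of the first two stages, assuming PyMuPDF, Unidecode, and NLTK as listed under the technology stack; the real pipeline performs more cleaning than shown, and the file name is a placeholder:

```python
# Minimal sketch of PDF text extraction and normalization.
# The production pipeline does more cleaning than shown here.
import fitz  # PyMuPDF
import nltk
from unidecode import unidecode

nltk.download("punkt", quiet=True)  # sentence tokenizer models

def pdf_to_sentences(path: str) -> list[str]:
    """Extract text from every page, normalize it, and split into sentences."""
    with fitz.open(path) as doc:
        text = " ".join(page.get_text() for page in doc)
    text = unidecode(text)          # Unicode -> ASCII
    text = text.replace("-\n", "")  # re-join words hyphenated across line breaks
    text = " ".join(text.split())   # collapse whitespace and newlines
    return nltk.sent_tokenize(text)

print(pdf_to_sentences("paper.pdf")[:3])  # placeholder file name
```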
The model was trained and evaluated on two high-quality datasets for claim verification in biomedical and health sciences:
SciFact:
- Size: 1,409 scientific claims verified against 5,183 abstracts
- Source: AllenAI SciFact Dataset

Citation-Integrity:
- Size: 3,063 citation instances from biomedical publications
- Source: Citation-Integrity Dataset
Both datasets were normalized with consistent labeling for robust cross-domain performance.
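As an illustration, that normalization can be expressed as a mapping onto a shared three-class scheme. SciFact's native labels are SUPPORT, CONTRADICT, and NOT_ENOUGH_INFO; the Citation-Integrity labels below are hypothetical stand-ins, and the actual mapping lives in the pyvers package:

```python
# Hypothetical label normalization for multi-dataset training.
# The SciFact labels are real; the Citation-Integrity labels are stand-ins,
# and the actual mapping is implemented in the pyvers package.
LABEL_MAP = {
    "scifact": {"SUPPORT": "SUPPORT", "CONTRADICT": "REFUTE", "NOT_ENOUGH_INFO": "NEI"},
    "citint": {"ACCURATE": "SUPPORT", "NOT_ACCURATE": "REFUTE", "IRRELEVANT": "NEI"},
}

def normalize(dataset: str, label: str) -> str:
    """Map a dataset-specific label to the shared three-class scheme."""
    return LABEL_MAP[dataset][label]

print(normalize("scifact", "CONTRADICT"))  # REFUTE
```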
This project builds upon exceptional work from the research and open-source communities:
- Gradio: Web interface framework enabling easy ML app deployment
- Hugging Face Transformers: State-of-the-art transformer models and tokenizers
- PyTorch Lightning: Scalable ML training framework
- DeBERTa: Base model pre-trained on multiple NLI datasets by MoritzLaurer
- SciFact Dataset: Scientific claim verification dataset by Wadden et al. (2020)
- Citation-Integrity Dataset: Biomedical citation verification by Sarol et al. (2024)
- BM25S: High-performance BM25 implementation for keyword-based retrieval
- PyMuPDF (fitz): Robust PDF text extraction and processing
- OpenAI GPT: Advanced language model for complex reasoning tasks
- NLTK: Natural language processing utilities for tokenization
- Unidecode: Unicode to ASCII text conversion
- Codecov: Test coverage reporting and monitoring
- AI Assistance: BERT retrieval code developed with assistance from Claude Sonnet 4
- MultiVerS Model: Longformer-based claim verification by Wadden et al. (2021)
- Natural Language Inference: Foundational NLI datasets (MultiNLI, FEVER, ANLI)
- Domain Adaptation: Cross-dataset training techniques for improved generalization
For detailed technical information and experimental results, see the ML Engineering Capstone Project repository and associated blog posts.
💡 Questions or Issues? Open an issue on GitHub!