Comparative structural analysis of the GREB1-like protein (Q9C091) in Homo sapiens using AlphaFold2, Phyre2, and ESMFold.
- Overview
- Scientific Context
- Methods & Workflow
- Results Summary
- RMSD Analysis
- Quality Control Findings
- AlphaFold2 Advanced Metrics
- Project Structure
- How to Reproduce
- Tools & Technologies
- Contributors
- Conclusion
This project focuses on the comparative structural analysis of the GREB1-like protein (Q9C091) in Homo sapiens using three state-of-the-art protein structure prediction methods:
| Method | Approach |
|---|---|
| AlphaFold2 | Deep learning-based full structure prediction |
| Phyre2 | Homology modeling using known templates |
| ESMFold | Protein language model-based prediction |
The goal is to evaluate, compare, and validate predicted 3D protein structures using multiple bioinformatics and structural biology tools.
For RNA-related proteins and transcription-associated complexes, an accurate structural model is often needed to understand biological function.
The GREB1-like protein:
- Contains 1923 amino acids
- Has no complete experimental structure available in the PDB
- Requires computational modeling approaches
👉 This project aims to compare different prediction models, evaluate their reliability, and analyze structural consistency.
      Protein Sequence (GREB1L, Q9C091)
                      │
                      ▼
         ┌──────────────────────────┐
         │ 1. Sequence Analysis     │──→ BLASTp homolog identification
         └────────────┬─────────────┘
                      │
            ┌─────────┼─────────┐
            ▼         ▼         ▼
        ┌────────┐┌────────┐┌────────┐
        │Alpha-  ││Phyre2  ││ESM-    │
        │Fold2   ││        ││Fold    │
        └───┬────┘└───┬────┘└───┬────┘
            │         │         │
            └─────────┼─────────┘
                      ▼
         ┌──────────────────────────┐
         │ 3. Structural Comparison │──→ RMSD calculation (PyMOL)
         └────────────┬─────────────┘
                      │
                      ▼
         ┌──────────────────────────┐
         │ 4. Quality Assessment    │──→ ProQ2, Ramachandran, SAVES
         └────────────┬─────────────┘
                      │
                      ▼
             ✅ Validated Models
🔹 AlphaFold2
- Deep learning-based model
- Predicts full protein structure
- Provides confidence scores (pLDDT, PAE)
🔹 Phyre2
- Homology modeling approach
- Uses known protein templates
- Confidence based on template similarity
🔹 ESMFold
- Language model-based prediction
- Faster (no multiple sequence alignment required), but may model only part of the sequence with confidence
BLASTp was used to identify homologous sequences. The top hits were GREB1-like isoforms (≈100% identity), confirming the identity of the query sequence and its biological relevance.
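Percent identity is the share of aligned positions at which two sequences carry the same residue. The snippet below is a minimal sketch of that arithmetic, assuming the two sequences are already aligned and use `-` for gaps. The function name and the toy sequences are illustrative and are not project data; tools also differ in whether gap columns count toward the denominator, so the number may not match a BLAST report exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns (one possible convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not GREB1L)
query = "MKT-LLVA"
hit = "MKTALLIA"
print(f"{percent_identity(query, hit):.1f}% identity")
```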
Structures were compared using RMSD (Root Mean Square Deviation), which measures the average distance between equivalent atoms after superposition. Because the three methods cover different portions of the sequence, RMSD was computed only over the residues that the compared models have in common.
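To make the RMSD comparison concrete, here is a minimal NumPy sketch of superposition followed by RMSD, restricted to residues present in both models. It uses the Kabsch algorithm, which is an assumption for illustration and not the procedure used in this project: the project relied on PyMOL, whose `align` command also performs sequence alignment and iterative outlier rejection, so values will not necessarily match. The dictionaries `model_a` and `model_b` are hypothetical stand-ins for Cα coordinates keyed by residue number.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))


# Hypothetical data: model A covers residues 1-10, model B only residues 4-10.
rng = np.random.default_rng(0)
model_a = {i: rng.normal(size=3) for i in range(1, 11)}
theta = np.deg2rad(30.0)
Rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
shift = np.array([5.0, -2.0, 1.0])
# Model B is model A rotated and shifted, so the RMSD should be close to zero.
model_b = {i: Rz @ model_a[i] + shift for i in range(4, 11)}

# Partial alignment: compare only residues present in both models.
shared = sorted(model_a.keys() & model_b.keys())
P = np.array([model_a[i] for i in shared])
Q = np.array([model_b[i] for i in shared])
rmsd = kabsch_rmsd(P, Q)
print(f"{len(shared)} shared residues, RMSD = {rmsd:.3f} Å")
```

Restricting the comparison to shared residues is what keeps a short model from being penalised for regions it never predicted.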
Multiple tools were used for validation:
| Tool | Purpose |
|---|---|
| ProQ2 | Predicts global and local model quality |
| Ramachandran Plot | Evaluates backbone φ/ψ conformation (α-helix and β-sheet regions, outliers) |
| SAVES | Detects steric clashes, B-factor anomalies, side-chain issues |
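A Ramachandran plot is built from two backbone dihedral angles per residue: φ (C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue). As a rough illustration of what the plot measures, the sketch below computes a signed dihedral angle from four points with NumPy. The function name and the test coordinates are made up for the example; the project itself used existing validation tools for this step.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Toy geometry (hypothetical points, not atoms from a GREB1L model)
angle = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
print(f"dihedral = {angle:.1f} degrees")
```

For a real model, the four points for φ of residue i would be the coordinates of C(i-1), N(i), Cα(i), and C(i).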
| Model | Domains Predicted | Coverage | Strength |
|---|---|---|---|
| AlphaFold2 | 4 domains | Full | Best global accuracy |
| Phyre2 | 1 domain (~228 aa) | Partial | Template-based, moderate reliability |
| ESMFold | 1 small domain (~50–60 aa) | Very low | Best local precision |
- AlphaFold2 → best global model with high confidence regions (pLDDT 60–90)
- ESMFold → best local precision but limited global coverage
- Phyre2 → template-dependent, less reliable, several structural inconsistencies
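The coverage labels can be put into numbers by dividing the modelled length by the full sequence length of 1923 residues. The sketch below does only that arithmetic, using the approximate domain sizes quoted in the table; the helper name is illustrative.

```python
SEQ_LEN = 1923  # GREB1L length, as stated above


def coverage_pct(n_modelled: int, seq_len: int = SEQ_LEN) -> float:
    """Percentage of the full sequence covered by a model."""
    return 100.0 * n_modelled / seq_len


for label, n in [
    ("Phyre2 (~228 aa)", 228),
    ("ESMFold, lower estimate (~50 aa)", 50),
    ("ESMFold, upper estimate (~60 aa)", 60),
]:
    print(f"{label}: {coverage_pct(n):.1f}% of the sequence")
```

On these figures Phyre2 covers roughly 12% of the protein and ESMFold roughly 3%, which is why the comparison separates local from global accuracy.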
| Comparison | RMSD (Å) | Interpretation |
|---|---|---|
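| Phyre2 vs AlphaFold2 | ≈ 4.16 | Moderate deviation, weaker agreement with the AlphaFold2 model |
| ESMFold vs AlphaFold2 | ≈ 2.81 | Closer locally, higher structural agreement |
👉 Lower RMSD = higher structural similarity. ESMFold agrees with AlphaFold2 more closely over its aligned region than Phyre2 does. Since no experimental structure is available, these values measure agreement between models, not accuracy against a known structure.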
Common issues identified across models:
- Steric clashes (van der Waals overlaps)
- Abnormal backbone conformations
- Side-chain optimization issues
- B-factor inconsistencies
pLDDT (per-residue confidence):
- High values (≥70) → reliable regions; values above 90 indicate very high confidence
- Low values (<50) → uncertain regions, which may be flexible or disordered
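AlphaFold2 model files store the per-residue pLDDT in the B-factor column of each atom record, so the scores can be read straight from the PDB file. The following is a minimal sketch that extracts Cα pLDDT values and groups them into the commonly used confidence bands (90, 70, 50). The three-residue PDB text is synthetic and the function names are illustrative; a real analysis would read the downloaded model file instead.

```python
from collections import Counter


def ca_plddt(pdb_text: str) -> dict:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Fixed-width PDB columns: resSeq 23-26, tempFactor 61-66.
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def plddt_band(score: float) -> str:
    """Confidence band for a pLDDT score."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def make_ca_line(serial: int, resseq: int, plddt: float) -> str:
    """Build a synthetic, correctly aligned CA ATOM record for the demo."""
    return (
        f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Synthetic three-residue example (not taken from the GREB1L model)
toy_pdb = "\n".join(
    make_ca_line(i, i, s) for i, s in enumerate([95.5, 72.0, 41.3], start=1)
)
scores = ca_plddt(toy_pdb)
bands = Counter(plddt_band(s) for s in scores.values())
print(scores)
print(dict(bands))
```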
PAE (Predicted Alignment Error):
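- Shows how reliably domains are positioned relative to each other (low inter-domain PAE = confident placement)
- Low PAE within a block of residues confirms a stable domain region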
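One simple way to read a PAE matrix is to average it over blocks: within a domain to check that the domain is well defined, and between two domains to check their relative placement. The sketch below does this on a synthetic 6×6 matrix. The domain boundaries and PAE values are invented for illustration and do not describe GREB1L; PAE is not symmetric in general, so both off-diagonal blocks are averaged.

```python
import numpy as np


def mean_block_pae(pae: np.ndarray, rows, cols) -> float:
    """Mean PAE over the block selected by the given row and column indices."""
    return float(pae[np.ix_(list(rows), list(cols))].mean())


def inter_domain_pae(pae: np.ndarray, dom_a, dom_b) -> float:
    """Average of both off-diagonal blocks between two domains."""
    return 0.5 * (mean_block_pae(pae, dom_a, dom_b) + mean_block_pae(pae, dom_b, dom_a))


# Synthetic matrix: two 3-residue "domains" (hypothetical boundaries).
domain_a = range(0, 3)
domain_b = range(3, 6)
pae = np.full((6, 6), 20.0)   # high error between domains
pae[:3, :3] = 2.0             # low error within domain A
pae[3:, 3:] = 2.0             # low error within domain B

print("within A :", mean_block_pae(pae, domain_a, domain_a))
print("within B :", mean_block_pae(pae, domain_b, domain_b))
print("A vs B   :", inter_domain_pae(pae, domain_a, domain_b))
```

In this toy case each domain is internally confident while the arrangement of the two domains relative to each other is not.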
greb1l-structure-analysis/
│
├── report/ # PDF report and figures
│ └── final_report.pdf
│
├── figures/ # Structural visualizations
│ ├── alphafold2/ # AlphaFold2 PyMOL renders
│ ├── phyre2/ # Phyre2 model visualizations
│ ├── esmfold/ # ESMFold predictions
│ ├── ramachandran/ # Ramachandran plots
│ └── rmsd_comparison/ # Structural overlay figures
│
└── README.md # Project documentation
1. Retrieve protein sequence:
- Download GREB1L (Q9C091) from UniProt
2. Run prediction tools:
- AlphaFold2, Phyre2, and ESMFold, each on the same Q9C091 sequence
3. Perform analysis:
- BLAST analysis for homolog identification
- Structural alignment using PyMOL
4. Evaluate:
- RMSD calculation between models
- Ramachandran plot analysis
- ProQ2 quality scores
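Step 1 can be scripted. The sketch below downloads a FASTA record by accession and parses it using only the Python standard library. It assumes the UniProt REST URL pattern shown in `UNIPROT_FASTA_URL`; the function names are illustrative, and the demo at the bottom runs on a toy record so that it works offline.

```python
from typing import Tuple
from urllib.request import urlopen

# Assumed URL pattern for the UniProt REST service.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_uniprot_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download the FASTA text for one accession (requires network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


# Offline demo on a toy record (hypothetical sequence, not GREB1L)
toy_record = ">toy|P00000|EXAMPLE\nMKTALLIA\nGGSW\n"
header, sequence = parse_fasta(toy_record)
print(header, len(sequence))
```

With network access, `parse_fasta(fetch_uniprot_fasta("Q9C091"))` would return the GREB1L header and sequence, whose length can then be checked against the 1923 residues quoted above.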
| Category | Tools |
|---|---|
| Structure Prediction | AlphaFold2, Phyre2, ESMFold |
| Sequence Analysis | BLAST+ |
| Visualization | PyMOL |
| Quality Assessment | ProQ2, SAVES |
| Data Analysis | Python |
- Fatine Hichami
- Tugce Koytaviloglu
- Structural prediction models vary significantly in coverage and accuracy
- Combining multiple methods improves reliability of results
- Local vs global accuracy must be distinguished when evaluating models
- Quality control is essential in computational structural biology
This project shows that, for GREB1L, no single prediction method was sufficient on its own.
A combined approach using:
- AlphaFold2 for global structure
- ESMFold for local precision
provides a more robust understanding of protein conformation.