th3vib3coder/crispr-offtarget-serendipity

CRISPR-Cas9 Off-Target Prediction: A 21-Sprint Adversarial Case Study

Companion repository for:

Vibe Science: How Adversarial Agent Loops Turn Vibe Researching into Verifiable Science

Carmine Russo, Elisa Bertelli. VibeX 2026 (co-located with EASE 2026), Glasgow, June 9–12, 2026

This repository contains the complete research artifacts from a 21-sprint investigation into CRISPR-Cas9 off-target prediction, conducted using the Vibe Science adversarial agent loop (Claude Code as Researcher-Agent, ChatGPT as external Reviewer-Agent). The investigation ran from January to February 2026.


Why This Repository Exists

The VibeX 2026 paper focuses on the agent architecture (adversarial loops, claim ledgers, serendipity engines), not on the biology. This repository provides the raw evidence: every sprint report, every analysis script, every claim with its lifecycle, and every reviewer intervention — so that the process claims in the paper can be independently verified.


The Research Journey (TL;DR)

| Phase | Sprints | What Happened |
| --- | --- | --- |
| Failure | 1–2 | The original hypothesis (Unbalanced Optimal Transport models the chromatin filter) scored AUROC 0.375, worse than random guessing (0.500). |
| Serendipity | 3 | The Serendipity Engine flagged structured residual patterns in the failure data (score 13/15), triggering a formal pivot. |
| Pivot | 4–5 | Shifted to an "Affinity-First" framework. Discovered that 87% of cleaved sites fall in the top binding-affinity quartile. |
| Stress-test | 6–8 | R2 (ChatGPT) demanded a hierarchical bootstrap. The "Regime Switch" claim collapsed because the confidence intervals overlapped. Pivoted again to positional mismatch effects. |
| Discovery | 9–12 | Found that transitions are tolerated better than transversions (p = 2.35e-69), with a unique cytosine exception. Built a 5-feature Macro model that outperforms 20-feature Fine models. |
| Falsification | 13–14 | Cross-cell-line validation: macro patterns generalize, fine position rankings do not. Permutation tests passed. |
| Deep dive | 15–16 | Characterized the C exception (cytosine violates the Trans > Transv rule). Found a suspicious OR = 2.30 for consecutive mismatches. |
| Paper-saver | 17 | R2 demanded propensity matching. Exact stratified matching on total-mismatch count (57 strata) reversed the coefficient from −0.379 to +0.022. The entire consecutive-mismatch effect was a confounder. Caught before any draft was written. |
| Cross-assay | 18 | Four findings validated on an independent GUIDE-seq dataset (1,380,770 sites, 0.088% positive rate). |
| Anti-leakage | 19 | All 78 guides confirmed as dissimilar (minimum Hamming distance = 7). No data leakage. |
| Final stress | 20 | All surviving claims re-tested with block bootstrap. Macro model: AUPRC 0.365 (14.5× lift over random). |
| Mechanism | 21 | 6/6 structural predictions from Cas9 biology confirmed. Two-stage model (R-loop formation + conformational checkpoint) is consistent with known biophysics. |
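
The sprint 17 reversal is a case of confounding by total mismatch count. The sketch below is not the study's script: it uses an invented toy table and hypothetical helper names (`make_rows`, `crude_difference`, `stratified_difference`) to illustrate how comparing sites with and without consecutive mismatches only inside strata of equal mismatch count can erase an apparent effect.

```python
from collections import defaultdict

def make_rows(n_mm, consecutive, n_sites, n_cleaved):
    """Build toy records of the form (n_mm, has_consecutive_mm, cleaved)."""
    return [(n_mm, consecutive, i < n_cleaved) for i in range(n_sites)]

# Invented data: consecutive mismatches cluster in the high-mismatch stratum,
# where cleavage is rarer for every site regardless of exposure.
TOY = (
    make_rows(2, True, 2, 1) + make_rows(2, False, 8, 4)
    + make_rows(5, True, 8, 2) + make_rows(5, False, 4, 1)
)

def rate(rows):
    """Fraction of cleaved sites in a group of records."""
    return sum(cleaved for _, _, cleaved in rows) / len(rows)

def crude_difference(rows):
    """Cleavage-rate difference (exposed minus unexposed), ignoring strata."""
    exposed = [r for r in rows if r[1]]
    unexposed = [r for r in rows if not r[1]]
    return rate(exposed) - rate(unexposed)

def stratified_difference(rows):
    """Size-weighted mean of within-stratum rate differences."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[0]].append(r)
    total, weighted = 0, 0.0
    for members in strata.values():
        exposed = [r for r in members if r[1]]
        unexposed = [r for r in members if not r[1]]
        if not exposed or not unexposed:
            continue  # no valid comparison inside this stratum
        weighted += len(members) * (rate(exposed) - rate(unexposed))
        total += len(members)
    if total == 0:
        raise ValueError("no stratum contains both exposed and unexposed sites")
    return weighted / total

print(round(crude_difference(TOY), 3))       # -0.117
print(round(stratified_difference(TOY), 3))  # 0.0
```

In the toy table the crude comparison suggests a negative effect, while the within-stratum comparison shows none. The numbers are illustrative only and are unrelated to the −0.379 and +0.022 coefficients reported above.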

Key Results

Model Performance

| Model | AUPRC | vs. Random | vs. MIT Score | vs. CFD Score |
| --- | --- | --- | --- | --- |
| Macro (5 features) | 0.365 | 14.5× | +350% | +166% |
| Fine (20 features) | 0.352 | 14.0× | | |
| CFD (baseline) | 0.137 | 5.4× | | |
| MIT (baseline) | 0.081 | 3.2× | | |
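
The "vs. Random" column appears consistent with dividing each AUPRC by the 2.52% positive rate of the primary dataset, since a random classifier's AUPRC is approximately the positive rate. The sketch below reproduces that arithmetic from the rounded values in the table; the function names are hypothetical and the choice of baseline is an assumption, not something stated in the sprint reports.

```python
# Positive rate of the primary dataset as reported in this README (2.52%).
POSITIVE_RATE = 0.0252

# Rounded AUPRC values copied from the table above.
AUPRC = {"Macro": 0.365, "Fine": 0.352, "CFD": 0.137, "MIT": 0.081}

def lift_over_random(auprc, positive_rate=POSITIVE_RATE):
    """Ratio of a model's AUPRC to the approximate AUPRC of random ranking."""
    return auprc / positive_rate

def relative_gain_pct(auprc, baseline_auprc):
    """Percentage improvement of one AUPRC over another."""
    return 100.0 * (auprc / baseline_auprc - 1.0)

for name, value in AUPRC.items():
    print(f"{name}: {lift_over_random(value):.1f}x over random")

print(f"Macro vs. CFD: {relative_gain_pct(AUPRC['Macro'], AUPRC['CFD']):+.0f}%")
print(f"Macro vs. MIT: {relative_gain_pct(AUPRC['Macro'], AUPRC['MIT']):+.0f}%")
```

With the rounded inputs the MIT comparison comes out near +351% rather than +350%, which suggests the table was computed from unrounded AUPRC values.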

Claim Ledger Summary

| Status | Count | Description |
| --- | --- | --- |
| Validated | 7 | Survived cross-assay replication + block bootstrap |
| Qualified | 4 | Signal present but requires caveats |
| Downgraded | 5 | Signal partial or unreliable as originally stated |
| Killed | 6 | Fully refuted (including UOT and consecutive-mismatch effect) |
| Exploratory | 12 | Discussed but never formally promoted |
| Total | 34 | 50% retraction rate among promoted claims |
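
The 50% figure can be checked from the counts above. This sketch assumes that "promoted" means every claim except the exploratory ones and that "retracted" means downgraded or killed; both readings are inferred from the table, not quoted from the ledger.

```python
# Counts copied from the claim ledger summary.
LEDGER = {
    "Validated": 7,
    "Qualified": 4,
    "Downgraded": 5,
    "Killed": 6,
    "Exploratory": 12,
}

# Assumption: exploratory claims were never promoted, so they are excluded.
promoted = sum(n for status, n in LEDGER.items() if status != "Exploratory")
# Assumption: a retraction is a claim that ended as Downgraded or Killed.
retracted = LEDGER["Downgraded"] + LEDGER["Killed"]

print(sum(LEDGER.values()))  # 34
print(promoted)              # 22
print(retracted / promoted)  # 0.5
```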

Four Cross-Assay Validated Findings

  1. Position-dependent mismatch tolerance: Mismatches near the PAM-proximal end are disproportionately damaging to Cas9 cleavage.
  2. Transition > Transversion tolerance: Transitions are tolerated ~2× better than transversions (p < 1e-6 in both assays), with a unique C exception where C>A (transversion) is tolerated better than C>T (transition).
  3. Macro features > Fine features: A 5-feature model (log_change, n_mm, seed_mm, trans_ratio, is_ngg) outperforms 20 fine-grained positional features.
  4. Mismatch-burden threshold: Sharp conformational checkpoint at 4–5 total mismatches; 91.2% of high-affinity sites are blocked in vivo.
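
Findings 2 and 3 rest on classifying each mismatch and summarizing a site with a few aggregate features. The sketch below is a minimal illustration, not the repository's feature code: the feature definitions, the 10-nt seed length, and the function names are assumptions, and `log_change` is left out because its definition is not given here. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) substitutions; everything else is a transversion.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def mismatch_kind(target_base, offtarget_base):
    """Return 'match', 'transition', or 'transversion' for one position."""
    if target_base == offtarget_base:
        return "match"
    if (target_base, offtarget_base) in TRANSITIONS:
        return "transition"
    return "transversion"

def macro_features(guide, offtarget, pam, seed_len=10):
    """Hypothetical macro features for one guide / off-target pair.

    Sequences are DNA strings written 5' to 3', so the PAM-proximal
    seed is taken as the last `seed_len` positions (an assumption).
    """
    if len(guide) != len(offtarget):
        raise ValueError("guide and off-target must have equal length")
    kinds = [mismatch_kind(g, o) for g, o in zip(guide.upper(), offtarget.upper())]
    n_mm = sum(k != "match" for k in kinds)
    return {
        "n_mm": n_mm,
        "seed_mm": sum(k != "match" for k in kinds[-seed_len:]),
        "trans_ratio": kinds.count("transition") / n_mm if n_mm else 0.0,
        "is_ngg": len(pam) == 3 and pam.upper().endswith("GG"),
    }

# Made-up example: one transition outside the seed, one transversion inside it.
features = macro_features("GAGTCCGAGCAGAAGAAGAA", "GAGTTCGAGCAGAAGAAGCA", "TGG")
print(features)
```

For the example pair this yields two mismatches, one of them in the seed, a transition ratio of 0.5, and an NGG PAM.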

Repository Structure

crispr-offtarget-serendipity/
│
├── README.md                          # This file
│
├── sprint-reports/                    # Markdown reports per sprint
│   ├── SPRINT5_RESULTS.md
│   ├── SPRINT8_CRITICAL_FINDINGS.md
│   ├── SPRINT8_FINAL_FINDINGS.md
│   ├── SPRINT9_FINDINGS.md
│   ├── SPRINT10-12_FINDINGS.md
│   ├── SPRINT14_FALSIFICATION_REPORT.md
│   ├── SPRINT15_DEEP_ANALYSIS_REPORT.md
│   ├── SPRINT16_COMPREHENSIVE_REPORT.md
│   ├── SPRINT17_CRITICAL_REPORT.md    # The "paper-saver" episode
│   ├── SPRINT18_FINAL_REPORT.md       # Cross-assay validation
│   ├── SPRINT19_FALSIFICATION_REPORT.md
│   ├── SPRINT20_STRESS_TEST_REPORT.md
│   └── SPRINT21_MECHANISM_REPORT.md
│
├── research-journey/                  # High-level summaries and overviews
│   ├── ENDOCRISPR_RESEARCH_JOURNEY_v3.md
│   ├── ENDOCRISPR_UNIFIED_REPORT.md
│   ├── MASTER_INDEX.md
│   └── CROSS_ASSAY_FINAL_RESULTS.md
│
├── claim-ledger/                      # Formal claim tracking
│   ├── CLAIM_LEDGER_SPRINT1-21.md     # 34 claims with lifecycle
│   └── MANIFEST_SPRINT1-21.md         # Sprint-by-sprint artifact registry
│
├── scripts/                           # Analysis scripts (Python)
│   ├── endocrispr_uot_analysis.py     # Sprint 1-2: UOT (failed)
│   ├── sprint3_prepare_data.py
│   ├── sprint3_prepare_data_v2.py
│   ├── sprint3_ablation_suite.py
│   ├── sprint4_hypothesis_test.py
│   ├── sprint4_affinity_stratified.py
│   ├── sprint4_master_analysis.py
│   ├── sprint5_competition_analysis.py
│   └── ... (63 scripts total)
│
├── notebooks/                         # Colab / Jupyter notebooks
│   ├── EndoCRISPR_Sprint2_Colab.ipynb
│   ├── EndoCRISPR_Sprint3_Colab.ipynb
│   ├── EndoCRISPR_Sprint3_FINAL.ipynb
│   ├── Sprint4_ATAC_Extraction_Colab.ipynb
│   ├── Sprint6_H3K27ac_Extraction_Colab.ipynb
│   ├── colab_GSE149363_analysis.ipynb
│   ├── COLAB_cross_assay_validation.ipynb
│   ├── COLAB_dl_benchmark.ipynb
│   ├── COLAB_cross_assay_model_transfer.ipynb
│   └── EndoCRISPR_Sprint3_Colab_LITE.ipynb
│
├── results/                           # CSV outputs from analysis scripts
│   └── ... (64 result CSVs)
│
└── r2-interventions/                  # ChatGPT (external R2) review logs
    └── CHATGPT_FEEDBACK_LOG.md

Dataset

Primary dataset: Lazzarotto et al., 2020, Nature Biotechnology 38:1317–1327 (DOI: 10.1038/s41587-020-0555-7; GEO accession: GSE149363).

  • 80,306 candidate off-target sites across 78 guide RNA sequences
  • Primary human CD4+/CD8+ T-cells
  • CHANGE-seq (in vitro) vs. GUIDE-seq (in cellula) cross-assay design
  • 2.52% positive rate (extreme class imbalance)

Cross-assay validation dataset: 1,380,770 GUIDE-seq sites (0.088% positive rate) from independent experimental conditions.
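
The sprint 19 anti-leakage audit reports a minimum Hamming distance of 7 across the 78 guides. A check of that kind can be sketched as the smallest pairwise distance over all guide pairs; the function names and the three sequences below are made up for illustration and are not taken from the dataset or the repository's scripts.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_hamming(guides):
    """Smallest Hamming distance over all unordered pairs of guides."""
    return min(hamming(a, b) for a, b in combinations(guides, 2))

# Three invented 20-nt sequences, not guides from the dataset.
toy_guides = [
    "ACGTACGTACGTACGTACGT",
    "TGCATGCATGCATGCATGCA",
    "ACGTACGTACGTTTTTTTTT",
]

print(min_pairwise_hamming(toy_guides))  # 6
```

A small minimum would indicate near-duplicate guides that could end up on both sides of a train/test split; with 78 guides the check covers 3,003 pairs.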


The Adversarial Process

This research was conducted using the Vibe Science v3.5 protocol:

  • R1 (Researcher-Agent): Claude Code (Opus 4.5, Anthropic) — executed all analyses, generated scripts, produced sprint reports.
  • R2 (External Reviewer-Agent): ChatGPT (GPT-5.2, OpenAI) — reviewed each sprint with no access to code or data. Its only instruction: "Demolish everything demolishable. Trust only the data."
  • Human operators: Carmine Russo and Elisa Bertelli — made pivot decisions, transferred sprint summaries between agents, and served as final arbiters.

Notable R2 Interventions

| Sprint | R2 Demanded | Impact |
| --- | --- | --- |
| 5 | Covariate isolation for sponge effect | Confounding exposed |
| 8 | Hierarchical bootstrap CIs | "Regime Switch" claim killed (d = 0.07) |
| 11 | Challenged "bidirectional" terminology | Corrected to "differential tolerance" |
| 14 | Permutation tests for Trans > Transv | Finding survived |
| 16 | Flagged OR = 2.30 as suspicious | Queued for propensity matching |
| 17 | Demanded propensity matching | Consecutive-mismatch claim killed (paper-saver) |
| 19 | Anti-leakage audit | Data integrity confirmed |
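
The sprint 8 intervention relied on hierarchical bootstrap confidence intervals. The sketch below shows one common two-level form, assuming sites are nested within guides: guides are resampled with replacement, then sites are resampled within each chosen guide. The nesting, the function name, the percentile interval, and the toy data are all assumptions made for illustration; the sprint scripts may do this differently.

```python
import random
from statistics import mean

def hierarchical_bootstrap_ci(sites_by_guide, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean outcome under two-level resampling."""
    rng = random.Random(seed)
    guides = list(sites_by_guide)
    estimates = []
    for _ in range(n_boot):
        sample = []
        # Level 1: resample guides with replacement.
        for guide in rng.choices(guides, k=len(guides)):
            values = sites_by_guide[guide]
            # Level 2: resample sites within the chosen guide.
            sample.extend(rng.choices(values, k=len(values)))
        estimates.append(mean(sample))
    estimates.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]

# Invented binary outcomes (1 = cleaved) grouped by hypothetical guide IDs.
toy = {"g1": [0, 0, 1, 0], "g2": [1, 1, 0, 1], "g3": [0, 0, 0, 0]}
lo, hi = hierarchical_bootstrap_ci(toy)
print(lo, hi)
```

Resampling at the guide level keeps correlated sites together, so the interval is typically wider than one from a flat bootstrap over sites. Two groups whose intervals overlap substantially give little support for a difference between them.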

Reproducing Results

Requirements

  • Python 3.10+
  • scikit-learn, scipy, numpy, pandas
  • pyBigWig (for ATAC-seq extraction)
  • Google Colab (for notebooks) or local Jupyter

Regenerating Paper Tables

# Table 2 (Claim Evolution): derived from claim-ledger/CLAIM_LEDGER_SPRINT1-21.md
# Table 3 (Feature Traceability): derived from claim-ledger/MANIFEST_SPRINT1-21.md
# Table 4 (R2 Interventions): derived from r2-interventions/CHATGPT_FEEDBACK_LOG.md

The claim ledger and manifest are structured markdown files that map directly to the paper's tables.
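
If the ledger stores its claims in pipe-delimited markdown tables, a status tally could be rebuilt with a few lines of parsing. The sketch below is an assumption about the file layout: the column names (`ID`, `Claim`, `Status`), the sample rows, and the function name are hypothetical, and the real `CLAIM_LEDGER_SPRINT1-21.md` may be organized differently.

```python
from collections import Counter

def parse_markdown_table(text):
    """Parse the first pipe-delimited table in `text` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the first table has ended
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as | --- | --- |
        else:
            rows.append(dict(zip(header, cells)))
    return rows

# Hypothetical excerpt; not copied from the real ledger.
SAMPLE = """
| ID | Claim | Status |
| --- | --- | --- |
| C1 | Example claim A | Validated |
| C2 | Example claim B | Killed |
| C3 | Example claim C | Validated |
"""

counts = Counter(row["Status"] for row in parse_markdown_table(SAMPLE))
print(counts)
```

This parser does not handle escaped pipes inside cells, so it is only suitable for simple tables.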


Related


Citation

If you use these artifacts, please cite:

@inproceedings{russo2026vibescience,
  author    = {Russo, Carmine and Bertelli, Elisa},
  title     = {Vibe Science: How Adversarial Agent Loops Turn Vibe Researching into Verifiable Science},
  booktitle = {Proceedings of the 1st International Workshop on Vibe Coding and Vibe Researching (VibeX), co-located with EASE 2026},
  year      = {2026},
  location  = {Glasgow, Scotland, United Kingdom}
}

License

Apache 2.0. See LICENSE.
