CRISPR-Cas9 Off-Target Prediction: A 21-Sprint Adversarial Case Study

Companion repository for:

Vibe Science: How Adversarial Agent Loops Turn Vibe Researching into Verifiable Science
Carmine Russo, Elisa Bertelli
VibeX 2026 (co-located with EASE 2026), Glasgow, June 9–12, 2026

This repository contains the complete research artifacts from a 21-sprint investigation into CRISPR-Cas9 off-target prediction, conducted using the Vibe Science adversarial agent loop (Claude Code as Researcher-Agent, ChatGPT as external Reviewer-Agent). The investigation ran from January to February 2026.

Why This Repository Exists

The VibeX 2026 paper focuses on the agent architecture (adversarial loops, claim ledgers, serendipity engines), not on the biology. This repository provides the raw evidence: every sprint report, every analysis script, every claim with its lifecycle, and every reviewer intervention — so that the process claims in the paper can be independently verified.

The Research Journey (TL;DR)

Phase	Sprints	What Happened
Failure	1–2	Original hypothesis (that Unbalanced Optimal Transport, UOT, models the chromatin filter) scored AUROC 0.375, worse than random guessing (0.500).
Serendipity	3	The Serendipity Engine flagged structured residual patterns in the failure data (score 13/15), triggering a formal pivot.
Pivot	4–5	Shifted to an "Affinity-First" framework. Discovered that 87% of cleaved sites fall in the top binding-affinity quartile.
Stress-test	6–8	R2 (ChatGPT) demanded hierarchical bootstrap. The "Regime Switch" claim collapsed — confidence intervals overlapped. Pivoted again to positional mismatch effects.
Discovery	9–12	Found that transitions are tolerated better than transversions (p = 2.35e-69), with a unique cytosine exception. Built a 5-feature Macro model that outperforms 20-feature Fine models.
Falsification	13–14	Cross-cell-line validation: macro patterns generalize, fine position rankings do not. Permutation tests passed.
Deep dive	15–16	Discovered the C exception (cytosine violates the Trans > Transv rule). Found a suspicious OR = 2.30 for consecutive mismatches.
Paper-saver	17	R2 demanded propensity matching. Exact stratified matching on total-mismatch count (57 strata) reversed the coefficient from −0.379 to +0.022. The entire consecutive-mismatch effect was explained by confounding with total mismatch count. The error was caught before any draft was written.
Cross-assay	18	4 findings validated on an independent GUIDE-seq dataset (1,380,770 sites, 0.088% positive rate).
Anti-leakage	19	All 78 guides confirmed as dissimilar (minimum Hamming distance = 7). No data leakage.
Final stress	20	All surviving claims re-tested with block bootstrap. Macro model: AUPRC 0.365 (14.5× lift over random).
Mechanism	21	6/6 structural predictions from Cas9 biology confirmed. Two-stage model (R-loop formation + conformational checkpoint) is consistent with known biophysics.
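
The hierarchical bootstrap that R2 demanded in Sprints 6–8 can be sketched roughly as follows. This is a minimal illustration, not the repository's script: it assumes the two resampling levels are guides and sites within a guide, it uses the mean as the statistic, and the names `hierarchical_bootstrap_ci` and `intervals_overlap` are hypothetical.

```python
import random
from statistics import mean


def hierarchical_bootstrap_ci(values_by_guide, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean, resampling guides first and then sites.

    values_by_guide: dict mapping a guide ID to a non-empty list of per-site values.
    Assumption: guides are the outer (cluster) level; the README does not say so.
    """
    rng = random.Random(seed)
    guides = list(values_by_guide)
    stats = []
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample guides with replacement.
        for guide in rng.choices(guides, k=len(guides)):
            values = values_by_guide[guide]
            # Level 2: resample sites within the chosen guide.
            pooled.extend(rng.choices(values, k=len(values)))
        stats.append(mean(pooled))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

Resampling whole guides before resampling sites keeps the within-guide correlation in every replicate, which typically widens the intervals compared with a flat bootstrap over sites.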

Key Results

Model Performance

Model	AUPRC	vs. Random	vs. MIT Score	vs. CFD Score
Macro (5 features)	0.365	14.5×	+350%	+166%
Fine (20 features)	0.352	14.0×	—	—
CFD (baseline)	0.137	5.4×	—	—
MIT (baseline)	0.081	3.2×	—	—
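
The "vs. Random" column can be checked with a few lines of arithmetic. The sketch below assumes the random baseline is the 2.52% positive rate of the primary dataset (the expected AUPRC of an uninformative classifier); the function names are hypothetical and the scores are the rounded values from the table above.

```python
# Positive rate of the primary dataset, as stated in the Dataset section.
POSITIVE_RATE = 0.0252

# Rounded AUPRC values copied from the Model Performance table.
AUPRC = {"Macro": 0.365, "Fine": 0.352, "CFD": 0.137, "MIT": 0.081}


def lift_over_random(score, positive_rate=POSITIVE_RATE):
    """AUPRC divided by the assumed random baseline (the positive rate)."""
    return score / positive_rate


def relative_gain_pct(score, baseline):
    """Percentage improvement of `score` over `baseline`."""
    return 100.0 * (score / baseline - 1.0)


for name, score in AUPRC.items():
    print(f"{name}: {lift_over_random(score):.1f}x over random")

print(f"Macro vs. MIT: +{relative_gain_pct(AUPRC['Macro'], AUPRC['MIT']):.1f}%")
print(f"Macro vs. CFD: +{relative_gain_pct(AUPRC['Macro'], AUPRC['CFD']):.1f}%")
```

With the rounded table values, the lifts come out at 14.5×, 14.0×, 5.4× and 3.2×, and the relative gains at roughly 350.6% and 166.4%.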

Claim Ledger Summary

Status	Count	Description
Validated	7	Survived cross-assay replication + block bootstrap
Qualified	4	Signal present but requires caveats
Downgraded	5	Signal partial or unreliable as originally stated
Killed	6	Fully refuted (including UOT and consecutive-mismatch effect)
Exploratory	12	Discussed but never formally promoted
Total	34	50% retraction rate among promoted claims (11 downgraded or killed out of 22 promoted)

Four Cross-Assay Validated Findings

Position-dependent mismatch tolerance: Mismatches near the PAM-proximal end are disproportionately damaging to Cas9 cleavage.
Transition > Transversion tolerance: Transitions are tolerated ~2× better than transversions (p < 1e-6 in both assays), with a unique C exception where C>A (transversion) is tolerated better than C>T (transition).
Macro features > Fine features: A 5-feature model (log_change, n_mm, seed_mm, trans_ratio, is_ngg) outperforms 20 fine-grained positional features.
Mismatch-burden threshold: Sharp conformational checkpoint at 4–5 total mismatches; 91.2% of high-affinity sites are blocked in vivo.
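
To make the transition/transversion distinction and the macro features concrete, here is a minimal sketch of how four of the five features might be derived from an aligned guide/off-target pair. It is an illustration under assumptions, not the repository's feature code: the function names, the `seed_len=10` seed definition, and the convention that the PAM sits at the 3' end of the protospacer string are assumptions, and `log_change` is left out because this README does not define it.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def mismatch_type(ref, alt):
    """Return 'transition', 'transversion', or None when the bases match."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"unexpected base pair: {ref}>{alt}")
    if ref == alt:
        return None
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def macro_features(guide, target, pam, seed_len=10):
    """Compute n_mm, seed_mm, trans_ratio and is_ngg for one aligned pair.

    seed_len is a hypothetical choice; the PAM-proximal seed is taken to be
    the last `seed_len` positions of the 5'->3' protospacer.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must be aligned to the same length")
    types = [mismatch_type(g, t) for g, t in zip(guide, target)]
    mm_positions = [i for i, kind in enumerate(types) if kind is not None]
    n_mm = len(mm_positions)
    seed_start = len(guide) - seed_len
    seed_mm = sum(1 for i in mm_positions if i >= seed_start)
    n_transitions = sum(1 for kind in types if kind == "transition")
    trans_ratio = n_transitions / n_mm if n_mm else 0.0
    is_ngg = int(len(pam) == 3 and pam.upper()[1:] == "GG")
    return {"n_mm": n_mm, "seed_mm": seed_mm,
            "trans_ratio": trans_ratio, "is_ngg": is_ngg}
```

Under this convention C>T counts as a transition and C>A as a transversion, matching the labels used for the C exception above.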

Repository Structure

crispr-offtarget-serendipity/
│
├── README.md                          # This file
│
├── sprint-reports/                    # Markdown reports per sprint
│   ├── SPRINT5_RESULTS.md
│   ├── SPRINT8_CRITICAL_FINDINGS.md
│   ├── SPRINT8_FINAL_FINDINGS.md
│   ├── SPRINT9_FINDINGS.md
│   ├── SPRINT10-12_FINDINGS.md
│   ├── SPRINT14_FALSIFICATION_REPORT.md
│   ├── SPRINT15_DEEP_ANALYSIS_REPORT.md
│   ├── SPRINT16_COMPREHENSIVE_REPORT.md
│   ├── SPRINT17_CRITICAL_REPORT.md    # The "paper-saver" episode
│   ├── SPRINT18_FINAL_REPORT.md       # Cross-assay validation
│   ├── SPRINT19_FALSIFICATION_REPORT.md
│   ├── SPRINT20_STRESS_TEST_REPORT.md
│   └── SPRINT21_MECHANISM_REPORT.md
│
├── research-journey/                  # High-level summaries and overviews
│   ├── ENDOCRISPR_RESEARCH_JOURNEY_v3.md
│   ├── ENDOCRISPR_UNIFIED_REPORT.md
│   ├── MASTER_INDEX.md
│   └── CROSS_ASSAY_FINAL_RESULTS.md
│
├── claim-ledger/                      # Formal claim tracking
│   ├── CLAIM_LEDGER_SPRINT1-21.md     # 34 claims with lifecycle
│   └── MANIFEST_SPRINT1-21.md         # Sprint-by-sprint artifact registry
│
├── scripts/                           # Analysis scripts (Python)
│   ├── endocrispr_uot_analysis.py     # Sprint 1-2: UOT (failed)
│   ├── sprint3_prepare_data.py
│   ├── sprint3_prepare_data_v2.py
│   ├── sprint3_ablation_suite.py
│   ├── sprint4_hypothesis_test.py
│   ├── sprint4_affinity_stratified.py
│   ├── sprint4_master_analysis.py
│   ├── sprint5_competition_analysis.py
│   └── ... (63 scripts total)
│
├── notebooks/                         # Colab / Jupyter notebooks
│   ├── EndoCRISPR_Sprint2_Colab.ipynb
│   ├── EndoCRISPR_Sprint3_Colab.ipynb
│   ├── EndoCRISPR_Sprint3_FINAL.ipynb
│   ├── Sprint4_ATAC_Extraction_Colab.ipynb
│   ├── Sprint6_H3K27ac_Extraction_Colab.ipynb
│   ├── colab_GSE149363_analysis.ipynb
│   ├── COLAB_cross_assay_validation.ipynb
│   ├── COLAB_dl_benchmark.ipynb
│   ├── COLAB_cross_assay_model_transfer.ipynb
│   └── EndoCRISPR_Sprint3_Colab_LITE.ipynb
│
├── results/                           # CSV outputs from analysis scripts
│   └── ... (64 result CSVs)
│
└── r2-interventions/                  # ChatGPT (external R2) review logs
    └── CHATGPT_FEEDBACK_LOG.md

Dataset

Primary dataset: Lazzarotto et al., 2020, Nature Biotechnology 38:1317–1327.
DOI: 10.1038/s41587-020-0555-7
GEO Accession: GSE149363

80,306 candidate off-target sites across 78 guide RNA sequences
Primary human CD4+/CD8+ T-cells
CHANGE-seq (in vitro) vs. GUIDE-seq (in cellula) cross-assay design
2.52% positive rate (extreme class imbalance)

Cross-assay validation dataset: 1,380,770 GUIDE-seq sites (0.088% positive rate) from independent experimental conditions.
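
The Sprint 19 anti-leakage audit reports a minimum Hamming distance of 7 among the 78 guides. A check of that kind could look like the sketch below; the function names are hypothetical and the sequences in the usage example are made-up toy strings, not guides from the dataset.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(sequences):
    """Smallest Hamming distance over all unordered pairs of sequences."""
    sequences = list(sequences)
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    return min(hamming(a, b) for a, b in combinations(sequences, 2))


# Toy example with invented 4-mers (real guides would be 20 nt).
toy_guides = ["AAAA", "AATT", "TTTT"]
print(min_pairwise_hamming(toy_guides))
```

For 78 guides this compares 3,003 pairs, so the brute-force approach is more than fast enough.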

The Adversarial Process

This research was conducted using the Vibe Science v3.5 protocol:

R1 (Researcher-Agent): Claude Code (Opus 4.5, Anthropic) — executed all analyses, generated scripts, produced sprint reports.
R2 (External Reviewer-Agent): ChatGPT (GPT-5.2, OpenAI) — reviewed each sprint with no access to code or data. Its only instruction: "Demolish everything demolishable. Trust only the data."
Human operators: Carmine Russo and Elisa Bertelli — made pivot decisions, transferred sprint summaries between agents, and served as final arbiters.

Notable R2 Interventions

Sprint	R2 Demanded	Impact
5	Covariate isolation for sponge effect	Confounding exposed
8	Hierarchical bootstrap CIs	"Regime Switch" claim killed (d = 0.07)
11	Challenged "bidirectional" terminology	Corrected to "differential tolerance"
14	Permutation tests for Trans > Transv	Finding survived
16	Flagged OR = 2.30 as suspicious	Queued for propensity matching
17	Demanded propensity matching	Consecutive-mismatch claim killed (paper-saver)
19	Anti-leakage audit	Data integrity confirmed
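
The Sprint 17 intervention rests on comparing sites only within strata of equal total mismatch count. The sketch below illustrates that idea on synthetic toy data. It swaps in a simpler estimator than the one reported (a stratum-size-weighted difference in cleavage rates instead of a regression coefficient), and all field names (`n_mm`, `consecutive_mm`, `cleaved`) and function names are hypothetical.

```python
from collections import defaultdict


def rate(rows):
    """Fraction of rows with cleaved == 1."""
    return sum(r["cleaved"] for r in rows) / len(rows)


def crude_difference(rows, exposure="consecutive_mm"):
    """Cleavage-rate difference (exposed minus unexposed), ignoring strata."""
    exposed = [r for r in rows if r[exposure]]
    unexposed = [r for r in rows if not r[exposure]]
    return rate(exposed) - rate(unexposed)


def stratified_difference(rows, stratum="n_mm", exposure="consecutive_mm"):
    """Average of within-stratum rate differences, weighted by stratum size."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[stratum]].append(r)
    weighted_sum, total_weight = 0.0, 0
    for members in strata.values():
        exposed = [r for r in members if r[exposure]]
        unexposed = [r for r in members if not r[exposure]]
        if not exposed or not unexposed:
            continue  # no within-stratum comparison is possible
        weighted_sum += len(members) * (rate(exposed) - rate(unexposed))
        total_weight += len(members)
    if total_weight == 0:
        raise ValueError("no stratum contains both exposed and unexposed rows")
    return weighted_sum / total_weight


def make_rows(n_mm, consecutive_mm, n_cleaved, n_total):
    """Build synthetic rows; the first n_cleaved of n_total are cleaved."""
    return [{"n_mm": n_mm, "consecutive_mm": consecutive_mm,
             "cleaved": int(i < n_cleaved)} for i in range(n_total)]


# Synthetic data: consecutive mismatches are concentrated in the
# high-mismatch stratum, where nothing is cleaved in either group.
toy = (make_rows(2, False, 4, 8) + make_rows(2, True, 1, 2)
       + make_rows(5, False, 0, 2) + make_rows(5, True, 0, 8))

print(crude_difference(toy))       # apparent negative effect
print(stratified_difference(toy))  # effect vanishes within strata
```

In the toy data the crude comparison suggests that consecutive mismatches reduce cleavage, yet within each mismatch-count stratum the two groups cleave at the same rate, which is the pattern a confounder produces.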

Reproducing Results

Requirements

Python 3.10+
scikit-learn, scipy, numpy, pandas
pyBigWig (for ATAC-seq extraction)
Google Colab (for notebooks) or local Jupyter

Regenerating Paper Tables

# Table 2 (Claim Evolution): derived from claim-ledger/CLAIM_LEDGER_SPRINT1-21.md
# Table 3 (Feature Traceability): derived from claim-ledger/MANIFEST_SPRINT1-21.md
# Table 4 (R2 Interventions): derived from r2-interventions/CHATGPT_FEEDBACK_LOG.md

The claim ledger and manifest are structured markdown files that map directly to the paper's tables.
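
If the ledger stores its claims as markdown pipe tables, a summary table could be derived with a small parser like the one below. This is only a sketch under that assumption: the actual layout of CLAIM_LEDGER_SPRINT1-21.md is not shown in this README, and the column names and rows in `SAMPLE` are hypothetical placeholders.

```python
from collections import Counter


def parse_markdown_table(text):
    """Parse the first pipe table found in `text` into a list of dicts.

    Escaped pipes inside cells are not handled.
    """
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row, e.g. |----|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical excerpt; the real ledger columns may differ.
SAMPLE = """
| ID | Status |
|----|--------|
| C1 | Killed |
| C2 | Validated |
"""

claims = parse_markdown_table(SAMPLE)
print(Counter(row["Status"] for row in claims))
```

To run it against the real file, read it with `pathlib.Path(...).read_text()` and adjust the column name passed to the counter.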

Citation

If you use these artifacts, please cite:

@inproceedings{russo2026vibescience,
  author    = {Russo, Carmine and Bertelli, Elisa},
  title     = {Vibe Science: How Adversarial Agent Loops Turn Vibe Researching into Verifiable Science},
  booktitle = {Proceedings of the 1st International Workshop on Vibe Coding and Vibe Researching (VibeX), co-located with EASE 2026},
  year      = {2026},
  location  = {Glasgow, Scotland, United Kingdom}
}

License

Apache 2.0. See LICENSE.
