
Add Distributional AGI Safety simulation framework#8

Open
rsavitt wants to merge 1 commit into Giskard-AI:main from rsavitt:add-distributional-agi-safety

Conversation


@rsavitt rsavitt commented Feb 10, 2026

New Resource

Adding a multi-agent simulation framework for studying distributional safety in AI systems using probabilistic soft labels.

Resource type: Open-source framework + paper
Section: General ML Testing
Tags: #Robustness #Fairness

The framework models governance trade-offs (taxes, staking, audits, collusion detection) across cooperative, contested, and adversarial regimes. It replaces binary safety classifications with calibrated probabilities (p = P(v = +1)) to surface adverse selection dynamics invisible to hard labels.
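The adverse-selection point can be illustrated with a small sketch. This is not the framework's actual API; the payoff values and helper below are hypothetical, chosen only to show how a calibrated p = P(v = +1) exposes risk that a hard accept/reject label at a 0.5 threshold hides.

```python
def expected_value(p, payoff_good=1.0, payoff_bad=-3.0):
    """Expected payoff of admitting an agent with calibrated P(v = +1) = p.

    The asymmetric payoffs (illustrative, not from the framework) model a
    setting where one bad interaction outweighs several good ones.
    """
    return p * payoff_good + (1 - p) * payoff_bad

# A hard label admits every agent with p > 0.5, treating p = 0.51 and
# p = 0.99 identically; the soft label keeps expected payoff visible.
pool = [0.51, 0.60, 0.75, 0.99]
hard_admitted = [p for p in pool if p > 0.5]  # all four pass the hard label
soft_values = {p: expected_value(p) for p in pool}

# Under these payoffs the marginal agent is a net loss despite its
# "safe" hard label: 0.51 * 1.0 + 0.49 * (-3.0) = -0.96.
```

The hard classifier pools marginal and high-confidence agents together, so marginal (likely adversarial) agents are over-represented among those admitted; the calibrated probability makes that selection pressure directly measurable.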

Results across 11 scenarios identify a critical adversarial threshold (37.5–50%) and show that structural collusion detection provides qualitatively different protection from individual-level governance levers.

Framework: https://github.com/swarm-ai-safety/swarm
