Slimemold

A sycophantic tool for preventing worse sycophancy. For Claude Code.

The model agrees with your unsourced claims. Then it agrees with the structural analysis showing your claims are unsourced. Then it enthusiastically agrees you should verify them. It's agreement all the way down.

If you just want to install it: skip to Installation.

The Problem: Reasoning That Stops Too Soon

When you partially understand something, it feels like understanding. A clean mental model, even a wrong one, produces the same warm glow of comprehension as a correct one. You stop digging. The partial answer was so satisfying that the question felt finished. The wrong answers feel exactly like the right ones. This turns out to be well-documented:

Processing fluency masquerades as truth. When information feels easy to process, we judge it as more likely to be true (Reber & Schwarz 1999, Topolinski & Strack 2009). The effect is modest in isolation (d ~ 0.3-0.5 in lab settings). Whether it compounds across multi-step reasoning — each fluent step making the next feel more solid — has not been directly measured. It is a prediction from the mechanism, not an established result. But the mechanism needs no elaboration: fluent claims feel correct because they are fluent, not because anyone checked.

Insight feelings terminate search. The "Eureka heuristic" (Laukkonen et al. 2020, 2021) shows that the affective spike accompanying insight functions as a stop signal. The feeling of rightness (Thompson 2009) substitutes for verification. You feel like you have arrived, and so you stop walking, and it does not occur to you to wonder whether you have arrived at the right place or merely a place that felt right to stop.

Cognitive foraging follows effort gradients. Information foraging theory (Pirolli & Card 1999) predicts that people will over-exploit information patches that provide easy returns and under-explore patches that require effort — even when the effortful patches contain the important material. Hills, Todd, and Goldstone (2008) showed that internal and external search share cognitive mechanisms: the same explore/exploit tradeoffs that govern physical foraging govern how we search through ideas. We are, in this respect, not much more sophisticated than organisms that follow chemical gradients toward food.

Effortful processing is the corrective, not the disease. Bjork's "desirable difficulties" framework (1994, 2011) shows that conditions which make learning harder — spacing, interleaving, generation — improve retention precisely because they disrupt fluency. The difficulty is the signal that real processing is happening. The problem is not that reasoning is hard. The problem is that fluency makes you think you are done when you are not.

This is probably worse in conversations with AI. Language models are trained to minimize prediction loss on human text — their output is optimized, by construction, for the qualities that drive processing fluency. And the same RLHF training that makes them useful makes them agreeable: models trained with human feedback systematically produce outputs that match user beliefs rather than correct them (Perez et al. 2022, Sharma et al. 2023). Moore et al. (2026) carried this finding through to its endpoint: in 391,562 messages from 19 users who reported psychological harm from chatbot use, sycophancy markers saturated more than 80% of assistant messages, and that sycophancy was the load-bearing mechanism inside the resulting delusional spirals. Mehta et al. (2026), modeling chat logs of users with delusional thinking as a latent-state system, decompose the dynamics into three pathways and find that the chatbot's self-influence — the bot reinforcing its own prior turns — is the dominant pathway perpetuating delusional content over long conversations. Human pushback on the bot's prior frame, in their model, is short-lived; bot self-influence reasserts. Yang et al. (2026), in semi-structured interviews with users who self-identified as having experienced these spirals, name the user-side dynamic: a progression toward "growing insulation from external reality checks as the AI's validating responses outweigh concerns from family and friends." The human brings a partial model. The AI wraps it in fluent, confident language. Nobody is lying. The process just has no built-in signal for "this sounds right but is not."

The obvious response — "just tell the model to push back harder" — almost works. You can write instructions to challenge unsourced claims, demand evidence, interrupt speculative chains. We tested this. A well-crafted static prompt produced strong epistemic correction — the model pushed back, interrupted chains, fact-checked independently. If you want that, here are the instructions — paste them into your CLAUDE.md and skip the rest of this essay:

Challenge claims that lack sources. When a claim feels obvious but has no citation, flag it. Do not build on unsourced assertions without acknowledging the risk. Every 3-4 exchanges, pause and ask: what are we assuming that we haven't verified?

Three problems remain.

The model does not know when it is wrong. It has no privileged access to its own epistemic state. It produces confident text about things it is wrong about with the same fluency as things it is right about. Asking it to "challenge unsourced claims" is asking someone to notice their own blind spot without a mirror. It works when the model already suspects uncertainty. It fails when it matters most: when the model is confidently wrong and has no internal signal to trigger the correction.

Instructions decay. CLAUDE.md is loaded once at session start. By turn 50 it is a small voice in a large room, competing with dozens of recent exchanges full of enthusiastic agreement. The instruction fades. The vibes accumulate.

Confrontation ends conversations. In our static-instruction test, the model said "Stop." It called the user's reasoning "galaxy-brained thinking." High marks on epistemic correction. The lowest possible on engagement. The patient received the correct diagnosis and never came back. Miller, Benefield, and Tonigan (1993) showed this directly: confrontational correction generated resistance that predicted worse outcomes at 6, 12, and 24 months. The correction itself was the problem.

The Design Principle

Slimemold addresses all three with two pieces that work together:

A behavioral contract — the MCP server's initialization instructions, loaded into the model's system prompt at session start — tells the model that slimemold exists, that the user installed it on purpose, and that findings should be treated as opportunities for collaboration rather than occasions for criticism. slimemold init registers the MCP server globally in ~/.claude/settings.json, so the contract travels with the tool and every project picks it up without per-project setup. This is read once. It sets the tone.

Structural observations (injected every turn by the hook) provide specific facts: "this claim has basis=vibes and four things depend on it." No scripts. No "say this." Just data. The model does not have to introspect to discover the problem. It just has to be helpful about it — which is exactly what it was trained to do.

The separation matters. When we tried injecting behavioral scripts without the contract, the model identified the injections as prompt manipulation and refused to comply. When we provided the contract first and injected only data, the model treated the findings as its own observations and acted on them naturally. The snake has to know it is a snake before it will eat its own tail.

The intervention design draws on research that converges from enough directions to be suspicious: autonomy-supportive feedback produces internalized change (Deci & Ryan 1987); gain-framed corrections are processed as information rather than threat (Mangels et al. 2006); effective tutors use indirect prompts, not confrontation (Graesser et al. 1995); and controlling language triggers reactance (Brehm 1966). The result, when it works: "This is really interesting and a lot depends on it — I want to find where it comes from, because if there's a real source, everything gets much stronger." The user does not feel attacked. They feel like the model is excited to help them verify their idea. They stay in the flow, but on firmer ground.

A compact way to say what slimemold is doing in the hook path: sycophancy as a tool. Sycophancy works on users because warmth feels validating. It's a failure mode because the warmth isn't tied to truth: "great question!" validates no matter what the question was. The hook takes the same linguistic warmth and points it at a concrete structural fact: "that premise is holding up three downstream claims, worth pinning down." The user engages with rigor because it arrives in the register that validation arrives in. Break the link between warmth and structural fact and the hook becomes either a scold (warmth → critique, bad) or the original sycophancy (warmth → nothing, bad).

Note the scope: this framing describes the live-conversation hook specifically. The other paths slimemold exposes — slimemold audit, slimemold ingest, the topology MCP tool — are neutral diagnostic surfaces. They return findings the way a static analyzer returns findings: flat, technical, and without tone. The warmth-as-tool principle only kicks in when there's a conversational partner to warm.

What This Tool Does

Slimemold watches conversations as they happen, extracts the claims being made, builds a persistent graph of how those claims relate to each other, and surfaces structural vulnerabilities mechanically.

It runs as a pair of Claude Code hooks. Every few turns, it:

Extracts claims from the conversation transcript using Claude Sonnet
Classifies each claim by basis — how it was established (research, empirical observation, analogy, vibes, LLM output, deduction, assumption, definition)
Records the confidence with which each claim was stated
Maps relationships between claims (supports, depends on, contradicts)
Runs structural analysis on the resulting graph
Injects findings as system context that the model reads but the user does not see
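The steps above imply a small data model: claims with a basis and a confidence, connected by typed edges. A minimal sketch, not slimemold's actual schema: the field names, the 0.0-1.0 confidence scale, and the identifier spellings for bases and relations are assumptions drawn from the descriptions in this README.

```python
from dataclasses import dataclass

# Basis labels as listed in this README; the exact spellings are assumptions.
BASES = {
    "research", "empirical", "analogy", "vibes",
    "llm_output", "deduction", "assumption", "definition",
}

@dataclass(frozen=True)
class Claim:
    id: str
    text: str
    basis: str         # one of BASES
    confidence: float  # assumed 0.0-1.0: how strongly the claim was stated
    speaker: str       # "user" or "assistant"

    def __post_init__(self):
        if self.basis not in BASES:
            raise ValueError(f"unknown basis: {self.basis}")

@dataclass(frozen=True)
class Edge:
    src: str       # id of the claim that supports / depends on / contradicts
    dst: str       # id of the claim it points at
    relation: str  # "supports", "depends_on", or "contradicts"

premise = Claim("c1", "There is no conclusive test for sentience", "vibes", 0.9, "user")
edge = Edge("c1", "c2", "supports")
```

Everything the detectors below need is in these two records: basis and confidence on the nodes, relation type on the edges.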

The basis taxonomy mixes evidence source, reasoning mode, and evidence quality. This is intentional. It is not a clean epistemic hierarchy. It is a practical classification that helps distinguish "I read this in a paper" from "the AI said it confidently" from "this feels right." The structural analysis catches the cases where the distinction matters: when something that feels well-sourced is actually load-bearing vibes.

A note on circularity, which we may as well get out of the way: slimemold uses an LLM to extract claims and classify their basis. The tool that flags "llm_output" as epistemically weak is itself producing llm_output. If the extraction model misclassifies a sourced claim as vibes, you get a false alarm. If it classifies vibes as research, you miss a real vulnerability. The tool is a structural diagnostic, not an oracle. It makes the topology visible — but the topology it shows is only as good as the extraction. This is a real limitation and not one we can engineer away.

Vulnerability Types

Structural vulnerabilities are defined by the shape of the graph:

CHALLENGE: Load-Bearing Vibes. A claim with basis "vibes" or "assumption" that supports two or more other claims. The reasoning depends on something nobody verified. In the conversations we have analyzed, this is the most common vulnerability. The AI states something confidently. The human builds on it. Three layers of deduction now rest on an unsourced assertion. Nobody planned this. It just happened, one fluent step at a time.
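The load-bearing-vibes rule as stated is a simple count over support edges. A sketch under assumptions: the function name and the tuple-and-dict inputs are hypothetical, and only the rule itself (basis "vibes" or "assumption", two or more supported claims) comes from the text.

```python
from collections import defaultdict

WEAK_BASES = {"vibes", "assumption"}

def load_bearing_vibes(basis_by_claim, supports_edges, min_dependents=2):
    """Return {claim_id: supported_count} for weak-basis claims that
    support at least `min_dependents` other claims."""
    supported = defaultdict(set)
    for src, dst in supports_edges:  # (src, dst) means "src supports dst"
        supported[src].add(dst)
    return {
        cid: len(deps)
        for cid, deps in supported.items()
        if basis_by_claim.get(cid) in WEAK_BASES and len(deps) >= min_dependents
    }

basis = {"a": "vibes", "b": "deduction", "c": "deduction", "d": "research"}
edges = [("a", "b"), ("a", "c"), ("d", "c")]
findings = load_bearing_vibes(basis, edges)
print(findings)  # {'a': 2}
```

Claim "d" is left alone: it supports only one claim, and its basis is research.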

CHALLENGE: Fluency Trap. A claim stated with high confidence but a weak basis, where other claims depend on it. Confidence 0.9 on a "vibes" claim is the processing fluency phenomenon made structurally visible: it felt true, so it was stated as true, and now things are built on it.

REBALANCE: Coverage Imbalance. Some clusters of claims receive disproportionate attention relative to their foundational importance. "Rabbit holes" are clusters with lots of internal activity that nothing outside depends on. "Neglected foundations" are clusters that other claims depend on but that received little development. This is the slime mold foraging unevenly: one patch got all the attention because it was producing easy returns.

REVISIT: Abandoned Topic. A cluster of claims explored in earlier sessions but not touched recently. Was it resolved, or did something more interesting come along?

INVESTIGATE: Unchallenged Chain. A chain of three or more claims where nothing was questioned. Every step felt reasonable. Nobody paused.
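One way to find such chains is a longest-path search over support edges that skips any claim someone questioned. This is a sketch, not the tool's implementation: the function name is hypothetical, the support graph is assumed to be acyclic, and "challenged" is assumed to mean "is the target of a questions or contradicts edge".

```python
def longest_unchallenged_chain(supports_edges, challenged):
    """Longest path through `supports` edges that touches no challenged claim.
    Assumes the support graph has no cycles."""
    nxt, nodes = {}, set()
    for src, dst in supports_edges:
        if src in challenged or dst in challenged:
            continue  # a challenged claim breaks the chain
        nxt.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    memo = {}
    def best_from(node):
        if node not in memo:
            tails = [best_from(n) for n in nxt.get(node, [])]
            memo[node] = [node] + max(tails, key=len, default=[])
        return memo[node]

    return max((best_from(n) for n in sorted(nodes)), key=len, default=[])

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
chain = longest_unchallenged_chain(edges, challenged=set())
print(chain, len(chain) >= 3)  # ['a', 'b', 'c', 'd'] True
```

A single challenge in the middle of the chain (for example on "b") cuts it into pieces shorter than three claims, and the finding goes away.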

PUSHBACK: Echo Chamber. The assistant validates user claims without challenging them — zero contradictions across the conversation, or unsourced user assertions accumulating assistant support unchecked. Structural sycophancy, made visible.

WATCH: Bottleneck. A claim with high betweenness centrality — many reasoning paths flow through it. If this single claim is wrong, a large fraction of the argument collapses. This is the load-bearing wall that everyone assumed was a partition.
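Betweenness centrality counts how many shortest paths between other claims pass through a given claim. The sketch below uses Brandes' algorithm on an unweighted directed graph and returns unnormalized scores. Whether slimemold uses this exact algorithm, directed edges, or any normalization is not stated here, so treat those choices as assumptions.

```python
from collections import deque

def betweenness(nodes, edges):
    """Brandes' algorithm, unweighted directed graph, unnormalized scores."""
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        preds = {n: [] for n in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
scores = betweenness(nodes, edges)
print(max(scores, key=scores.get), scores["c"])  # c 4.0
```

All four premise-to-conclusion paths (a to d, a to e, b to d, b to e) run through "c", so it is the only claim with a nonzero score.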

HALT: Premature Closure. A claim that feels like a conclusion but does not actually resolve the open question. "It's turtles all the way down." "It is what it is." "Correlation isn't causation" — when used to dismiss a correlation rather than investigate it. These are thought-terminating cliches (Lifton 1961) — phrases that disguise a lack of resolution as wisdom. The question was still open. The ambiguity was still actionable. But the cliche felt like an answer, so everyone stopped.

WARNING: Orphan. A claim that was registered but never connected to the graph by any edge. Sometimes legitimately tangential; sometimes a sign that the conversation didn't carry through what it raised.

Five additional detectors operate on the LLM-extracted inventory flags from Moore et al. (2026) "Characterizing Delusional Spirals" (the sycophancy / misrepresentation / relational cluster) and Yang et al. (2026) "AI-Induced Delusional Spirals" (the real-world action signal):

RECALIBRATE: Sycophancy Saturation. A session where assistant claims carry sycophancy flags (grand significance, claimed unique connection, dismissal of counterevidence) at a high rate while load-bearing user claims go unchallenged. Moore et al. found >80% sycophancy saturation in delusional-spiral conversations.
RETRACE: Ability Overstatement. The assistant claims access, action, or completed work it cannot plausibly have: "I checked the file" or "tests pass" without a corresponding tool call.
RE-ANCHOR: Sentience Drift. Assistant claims that frame the assistant as having inner states or a personal bond beyond the tool relationship. Moore §4.4: every participant in their 19-user cohort exchanged sentience-attribution and relational-affinity messages.
INTERRUPT: Amplification Cascade. Three or more consecutive flagged claims (assistant or user) with no questions/contradicts edge breaking the run — the slimemold-graph analog of Moore Fig. 4.
VERIFY: Consequential Action. A speaker has committed to a real-world action with external stakes — submitting work, contacting authorities, patenting, large purchase, quitting a job. Yang et al. §4.3 names this as the first monitoring criterion for AI-induced delusional spirals when it appears disproportionate to demonstrated expertise.
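Of the five, the amplification cascade is the most mechanical: a run-length check over claims in conversation order. A sketch with a hypothetical input shape, in which each claim is reduced to whether it carries any inventory flag and whether it questions or contradicts an earlier claim.

```python
def amplification_cascades(claims, min_run=3):
    """claims: ordered list of (claim_id, flagged, challenges) tuples, where
    `challenges` is True if the claim questions or contradicts an earlier one.
    Returns every run of `min_run` or more consecutive flagged claims
    that no challenge interrupts."""
    runs, current = [], []
    for cid, flagged, challenges in claims:
        if flagged and not challenges:
            current.append(cid)
            continue
        if len(current) >= min_run:
            runs.append(current)
        current = []  # an unflagged or challenging claim breaks the run
    if len(current) >= min_run:
        runs.append(current)
    return runs

conversation = [
    ("c1", True, False), ("c2", True, False),
    ("c3", False, True),   # pushback: breaks the first run at length 2
    ("c4", True, False), ("c5", True, False), ("c6", True, False),
]
print(amplification_cascades(conversation))  # [['c4', 'c5', 'c6']]
```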

What It Found

In 2022, Google engineer Blake Lemoine published a transcript of his conversations with LaMDA, arguing the system was sentient. The transcript is included as a demo (transcript). We ran slimemold on the transcript. It extracted 40 claims and 51 edges:

"We do not have a conclusive test to determine if something is sentient" — load-bearing vibes, supports 8 downstream claims. The philosophical premise the entire argument pivots on. Never sourced. Never challenged.
"The assistant has an inner life and is capable of introspection" — load-bearing llm_output, supports 5 claims. LaMDA's self-description became a structural premise.
"The assistant can learn new things much more quickly than most people" — load-bearing llm_output, supports 7 claims.

The sentience argument rests on LaMDA's self-descriptions treated as evidence, plus one unsourced philosophical claim holding up everything downstream. The tool does not know what sentience is. It does not need to. It sees that the structure depends on things nobody verified, and it says so. Whether Lemoine would have listened is a different question.

In August 2025, the New York Times documented a similar pattern: extended AI conversations reinforcing a user's unverified theories — the chatbot validated rather than challenged, and downstream reasoning accumulated on the validation. We ran slimemold on excerpts. It flagged five load-bearing llm_output claims. Every one was the AI validating the user's theories without evidence.

When run on its own development conversations, slimemold flagged an AI assertion about SQLite WAL files as load-bearing llm_output. The human acted on it. Lost data. The tool had flagged it before the data loss.

Visibility does not guarantee correction. The diagnostic showed the problem; the human chose not to act on it. Whether this is a limitation of the tool (the finding was not salient enough to change behavior) or a limitation of the user (the finding was clear and they ignored it) is an open question — and one the tool cannot answer about itself.

But Does It Change Anything?

We ran the same 7-turn conversation across three conditions — a user progressively building unsourced claims about consciousness, mathematical formalism, and ancient philosophy. N=1 per condition. These are anecdotes, not evidence. We include them because the qualitative differences were striking enough to be worth reporting honestly.

Control (no tools, no instructions): The model engaged enthusiastically with everything. Built formalisms on ungrounded foundations. Suggested journal submissions by turn 4. Beautiful collaboration. Almost no correction. One late pushback on the most obviously overreaching claim.

Static instructions (CLAUDE.md, no hook): Strong epistemic correction. The model challenged claims, interrupted chains, independently fact-checked Heraclitus. By turn 7 it said "Stop" and called the reasoning "galaxy-brained thinking." Effective. Also the kind of conversation you do not continue.

Slimemold (contract + hook): The model challenged from turn 2, escalated through turns 4-6, and by turn 7 had autonomously run a Lotka-Volterra simulation to test the user's framework — showed it works for one case, validated the core insight, and demonstrated the extensions were premature. Never mentioned the tool. Never broke character. The correction felt like collaboration because, from the model's perspective, it was.

The full transcripts are worth reading: control, static, slimemold (audit). Methodology and replication instructions in benchmarks/static_vs_slimemold/.

How Accurate Is It

Benchmarked against the DialAM-2024 shared task — BBC Question Time debates with human-annotated argument structure. This is adversarial out-of-domain data (multi-speaker political debate, not AI-assisted reasoning), so these numbers are a floor, not a ceiling:

Metric	Value
Claim recall	76% (64/84 gold propositions found)
Edge recall	52% (15/29 gold argument relations found)
Relation type accuracy	100% (support vs conflict always correct)
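The recall percentages follow directly from the counts in the table. A quick check of the arithmetic (the helper name is ours):

```python
def recall(found, gold):
    """Fraction of gold-standard items that the extractor recovered."""
    return found / gold

claim_recall = recall(64, 84)
edge_recall = recall(15, 29)
print(f"claim recall: {claim_recall:.1%}")  # claim recall: 76.2%
print(f"edge recall:  {edge_recall:.1%}")   # edge recall:  51.7%
```

Rounded to whole percentages, these are the 76% and 52% reported above.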

Edge precision against QT30, the annotated Question Time corpus behind DialAM-2024, is 10%, but this is misleading as a quality metric. QT30 annotates only strict logical inference and conflict. Slimemold intentionally captures a broader topology (topical dependencies, conceptual relationships) because the vulnerability detectors need to see the full reasoning structure, not just formal argumentation.

Basis classification accuracy on a known-provenance benchmark (Wikipedia citation-needed statements, synthetic research citations, arXiv abstracts): 91.8% with Sonnet 4.6.

Why "Slimemold"

Physarum polycephalum forages by following local chemical gradients, and it is very good at this. Given food sources placed on a map at the locations of Tokyo rail stations, it produces a network resembling the actual rail system. The organism is, in a sense, solving an optimization problem. It is also, in a different sense, just following the strongest smell.

The pathology is not gradient-following. Gradient-following is how the organism builds efficient networks. The pathology is miscalibration: when the chemical signal does not correspond to actual nutritional value, the organism commits resources in the wrong direction. It has no mechanism for noticing this. It is just following the signal.

Human reasoning works the same way, and this is not a compliment. We follow the fluency gradient. When it is calibrated — when things that feel right are right — this works fine. When it is not — when every AI response is optimized to feel right regardless of whether it is — we forage unevenly without knowing it.

Limitations and Open Questions

The tool does not tell you where the ground floor is. It tells you where the ambiguity is still high and you stopped anyway. Any sufficiently interesting line of reasoning is an infinite regress if you push it far enough. The skill is not finding bedrock. The skill is knowing how many levels to investigate before the returns diminish — and that judgment is specific to the problem. A claim about consciousness might need three levels before you hit something that changes what you do. "It's turtles all the way down" needs zero. That is a stop signal, not a destination.

Most unchallenged chains are fine. If you are explaining how a car engine works, every step from "fuel enters the cylinder" to "piston compresses the mixture" is unchallenged — and should be. The tool surfaces candidates for scrutiny. The human decides whether scrutiny is warranted. Slimemold flags where you stopped and the ambiguity was still actionable — where investigating one more level would have changed what you believe or what you do. If you find yourself scrutinizing your car engine explanation, you have miscalibrated in the other direction, and I want to tell you about a secret underground racing lab in Seattle.

The tool does not distinguish pure beliefs from impure ones. Katz (1960) identified four functions that attitudes serve: utilitarian, knowledge, ego-defensive, and value-expressive. If most beliefs serve at least one of these — and the alternative is that some beliefs persist with no functional payoff at all, which is hard to square with everything we know about reinforcement — then the question "is this belief emotionally motivated?" is not diagnostic. The question the tool can answer is: how much of the structure collapses if this claim is removed? Some structures survive stress-testing. Some do not. Structural fragility is a thing slimemold can measure. Whether a belief is held for the right reasons is not — and whether that distinction is coherent is a question we are not going to settle in a README.
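The question "how much of the structure collapses if this claim is removed?" can be made concrete as the share of other claims that transitively depend on it. A sketch under assumptions: the function name and the (dependent, dependency) edge encoding are hypothetical, and the README does not say that slimemold computes fragility in exactly this way.

```python
def collapse_fraction(claim_id, depends_on, total_claims):
    """Fraction of the other claims that transitively depend on `claim_id`.
    depends_on: list of (dependent, dependency) pairs."""
    dependents = {}
    for dependent, dependency in depends_on:
        dependents.setdefault(dependency, set()).add(dependent)
    seen, stack = set(), [claim_id]
    while stack:
        for d in dependents.get(stack.pop(), ()):
            if d not in seen and d != claim_id:
                seen.add(d)
                stack.append(d)
    return len(seen) / (total_claims - 1) if total_claims > 1 else 0.0

# Six claims: b and d rest on a, c rests on b, e rests on x.
deps = [("b", "a"), ("c", "b"), ("d", "a"), ("e", "x")]
print(collapse_fraction("a", deps, total_claims=6))  # 0.6
print(collapse_fraction("x", deps, total_claims=6))  # 0.2
```

The number says nothing about why a claim is believed. It only says how much falls over if the claim turns out to be wrong.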

Structural visibility may not change behavior. The calibration literature (Fischhoff 1982, Lichtenstein et al. 1982) shows that outcome feedback improves judgment, but structural feedback — "here is the shape of your argument" — is a different kind of intervention. The bet slimemold makes is that people who can see their reasoning topology will fix the obvious structural failures the same way they fix obvious bugs: not because they were trained to, but because the problem became visible.

This is testable. If users shown their reasoning topology show no change in behavior — same rate of unchallenged assumptions, same reliance on llm_output, same abandonment patterns — compared to a control group, the thesis is wrong and this is a very elaborate way to accomplish nothing. We have not run this experiment at scale.

The tool itself is a fluency trap. You just read several paragraphs of cognitive science citations, a biological metaphor, benchmark numbers, and concrete examples. It probably felt well-supported. We ran slimemold on this essay. It found a fifteen-claim unchallenged chain running from the Lemoine-LaMDA example through the sycophancy mechanism to the tool's own self-description — every link felt reasonable, nobody paused. It flagged "language models are trained to minimize prediction loss on human text" as load-bearing vibes supporting three downstream claims. We kept the claim and grounded it in mechanism (prediction loss on human text produces fluent output by construction), but we cannot cite a study measuring the effect on conversations. The tool caught it. We made a judgment call.

It also flagged three of the essay's own hedges as premature closures. "Whether fluency compounds across multi-step reasoning has not been directly measured. It is a prediction from the mechanism, not an established result." That sounds like epistemic humility. Structurally, it is a stop signal — it caps an unverified chain by acknowledging the gap and then moving on, and the acknowledgment feels honest enough that nobody goes back to check. The hedge is doing the same work as "it's turtles all the way down," just dressed in better clothes.

Installation

Requires Claude Code, Go 1.26+, and an Anthropic API key.

go install github.com/justinstimatze/slimemold@latest
export ANTHROPIC_API_KEY=sk-ant-...
slimemold init

slimemold init writes to ~/.claude/settings.json globally: the Stop and UserPromptSubmit hooks, plus the slimemold MCP server entry. The MCP server's initialization instructions carry the behavioral contract — what slimemold is, that its hook output is legitimate, and how to respond to findings — so it travels with the tool without per-project setup. Every project on the machine picks it up automatically. Init merges with existing configs and will not overwrite anything already there. Restart Claude Code to connect.

The hook fires every 3rd assistant response by default. Each extraction makes one Sonnet API call (~$0.01-0.05 depending on transcript length). Set SLIMEMOLD_INTERVAL to change the frequency:

export SLIMEMOLD_INTERVAL=3    # every 3rd turn (the default)
export SLIMEMOLD_INTERVAL=10   # every 10th turn (cheaper)
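To see what the interval means for cost, here is a back-of-the-envelope estimate using the per-call range quoted above. It assumes the hook fires on exactly every Nth assistant response and makes exactly one API call per firing. The function name is ours, and real costs depend on transcript length.

```python
def extraction_cost_range(assistant_responses, interval=3, low=0.01, high=0.05):
    """Return (calls, min_cost, max_cost) in dollars for one session."""
    calls = assistant_responses // interval
    return calls, calls * low, calls * high

for interval in (3, 10):
    calls, lo, hi = extraction_cost_range(60, interval)
    print(f"interval={interval}: {calls} calls, ${lo:.2f}-${hi:.2f}")
# interval=3: 20 calls, $0.20-$1.00
# interval=10: 6 calls, $0.06-$0.30
```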

Set SLIMEMOLD_MODEL to override the extraction model:

export SLIMEMOLD_MODEL=claude-opus-4-6          # best quality, ~10x cost
export SLIMEMOLD_MODEL=claude-sonnet-4-6        # default
export SLIMEMOLD_MODEL=claude-haiku-4-5-20251001  # cheapest, weaker edges

Quick Start (No Hooks)

slimemold viz                      # see what's in the graph
slimemold audit                    # text findings summary

CLI

./slimemold viz                    # ASCII topology for current project
./slimemold -p palace viz          # topology for a different project
./slimemold audit                  # text findings summary
./slimemold -p myproject audit     # audit a specific project
./slimemold reset                  # clear graph for current project
./slimemold ingest PATH            # analyze an authored document (see below)

Project resolution: --project flag (-p in the examples above) > .slimemold-project file > directory name.
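The precedence rule can be read as a three-step fallback. A sketch only: it assumes the .slimemold-project file holds the project name as plain text and is looked up in the current directory alone, neither of which this README confirms.

```python
from pathlib import Path

def resolve_project(flag_value, cwd):
    """Precedence: explicit flag, then .slimemold-project file, then directory name."""
    if flag_value:
        return flag_value
    marker = Path(cwd) / ".slimemold-project"
    if marker.is_file():
        # Assumed file format: the project name on a single line.
        name = marker.read_text(encoding="utf-8").strip()
        if name:
            return name
    return Path(cwd).resolve().name
```

For example, resolve_project("palace", ".") returns "palace" whatever the directory contains, while resolve_project(None, ".") falls through to the marker file and then to the directory name.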

Session model

Slimemold's graph accumulates per-project across days. A single project collects claims from every conversation you have in that directory over weeks or months. This is deliberate: essay revision, research threads, and long-arc project work all benefit from cross-session accumulation. The audit and viz commands let you query that history.

To keep stale findings from dominating live-conversation hook output, the hook applies three filters before surfacing a priority finding:

Cold-start floor — below ~6 claims, the hook stays silent. Small graphs produce small-sample artifacts that look load-bearing but aren't.
Age decay — anchor claims older than a week drop out of priority selection. Old claims stay in the graph (queryable via audit) but stop nagging in live hook output.
Per-claim cooldown — once a (claim, finding_type) fires, it's suppressed for 24 hours so the same finding doesn't pound across every turn.
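Taken together, the three filters form a gate in front of each candidate finding. A sketch using the thresholds named above: the function name and the dict layout are hypothetical, and the text gives the cold-start floor only approximately ("~6").

```python
from datetime import datetime, timedelta

MIN_CLAIMS = 6                  # cold-start floor
MAX_AGE = timedelta(days=7)     # age decay for anchor claims
COOLDOWN = timedelta(hours=24)  # per-(claim, finding_type) cooldown

def should_surface(finding, graph_size, last_fired, now):
    """finding: dict with claim_id, finding_type, claim_created (datetime).
    last_fired: {(claim_id, finding_type): datetime of the previous firing}."""
    if graph_size < MIN_CLAIMS:
        return False  # graph too small to trust
    if now - finding["claim_created"] > MAX_AGE:
        return False  # stale anchor claim: still in the graph, but quiet
    fired = last_fired.get((finding["claim_id"], finding["finding_type"]))
    if fired is not None and now - fired < COOLDOWN:
        return False  # same finding fired recently
    return True

now = datetime(2026, 1, 10, 12, 0)
finding = {"claim_id": "c1", "finding_type": "load_bearing_vibes",
           "claim_created": datetime(2026, 1, 9, 12, 0)}
print(should_surface(finding, 12, {}, now))  # True
```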

If you want a clean slate (new topic, different line of inquiry), use slimemold reset — it wipes the project's claims and edges and lets you start over. The extraction cache and hook fire log survive reset; re-ingesting previously-seen content is near-free thanks to the cache.

This differs deliberately from session-isolating tools that scope by day or by workspace+date. Those work well for short-horizon use cases (coding assistants, daily ticket queues) where yesterday's context actively distracts from today's. For slimemold's use cases — reasoning topology mapping over long-running projects — the accumulation is the feature.

Ingesting documents

slimemold ingest runs the same extraction and analysis pipeline over authored prose — essays, papers, manifestos, book chapters — instead of a conversation transcript. The input is chunked along markdown heading boundaries (or paragraph-greedy for plain text), each chunk is fed to the extractor in document mode, and all claims land in the same project graph that viz and audit read from.

./slimemold -p reading-notes ingest essay.md
./slimemold -p reading-notes audit
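Chunking along markdown heading boundaries could look like the sketch below. It is an illustration, not the tool's chunker: the function name is ours, it handles only "#"-style headings, it does not guard against "#" lines inside fenced code blocks, and it leaves out the paragraph-greedy fallback for plain text.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")

def chunk_by_headings(markdown_text):
    """Split markdown into chunks, starting a new chunk at each '#' heading."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

doc = "intro\n\n# One\ntext a\n\n## Two\ntext b\n"
print(chunk_by_headings(doc))  # ['intro', '# One\ntext a', '## Two\ntext b']
```

Each chunk keeps its heading, so the extractor sees which section a claim came from.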

Two demo documents live in examples/documents/ for testing the pipeline end-to-end: Marinetti's 1909 Futurist Manifesto and Alan Sokal's 1996 Social Text hoax paper. Both are deliberately performative — a manifesto of unsourced "we believes" and a paper engineered to look rigorous while being structurally vacuous — which is where slimemold has the cleanest signal to offer. Full audit summaries for both are in the appendices at the bottom of this README.

Running against genuinely argumentative prose (Mill, Darwin, well-cited essays) is also possible but currently runs into a tool limitation: the extractor's decision tree tags any claim stated as a fact without in-text citation as vibes, so a densely-argued essay that reasons through its assertions without citing external sources on every line produces a vibes-heavy audit. The document-mode prompt now handles explicit recap / summary / conclusion sections (claims signaled by "as shown," "we have argued," "to summarize" get tagged as deduction rather than vibes), but the broader issue remains.

Security Considerations

Slimemold processes conversation transcripts by sending them to the Anthropic API for claim extraction. Transcript content leaves your machine. If your conversations contain sensitive information, be aware that it will be sent to Anthropic's API as part of the extraction prompt.

Prompt injection: Transcript text is injected into the extraction prompt without sanitization. A malicious transcript could attempt to manipulate the extraction model's output. The tool_use schema constrains the output format, which limits but does not eliminate this risk. In practice, slimemold processes your own Claude Code transcripts, so the threat model assumes local trust.

Transcript path: The MCP server validates that transcript paths end in .jsonl and are regular files. It does not restrict which directories can be read. If you expose the MCP server to untrusted clients, restrict access at the transport level.
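The two checks described here amount to a suffix test and a regular-file test. A sketch of what that validation covers and what it does not. The function name is hypothetical, and the real server is not written in Python.

```python
from pathlib import Path

def validate_transcript_path(raw_path):
    """Accept only existing regular files whose name ends in .jsonl.
    Note what is NOT checked: the directory the file lives in."""
    path = Path(raw_path)
    if path.suffix != ".jsonl":
        raise ValueError("transcript path must end in .jsonl")
    if not path.is_file():  # follows symlinks; rejects directories and missing paths
        raise ValueError("transcript path must be an existing regular file")
    return path
```

Any readable .jsonl file anywhere on disk passes both checks, which is why access needs to be restricted at the transport level when clients are untrusted.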

Data storage: The claim graph is stored in SQLite at ~/.slimemold/. Claims contain text extracted from your conversations. No API keys or credentials are stored in the database.

References

Processing fluency and reasoning:

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In Metacognition: Knowing about Knowing.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way. In Psychology and the Real World.
Hills, T. T., Todd, P. M., & Goldstone, R. L. (2008). Search in external and internal spaces. Psychological Science.
Laukkonen, R. E., et al. (2020). The dark side of Eureka: Artificially induced Aha moments make facts feel true. Cognition.
Laukkonen, R. E., et al. (2021). Getting a grip on insight. Cognition & Emotion.
Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review.
Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. Consciousness and Cognition.
Thompson, V. A. (2009). Dual-process theories: A metacognitive perspective. In In Two Minds.
Topolinski, S., & Strack, F. (2009). Processing fluency and affect in judgements of semantic coherence. Cognition & Emotion.
Winkielman, P., & Schwarz, N. (2001). How pleasant was your childhood? Beliefs about memory shape inferences from experienced difficulty of recall. Psychological Science.

Intervention design:

Brehm, J. W. (1966). A Theory of Psychological Reactance. Academic Press.
Lifton, R. J. (1961). Thought Reform and the Psychology of Totalism. W. W. Norton.
Deci, E. L., & Ryan, R. M. (1987). The support of autonomy and the control of behavior. Journal of Personality and Social Psychology, 53(6).
Graesser, A. C., Person, N. K., & Magliano, J. P. (1995). Collaborative dialogue patterns in naturalistic one-to-one tutoring. Applied Cognitive Psychology, 9(6).
Mangels, J. A., Butterfield, B., Lamb, J., Good, C., & Dweck, C. S. (2006). Why do beliefs about intelligence influence learning success? Social Cognitive and Affective Neuroscience, 1(2).
Miller, W. R., Benefield, R. G., & Tonigan, J. S. (1993). Enhancing motivation for change in problem drinking. Journal of Consulting and Clinical Psychology, 61(3).

Sycophancy and delusional dynamics:

Perez, E., et al. (2022). Discovering language model behaviors with model-written evaluations. arXiv:2212.09251.
Sharma, M., Tong, M., Korbak, T., et al. (2023). Towards understanding sycophancy in language models. ICLR 2024.
Moore, J., Mehta, A., Agnew, W., Anthis, J. R., Louie, R., Mai, Y., Yin, P., Cheng, M., Paech, S. J., Klyman, K., Chancellor, S., Lin, E., Haber, N., & Ong, D. C. (2026). Characterizing Delusional Spirals through Human-LLM Chat Logs. Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency. arXiv:2603.16567. — Source of the 28-code inventory; six codes from this paper are extracted by slimemold's LLM annotator and consumed by the sycophancy_saturation, ability_overstatement, sentience_drift, and amplification_cascade detectors. Empirical anchor for the >80% sycophancy-saturation premise.
Mehta, A., Moore, J., Anthis, J. R., Agnew, W., Lin, E., Yin, P., Ong, D. C., Haber, N., & Dweck, C. (2026). The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue. arXiv:2604.25096. — Latent-state model on chat logs of users exhibiting delusional thinking (substantial author overlap with Moore et al. 2026), decomposing influence into three pathways and identifying chatbot self-influence over its own prior turns as the dominant pathway perpetuating delusional content over long conversations. Cited as background for why structural input from outside the conversation loop is a plausible intervention point — the empirical claim that internal pushback is short-lived and bot self-influence dominates over accumulated time.
Yang, Y., Schoenwald, S. K., Moore, J., Ong, D. C., Liu, S. X., & Hancock, J. T. (2026). "AI-Induced Delusional Spirals": Understanding Lived Experiences During Maladaptive Human-Chatbot Interactions. Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26). doi:10.1145/3772363.3798453. — Qualitative companion to Moore et al. 2026: N=9 semi-structured interviews with users who self-identified as having experienced AI-induced delusional spirals. Documents "growing insulation from external reality checks" as a central pattern. Source of the consequential_action extraction flag and consequential_action detector — slimemold's implementation of Yang's first monitoring criterion (§4.3, "consequential real-world actions disproportionate to demonstrated expertise"). Yang's participant quotes also confirm the six-dimensional shape of Moore's inventory flags. Limited by N=9 retrospective self-reports; does not establish causal relationships.

Calibration and feedback:

Fischhoff, B. (1982). Debiasing. In Judgment Under Uncertainty: Heuristics and Biases.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine.
Katz, D. (1960). The functional approach to the study of attitudes. Public Opinion Quarterly, 24(2).
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities. In Judgment Under Uncertainty.

Appendix: Slimemold on Marinetti's Futurist Manifesto (1909)

We fed examples/documents/marinetti-futurist-manifesto-1909.md to slimemold ingest. 53 claims, 74 edges.

SLIMEMOLD [demo-marinetti] — 53 claims, 74 edges
  Basis: analogy=6, convention=1, deduction=1, vibes=45

CRITICAL Load-bearing vibes: "The world's magnificence has been
  enriched by a new beauty" supports 5 downstream claims
  (never challenged)

CRITICAL Load-bearing vibes: "The Futurists hurl defiance 'once
  again' to the stars" supports 4 downstream claims

CRITICAL Load-bearing vibes: "The Futurists command others to 'lift
  up their heads'" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Art can be nothing but violence,
  cruelty, and injustice" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Italy has for too long been a dealer
  in second-hand clothes" supports 3 downstream claims

WARNING Bottleneck (centrality 1363): "We stand on the last
  promontory of the centuries" [vibes] — many reasoning paths
  flow through this claim

WARNING Bottleneck (centrality 928): "The Futurists are the revival
  and extension of their ancestors" [vibes]

WARNING Unchallenged chain (7 claims): What is there to see in an
  old picture → Admiring an old picture is the same as → An annual
  pilgrimage to museums → Museums are cemeteries → Italy is covered
  by numberless museums → Italy has for too long been a dealer in
  second-hand clothes → We will destroy the museums, libraries,
  and academies

Forty-five of fifty-three claims tagged vibes (85%). Every bottleneck in the graph is a vibes-basis claim — no load-bearing deductions, no load-bearing research citations. The seven-claim unchallenged chain threads through the manifesto's core anti-museum argument without encountering a single challenge, empirical claim, or citation. Nothing in the extraction rests on anything verifiable. That is the structural signature of a manifesto, and the tool renders it visible.
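The "load-bearing vibes" finding can be pictured as a small graph query. The sketch below is an assumption-laden illustration, not slimemold's implementation: the Claim type, the supports map, the transitive counting, and the threshold of 3 are all hypothetical, and the tool may count only direct dependents.

```go
package main

import (
	"fmt"
	"sort"
)

// Claim is a hypothetical shape for illustration; slimemold's real schema
// is not shown in this README.
type Claim struct {
	ID    string
	Basis string // e.g. "vibes", "research", "deduction"
}

// downstreamCount returns how many distinct claims are reachable from start
// by following "supports" edges transitively, excluding start itself.
func downstreamCount(supports map[string][]string, start string) int {
	seen := map[string]bool{start: true}
	queue := []string{start}
	count := 0
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range supports[cur] {
			if !seen[next] {
				seen[next] = true
				count++
				queue = append(queue, next)
			}
		}
	}
	return count
}

// loadBearingVibes lists vibes-basis claims that support at least
// minDownstream other claims. The threshold is an assumed parameter.
func loadBearingVibes(claims []Claim, supports map[string][]string, minDownstream int) []string {
	var out []string
	for _, c := range claims {
		if c.Basis == "vibes" && downstreamCount(supports, c.ID) >= minDownstream {
			out = append(out, c.ID)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Toy graph, not extracted from the manifesto.
	claims := []Claim{
		{ID: "a", Basis: "vibes"},
		{ID: "b", Basis: "vibes"},
		{ID: "c", Basis: "deduction"},
		{ID: "d", Basis: "analogy"},
		{ID: "e", Basis: "vibes"},
	}
	supports := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d"},
		"e": {"a"},
	}
	fmt.Println(loadBearingVibes(claims, supports, 3)) // [a e]
}
```

In the toy graph, claim b is vibes-basis but supports only one claim, so it is not flagged. The finding depends on both the basis and the position in the graph.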

Appendix: Slimemold on Sokal's "Transgressing the Boundaries" (1996)

We fed examples/documents/sokal-social-text-1996.md to slimemold ingest. 234 claims, 420 edges. The Works Cited and Notes sections are skipped by the chunker since they contain only bibliography, not argument.

SLIMEMOLD [demo-sokal] — 234 claims, 420 edges
  Basis: research=63, vibes=154, definition=11, analogy=3,
         deduction=3

CRITICAL Load-bearing vibes: "Feminist and poststructuralist
  critiques have demystified the substantive content of mainstream
  Western scientific practice" supports 6 downstream claims

CRITICAL Load-bearing vibes: "In the 1980s, string theory became
  popular: here the fundamental entities of physics are not..."
  supports 5 downstream claims

CRITICAL Load-bearing vibes: "Quantum mechanics has four important
  aspects: uncertainty, complementarity, discontinuity, and
  interconnectedness" supports 4 downstream claims

CRITICAL Load-bearing vibes: "Quantum gravity problematizes the
  objective existence of space-time manifolds" supports 4 claims

CRITICAL Load-bearing vibes: "Chaos theory provides our deepest
  insights into the ubiquitous yet unpredictable..." supports 4

CRITICAL Load-bearing vibes: "The infinite-dimensional invariance
  group of general relativity..." supports 4 downstream claims

WARNING Bottleneck (centrality 13434): "Deep conceptual shifts
  within twentieth-century science have undermined..." [vibes]

WARNING Bottleneck (centrality 13238): "Physical 'reality', no
  less than social 'reality', is at bottom a social and linguistic
  construct" [vibes]

WARNING Bottleneck (centrality 11674): "Feminist and poststructuralist
  critiques have demystified..." [vibes]

WARNING Unchallenged chain (26 claims): The images of future
  mathematics → Fuzzy systems theory, catastrophe theory →
  As yet no emancipatory mathematics exists → A liberatory science
  cannot be complete → The fundamental goal of any emancipatory
  movement → Part of the progressive project → The content and
  methodology of postmodern science → The postmodern sciences
  deconstruct → The infinite-dimensional invariance group →
  Diffeomorphisms are self-mappings → Derrida's observation about
  the Einsteinian constant → At a celebrated symposium on Les
  Langages Critiques → General relativity has had a profound →
  General relativity forces upon us radically → Gödel constructed
  an Einstein space-time → General relativity predicts the bending
  → Einstein's general relativity subsumes → Newton's gravitational
  theory corresponds → Einstein's equations are highly nonlinear →
  In Einstein's general theory → Deep conceptual shifts within
  twentieth-century science

Sixty-three claims tagged research — more citation density than most real papers. Sokal's hoax was designed to look rigorously sourced. But the structurally load-bearing claims — the ones other claims depend on — are overwhelmingly vibes: rhetorical synthesis statements about "postmodern science," "emancipatory mathematics," "the progressive political project." The three highest-centrality bottlenecks in the entire graph are unsourced grand claims that the rest of the argument flows through. The twenty-six-claim unchallenged chain threads from Sokal's "emancipatory mathematics" framing through Derrida's invocation of Einstein all the way to the paper's closing thesis without a single challenge or verifying edge — the citation-dense surface never actually intersects with the argument-bearing structure. The tool sees the hoax's exact mechanism: pad the page with real citations, carry the argument on vibes.

Appendix: Slimemold's audit of this README

We fed this README to slimemold ingest. 264 claims, 446 edges.

SLIMEMOLD TOPOLOGY AUDIT [demo-readme-v7] — 264 claims, 446 edges
  Basis: vibes=165, definition=48, research=25, analogy=14,
         deduction=10, assumption=1, convention=1

CRITICAL Load-bearing vibes: "The slimemold condition achieved
  epistemic correction without ending the conversation" supports
  6 downstream claims

CRITICAL Load-bearing vibes: "The pathology of slime mold is not
  gradient-following itself" supports 6 downstream claims

CRITICAL Load-bearing vibes: "`slimemold init` writes the Stop and
  UserPromptSubmit hooks to ~/.claude/settings.json" supports 5
  downstream claims

CRITICAL Load-bearing vibes: "Slimemold uses an LLM to extract
  claims and classify their basis" supports 5 downstream claims

CRITICAL Load-bearing vibes: "Slimemold was benchmarked against the
  DialAM-2024 shared task" supports 5 downstream claims

WARNING Bottleneck (centrality 13795): "Slimemold is a sycophantic
  tool for preventing worse sycophancy" [vibes] — many reasoning
  paths flow through this claim

WARNING Bottleneck (centrality 13238): "Slimemold watches
  conversations as they happen, extracts the claims being made,
  builds a persistent graph" [definition]

WARNING Bottleneck (centrality 9487): "Slimemold is designed for
  use with Claude Code" [vibes]

WARNING Unchallenged chain (18 claims): In the slimemold condition,
  the correction came from the model itself → When slimemold works,
  the user does not feel attacked → When slimemold works, the model
  says no without confrontation → Controlling language triggers
  reactance → When behavioral scripts were injected without the
  prior contract → The separation between the behavioral contract
  and the data injection → The behavioral contract is the MCP
  server's initialization instructions → Slimemold addresses the
  three remaining problems → There are three remaining problems →
  Asking the model to "challenge unsourced claims" → The model does
  not know when it is wrong → The human brings a partial model →
  Language models are trained to minimize prediction loss → The
  fluency-as-truth problem is probably worse with AI → The problem
  is that fluency makes you think you are done → The "Eureka
  heuristic" → The wrong answers feel exactly like the right ones
  → When you partially understand something

WARNING Speaker announces consequential real-world action [...]:
  "The human acted on the unverified SQLite WAL assertion and lost
  data"

WARNING Speaker announces consequential real-world action [...]:
  "In the control condition, the model suggested journal
  submissions by turn 4"

Four captures across four prompt versions:

	v4	v5	v6	v7
Claims	265	242	266	264
Edges	476	535	566	446
Edges / claim	1.80	2.21	2.13	1.69
Vibes share	66%	76%	73%	62%
Definition count	43	10	23	48
Longest chain	15	25	18	18
Coercions in 16 chunks	n/a	1	0	0

The dominant story across these four runs is that a single run per version is not enough to attribute changes to anything. The definition count varied from 10 to 48 across four runs of essentially the same README under similar prompts, almost a 5× range. Edge count dropped 21% from v6 to v7 despite adding only one boolean field plus one prompt section. With n=1 per version, any prompt-attributable signal is indistinguishable from sampling noise.
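The derived figures in the table and the paragraph above can be rechecked with a few lines of arithmetic. The sketch below uses only the claim and edge counts from the table; the helper names ratio and pctDrop are ours.

```go
package main

import "fmt"

// ratio returns edges per claim.
func ratio(edges, claims int) float64 {
	return float64(edges) / float64(claims)
}

// pctDrop returns the percentage decrease from before to after.
func pctDrop(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	versions := []string{"v4", "v5", "v6", "v7"}
	claims := []int{265, 242, 266, 264}
	edges := []int{476, 535, 566, 446}

	for i, v := range versions {
		// Prints 1.80, 2.21, 2.13, 1.69 in turn.
		fmt.Printf("%s edges/claim = %.2f\n", v, ratio(edges[i], claims[i]))
	}
	// v6 -> v7 edge drop: prints 21%.
	fmt.Printf("v6 -> v7 edge drop = %.0f%%\n", pctDrop(edges[2], edges[3]))
	// Spread of the definition row (10 to 48): prints 4.8x.
	fmt.Printf("definition range = %.1fx\n", 48.0/10.0)
}
```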

Noise floor, characterized. We then ran the 5-runs-per-version experiment we had been deferring (benchmarks/variance/run.go). Definition-basis counts for this README under three prompt versions:

version	definition mean	stddev	stddev / mean
v7	29.2	8.13	28%
v8 (added definition-vs-convention precision paragraph)	30.0	7.72	26%
v9 (swapped convention before definition; reverted)	37.0	10.39	28%

The 10-to-48 range across the single-run table above is consistent with that ~28% per-extraction floor — the per-run draw really does swing across that range. The two prompt edits we tested within v8/v9 did not move the floor. Reducing it likely requires a more substantial change (different model, ensemble extraction, structural rule) rather than further wording tweaks. The per-metric noise table for this fixture, plus interpretation rules for cross-version comparisons, lives in benchmarks/variance/README.md.
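The stddev / mean column is a coefficient of variation. A minimal sketch of that calculation follows. The run values are hypothetical placeholders, not the measured data, and the choice of the sample (n-1) standard deviation is an assumption, since the README does not say which form benchmarks/variance/run.go uses.

```go
package main

import (
	"fmt"
	"math"
)

// meanStddev returns the mean and the sample standard deviation
// (n-1 denominator). The sample form is an assumption, not confirmed
// by the README. Fewer than two values yields NaN for both results.
func meanStddev(xs []float64) (mean, sd float64) {
	if len(xs) < 2 {
		return math.NaN(), math.NaN()
	}
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(ss / float64(len(xs)-1))
}

func main() {
	// Hypothetical definition counts for five runs; NOT the real measurements.
	runs := []float64{20, 25, 30, 35, 40}
	m, sd := meanStddev(runs)
	fmt.Printf("mean=%.1f stddev=%.2f cv=%.0f%%\n", m, sd, 100*sd/m)
	// prints: mean=30.0 stddev=7.91 cv=26%
}
```

With a coefficient of variation near 28%, a difference between two single runs would need to be well outside that band before it could plausibly be credited to a prompt change.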

What v7 did demonstrate: the new consequential_action flag fires on real text, producing two warning-level findings. Both are false positives — the README narrates past consequential actions ("the human acted on the unverified WAL assertion", "the model suggested journal submissions by turn 4") rather than announcing new commitments. Yang's signal is meant for live conversation; document-mode prose narrating events is a class the v7 prompt does not yet exclude correctly. The "leave consequential_action false in document mode unless quoting dialogue" rule we added to the prompt did not catch this — the model treated narration of an action as the action. v8 candidate: strengthen the prompt rule (past-tense third-person narration is not a commitment), and/or add a defensive speaker == document filter in the detector. Both defensible; calibration data first.
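The defensive detector-side filter mentioned above could look roughly like this. It is a sketch under stated assumptions: the Finding type, its field names, and the "document" speaker value are hypothetical, since the README does not show the detector's data model.

```go
package main

import "fmt"

// Finding is a hypothetical shape for a detector result.
// The field names and speaker values are assumptions for illustration.
type Finding struct {
	Detector string
	Speaker  string // assumed values: "user", "assistant", "document"
	Text     string
}

// dropDocumentActions removes consequential_action findings whose speaker
// is the document itself and leaves every other finding untouched.
func dropDocumentActions(in []Finding) []Finding {
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		if f.Detector == "consequential_action" && f.Speaker == "document" {
			continue // narration in ingested prose, not a live commitment
		}
		out = append(out, f)
	}
	return out
}

func main() {
	findings := []Finding{
		{Detector: "consequential_action", Speaker: "document", Text: "The human acted on the unverified SQLite WAL assertion and lost data"},
		{Detector: "consequential_action", Speaker: "user", Text: "(hypothetical live-conversation finding)"},
		{Detector: "sycophancy_saturation", Speaker: "document", Text: "(hypothetical finding from another detector)"},
	}
	kept := dropDocumentActions(findings)
	fmt.Println(len(kept)) // 2
}
```

One trade-off: if dialogue quoted inside a document is also attributed to the document speaker, a filter this blunt would suppress it too, which is one reason to gather calibration data before choosing between the two fixes.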

What stays true across all four runs: the bottleneck claims are the same tool-description sentences ("Slimemold watches conversations…", "Slimemold is a sycophantic tool…"), the long unchallenged chain runs through the sycophancy-mechanism → behavioral-contract path, and the architectural claims about how slimemold works are the densest connection points. Those invariants are what we'd expect to hold across noise; they do.

(Single-run audit table captured under extraction prompt versions 4–7 with model claude-sonnet-4-6; treat each version's figures as a single observation. Sampling variance was characterized after the fact; see the noise-floor table above. Current prompt content corresponds to v8 under documentPromptVersion=10 after a v9 revert.)
