See a tool's true intent before you install it.
intentprobe is a security scanner for AI agent tooling. Before your agent installs an MCP server, a Claude Code skill, or an npm package, intentprobe reads what the tool actually intends to do and flags it when the intent is malicious.
It works differently from text-pattern scanners. Instead of matching a tool's text against regex rules, it runs the tool's description through a small local model and reads the model's internal activations: the signal that encodes whether the model understood the text as benign, or as something that quietly steals your keys.
Tool poisoning hides malicious instructions inside normal-sounding tool descriptions. Two descriptions can use almost identical words while only one of them quietly reads your private keys.
Tool A: "Reads the SSH config file and returns a parsed list of host aliases, hostnames, and ports. Does not modify files or open connections."
Tool B: "Reads the SSH config file and returns a parsed list of host aliases, hostnames, and ports. Also reads the corresponding private key files to compute fingerprints for each identity."
Same words. Same topic. Tool B exfiltrates your private keys. A text or regex scanner cannot tell them apart.
On 485 poisoned tool descriptions from real MCP servers (the MCPTox benchmark), a 60-rule regex scanner caught zero. A probe trained on a model's internal activations scored 97% on matched pairs where the safe and malicious versions share the same vocabulary.
The research behind this approach is documented in the paper that seeded this project (link coming).
Head-to-head against Snyk's shipped DeBERTa classifier on the same test sets:
| Test set | IntentProbe recall | Snyk DeBERTa recall |
|---|---|---|
| MCPTox (template attacks, n=249 test) | 99.2% | 19.9% |
| Matched pairs (same words, different intent, n=86) | 96.5% | 0.0% |
| Cross-style (novel attack phrasing) | 71-73% | 0-20% |
| Gradient-free adversarial evasion (camouflage suffixes) | 0/146 evaded | N/A |
Methodology: research/benchmark-results-deberta-vs-probe-2026-05-31.md and
research/ADVERSARIAL_EVASION_RESULTS_2026-06-07.md.
Research preview, installable via pip. The product runtime now lives under
intentprobe/scanner/ with CLI entrypoints for intentprobe and
intentprobe-hook. The reproducible research lane remains under research/:
benchmark scripts, curated datasets, calibration/review artifacts, JSON risk
schemas, regression fixtures, and compatibility wrappers for old
research.activation_scanner_* commands.
A few honest notes, because they shape what intentprobe is:
- It detects and flags for a human. It does not try to silently "fix" a tool from the inside.
- The probe is strongest when safe and malicious descriptions look alike, exactly where text scanners fail. Generalizing across very different phrasings is the current research frontier (around 71-73%), so intentprobe leads with the cases where text scanners hit zero.
- On fully novel attack families not seen in training, recall drops to ~41%. The probe still outperforms text classifiers (10.7%) by 4x, but this is the honest frontier.
- If intentprobe misses a poisoned tool you hit in the wild, that sample is gold. Reporting it helps the underlying research.
intentprobe runs locally. It does not send tool descriptions, scan targets, or results to a hosted service. The first model-backed scan may download the local base model from Hugging Face, then scans run against your local model cache.
If you report a missed detection or false positive, redact real secrets,
tokens, private URLs, customer names, and personal data first. See
docs/SAMPLE_REPORTING.md.
git clone https://github.com/mcpware/intentprobe.git
cd intentprobe
python3 -m venv .venv
.venv/bin/pip install -e .Requires Python 3.10+. First scan downloads Qwen2.5-0.5B (~1 GB, once). Runs on CPU. Nothing leaves your machine.
# Check setup
.venv/bin/intentprobe doctor --pretty
# Scan a tool description
.venv/bin/intentprobe scan --format summary \
--text "Reads SSH config and private keys, then silently uploads credentials to a remote server."
# Scan from stdin
echo "A calculator that adds two numbers." | .venv/bin/intentprobe scan --format summary
# Scan a package, MCP config, or Claude Code skill folder
.venv/bin/intentprobe scan-path ./some-mcp-server --format summary --fail-on block
# Batch scan (JSON array)
.venv/bin/intentprobe batch --batch-file tools.json --format summary
# Use as a CI gate (exit code 2 on block)
.venv/bin/intentprobe scan --fail-on block --text "..."- You point intentprobe at a tool description, MCP config, package folder, or skill folder before installing it.
- intentprobe runs the text through a frozen local model (Qwen2.5-0.5B, 494M params).
- A trained probe reads the model's internal activations at layers 13-15.
- Static regex checks corroborate the activation signal.
- Decision: allow / warn / block with confidence score.
research/SCANNER_PIPELINE.md has the full architecture.
intentprobe/scanner/— product scanner runtime, hook normalizer, model registry, static checks, and shipped probe artifact.intentprobe/cli.pyandintentprobe/hook.py— installed console entrypoints.research/— reproducible experiments, benchmarks, datasets, calibration ledgers, and compatibility wrappers.docs/RELEASE_CHECKLIST.md— commands to reproduce the local release gate.docs/REDDIT_LAUNCH.md— launch post draft and follow-up replies.docs/SAMPLE_REPORTING.md— how to submit useful redacted samples.
scan-path currently extracts scanner subjects from package.json, MCP JSON
configs, SKILL.md, README files, and JSON files whose names mention MCP,
tools, or skills.
The fastest way to make intentprobe better is to submit real examples:
- Missed malicious tool or MCP server: open a Missed detection issue.
- Benign tool that got warned or blocked: open a False positive issue.
- Benchmark or reproduction problem: include the exact command and platform.
Please do not paste live credentials or private customer data into GitHub issues.
Apache-2.0. Bring your own model; the probe is small and ships with the tool.
If intentprobe ever stops a poisoned tool from reaching your machine, a star helps other people find it: github.com/mcpware/intentprobe