An OpenClaw plugin for PII detection and custom entity redaction, powered by DataFog.
FogClaw uses a dual-engine approach: battle-tested regex patterns for structured PII (emails, SSNs, credit cards, etc.) and GLiNER via ONNX for zero-shot named entity recognition — letting you redact not just PII but any custom terms, expressions, or entity types you define.
- Three-layer scanning: inbound prompts, tool results, and outbound messages are all scanned for PII before they cross trust boundaries
- Automatic guardrail: intercepts messages before they reach the LLM via OpenClaw's `before_agent_start` hook
- Tool result scanning: redacts PII in file reads, API responses, and web fetches before they enter the session transcript (`tool_result_persist`)
- Outbound message scanning: a last-chance gate that catches PII in agent replies before delivery to external channels (`message_sending`)
- On-demand tools: `fogclaw_scan`, `fogclaw_preview`, and `fogclaw_redact`
- Dual detection engine: regex for structured PII (<1ms), GLiNER for zero-shot NER (~50-200ms)
- Custom entity types: define any entity label (e.g., "project codename", "competitor name") and GLiNER detects it with zero training
- Configurable actions: per-entity-type behavior of `redact`, `block`, or `warn`
- Per-entity confidence tuning: tighten or relax detection confidence by label
- Policy allowlist: skip enforcement on known-safe values, matched as exact strings or regex patterns
- Policy preview: run a dry-run simulation before changing runtime policy
- Multiple redaction strategies: `token`, `mask`, or `hash`
- Audit trail summary logging: optional structured action summaries, without logging raw entity content
- Graceful degradation: falls back to regex-only mode if GLiNER fails to load
```bash
# From the OpenClaw CLI
openclaw plugins install @datafog/fogclaw

# Or manually
git clone https://github.com/DataFog/fogclaw.git ~/.openclaw/extensions/fogclaw
cd ~/.openclaw/extensions/fogclaw
npm install
npm run build
```

FogClaw automatically downloads the GLiNER ONNX model on first run if it is not already cached locally, then reuses it for all later starts.
What happens on first scan:

- Tokenizers are downloaded (if needed).
- The first available ONNX file from Hugging Face is downloaded to the plugin's local model cache at `.../node_modules/@xenova/transformers/.cache/<model-repo>/onnx/<selected-model>.onnx`. This download can take a moment, depending on your network and the selected model's size.
- GLiNER then loads from these local files, so later runs stay fast and offline-friendly.
If the download fails (for example, because of network, firewall, or authentication issues), FogClaw falls back to regex-only mode and keeps protecting common structured PII.

If your network requires Hugging Face authentication, export `HF_TOKEN` or `HF_ACCESS_TOKEN` before starting OpenClaw so the model files can download.
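The regex-only fallback can be pictured with the sketch below. It is a minimal illustration, not FogClaw's actual loader: `createDetector`, `loadGlinerDetector`, `Detector`, and `Span` are hypothetical names, and the regex detector handles only emails to keep the example short.

```typescript
// Hypothetical span shape, modeled on the entity objects in the library example.
interface Span {
  label: string;
  start: number;
  end: number;
  confidence: number;
  source: "regex" | "gliner";
}

// Hypothetical detector signature: text in, labeled spans out.
type Detector = (text: string) => Promise<Span[]>;

// Regex-only detector (emails only, for brevity). It never needs a model download.
const regexDetector: Detector = async (text) => {
  const spans: Span[] = [];
  for (const m of text.matchAll(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g)) {
    const start = m.index ?? 0;
    spans.push({ label: "EMAIL", start, end: start + m[0].length, confidence: 1, source: "regex" });
  }
  return spans;
};

// Try to load GLiNER; on any failure, warn and continue in regex-only mode.
async function createDetector(
  loadGlinerDetector: () => Promise<Detector>, // hypothetical loader (download + ONNX init)
): Promise<{ detect: Detector; mode: "full" | "regex-only" }> {
  try {
    const gliner = await loadGlinerDetector();
    return {
      mode: "full",
      detect: async (text) => [...(await regexDetector(text)), ...(await gliner(text))],
    };
  } catch (err) {
    // e.g. a network, firewall, or auth failure while fetching the ONNX model
    console.warn(`GLiNER unavailable, falling back to regex-only: ${String(err)}`);
    return { mode: "regex-only", detect: regexDetector };
  }
}
```

Because the regex path has no dependency on the model, a failed download degrades coverage (no names or custom entities) rather than disabling the guardrail.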
- Copy the example config:

```bash
cp fogclaw.config.example.json fogclaw.config.json
```

- Edit `fogclaw.config.json` to your needs:

```json
{
  "enabled": true,
  "guardrail_mode": "redact",
  "redactStrategy": "token",
  "model": "onnx-community/gliner_large-v2.1",
  "confidence_threshold": 0.5,
  "entityConfidenceThresholds": {
    "PERSON": 0.6,
    "ORGANIZATION": 0.7
  },
  "custom_entities": ["project codename", "competitor name"],
  "entityActions": {
    "SSN": "block",
    "CREDIT_CARD": "block",
    "EMAIL": "redact",
    "PHONE": "redact",
    "PERSON": "warn"
  },
  "allowlist": {
    "values": ["noreply@example.com"],
    "patterns": ["^internal-"],
    "entities": {
      "PERSON": ["john doe"]
    }
  },
  "auditEnabled": true
}
```

- Enable the plugin in your OpenClaw config and restart.
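To make the config semantics concrete, the sketch below shows one plausible way a single detection could be resolved against these settings. The lookup order and matching rules are assumptions, not documented FogClaw behavior: labels are upper-cased before lookup, `entityActions` falls back to `guardrail_mode`, `entityConfidenceThresholds` falls back to `confidence_threshold`, and allowlist `values` and `entities` match case-insensitively. `decide`, `PolicyConfig`, and the helper functions are hypothetical.

```typescript
type Action = "redact" | "block" | "warn";

// Subset of fogclaw.config.json used here (field names taken from the example config).
interface PolicyConfig {
  guardrail_mode: Action;
  confidence_threshold: number;
  entityConfidenceThresholds: Record<string, number>;
  entityActions: Record<string, Action>;
  allowlist: { values?: string[]; patterns?: string[]; entities?: Record<string, string[]> };
}

interface Detection {
  text: string;
  label: string; // e.g. "EMAIL", "PERSON"
  confidence: number;
}

// Assumption: labels are normalized to uppercase before lookup.
const norm = (label: string) => label.toUpperCase();

function resolveAction(label: string, cfg: PolicyConfig): Action {
  return cfg.entityActions[norm(label)] ?? cfg.guardrail_mode;
}

function resolveThreshold(label: string, cfg: PolicyConfig): number {
  return cfg.entityConfidenceThresholds[norm(label)] ?? cfg.confidence_threshold;
}

// Assumption: exact and per-entity values match case-insensitively; patterns are regexes.
function isAllowlisted(d: Detection, cfg: PolicyConfig): boolean {
  const lowered = d.text.toLowerCase();
  const { values = [], patterns = [], entities = {} } = cfg.allowlist;
  if (values.some((v) => v.toLowerCase() === lowered)) return true;
  if (patterns.some((p) => new RegExp(p).test(d.text))) return true;
  const perEntity = entities[norm(d.label)] ?? [];
  return perEntity.some((v) => v.toLowerCase() === lowered);
}

// Returns the action to enforce, or null when the detection is dropped
// (below its confidence threshold, or allowlisted).
function decide(d: Detection, cfg: PolicyConfig): Action | null {
  if (d.confidence < resolveThreshold(d.label, cfg)) return null;
  if (isAllowlisted(d, cfg)) return null;
  return resolveAction(d.label, cfg);
}
```

With the example config, an SSN resolves to `block`, an email with no override resolves to `redact`, and `noreply@example.com` is skipped entirely.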
These commands are the minimum evidence set for PR review:

```bash
npm test
npm run build
npm run test:plugin-smoke
npm pkg get openclaw
npm run build
node --input-type=module - <<'NODE'
import plugin from './dist/index.js';
const result = plugin.register ? 'ok' : 'missing-register';
console.log(result, plugin.id, plugin.name);
NODE
```

Expected output:

- `npm test`: all tests pass.
- `npm run build` exits with `0` and writes `dist/index.js`.
- `npm run test:plugin-smoke` passes and confirms the hook/tool contracts.
- `npm pkg get openclaw` shows `{"extensions":["./dist/index.js"]}`.
- The inline node check prints `ok fogclaw FogClaw`.
```
Incoming message
      |
      v
+------------+
| Regex Pass |  emails, SSNs, phones, credit cards, IPs, dates, zips
| (<1ms)     |  confidence: 1.0
+-----+------+
      |
      v
+------------+
| GLiNER     |  persons, orgs, locations + your custom entities
| (ONNX)     |  confidence: 0.0-1.0
+-----+------+
      |
      v
+------------+
| Merge &    |  deduplicate overlapping spans, prefer higher confidence
| Normalize  |
+-----+------+
      |
      v
Apply action per entity type (redact / block / warn)
```
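The "Merge & Normalize" step can be illustrated with a small greedy interval pass. This is an assumed algorithm written for illustration (FogClaw's actual tie-breaking may differ): spans are ranked by confidence, ties go to the longer span, and a span is kept only if it does not overlap one already kept. The `Entity` shape mirrors the objects in the library usage example; `mergeEntities` is a hypothetical name.

```typescript
interface Entity {
  text: string;
  label: string;
  start: number; // inclusive
  end: number; // exclusive
  confidence: number;
  source: "regex" | "gliner";
}

// Two half-open spans [start, end) overlap if each starts before the other ends.
const overlaps = (a: Entity, b: Entity) => a.start < b.end && b.start < a.end;

// Greedily keep spans in descending confidence (longer span wins ties),
// skipping any span that overlaps one already kept; return in text order.
function mergeEntities(entities: Entity[]): Entity[] {
  const ranked = [...entities].sort(
    (a, b) => b.confidence - a.confidence || (b.end - b.start) - (a.end - a.start),
  );
  const kept: Entity[] = [];
  for (const e of ranked) {
    if (!kept.some((k) => overlaps(k, e))) kept.push(e);
  }
  return kept.sort((a, b) => a.start - b.start);
}
```

Since regex hits carry confidence 1.0, this ordering means a regex `EMAIL` always wins over an overlapping GLiNER guess such as an organization name inside the address.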
FogClaw hooks into three points in the OpenClaw message lifecycle. Each hook uses the detection engine best suited to its runtime constraints:
| Hook | Direction | Engine | Latency | Async | Entity Coverage |
|---|---|---|---|---|---|
| `before_agent_start` | Inbound (user prompt) | Regex + GLiNER | ~50-200ms | Yes | Full: structured PII + names, orgs, custom entities |
| `tool_result_persist` | Internal (tool results) | Regex only | <1ms | No (sync) | Structured PII: emails, SSNs, phones, credit cards, IPs |
| `message_sending` | Outbound (agent reply) | Regex + GLiNER | ~50-200ms | Yes | Full: structured PII + names, orgs, custom entities |
Why regex-only for tool results? OpenClaw's `tool_result_persist` hook requires synchronous handlers; async returns are rejected. GLiNER inference runs a synchronous native ONNX call that blocks the event loop for 100-500ms per invocation, which would degrade gateway responsiveness (delayed heartbeats, WebSocket pings, HTTP responses). Regex covers the high-confidence structured patterns most common in tool output (credentials in file reads, contact info in API responses). Person and organization names are caught on the async inbound and outbound paths, providing defense-in-depth without hot-path latency.
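The constraint above comes down to returning a plain value from the hook rather than a Promise. A minimal sketch, assuming the handler works on a string: the real `tool_result_persist` event and return shapes are defined by OpenClaw and are not shown in this README, and `redactToolResultSync` plus its two patterns are hypothetical.

```typescript
// Illustrative structured-PII patterns (hypothetical, not FogClaw's actual pattern set).
const SYNC_PATTERNS: Array<[label: string, re: RegExp]> = [
  ["EMAIL", /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
];

// Fully synchronous: regex only, no model inference, no Promise.
// This is the shape a sync-only hook can accept.
function redactToolResultSync(text: string): string {
  const counters = new Map<string, number>();
  let out = text;
  for (const [label, re] of SYNC_PATTERNS) {
    out = out.replace(re, () => {
      const n = (counters.get(label) ?? 0) + 1;
      counters.set(label, n);
      return `[${label}_${n}]`;
    });
  }
  return out;
}
```

For example, `redactToolResultSync("SSN 123-45-6789, mail a@b.io and c@d.io")` yields `"SSN [SSN_1], mail [EMAIL_1] and [EMAIL_2]"` without ever yielding to the event loop.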
```
User prompt ──► before_agent_start (regex + GLiNER)
                        │
                        ▼
                   Agent + LLM
                        │
              ┌─────────┼─────────┐
              ▼         ▼         ▼
          Tool call Tool call Tool call
              │         │         │
              ▼         ▼         ▼
     tool_result_persist (regex only, sync)
                        │
                        ▼
                   Agent reply
                        │
                        ▼
        message_sending (regex + GLiNER)
                        │
                        ▼
                External channel
             (Telegram, Slack, etc.)
```
| Type | Examples |
|---|---|
| `EMAIL` | john@example.com, user+tag@example.co.uk |
| `PHONE` | 555-123-4567, (555) 123-4567, +44 20 7946 0958 |
| `SSN` | 123-45-6789 |
| `CREDIT_CARD` | Visa, Mastercard, Amex (with/without separators) |
| `IP_ADDRESS` | 192.168.1.1, 10.0.0.1 |
| `DATE` | 01/15/1990, 2020-01-15, January 15, 2000 |
| `ZIP_CODE` | 10001, 10001-1234 |
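Card-number regexes are commonly paired with a Luhn checksum to cut false positives on arbitrary digit runs. This README does not say whether FogClaw's `CREDIT_CARD` pattern applies a Luhn check, so treat the sketch below as an assumed approach; `findCreditCards` and `CARD_CANDIDATE` are hypothetical names.

```typescript
// Candidate: 13-19 digits, optionally separated by single spaces or dashes (assumed pattern).
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum: starting from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the sum to be divisible by 10.
function luhnValid(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

interface CardMatch {
  text: string;
  start: number;
  end: number;
}

// Report only regex candidates whose separator-free digits pass the Luhn check.
function findCreditCards(text: string): CardMatch[] {
  const found: CardMatch[] = [];
  for (const m of text.matchAll(CARD_CANDIDATE)) {
    const digits = m[0].replace(/[ -]/g, "");
    if (luhnValid(digits)) {
      const start = m.index ?? 0;
      found.push({ text: m[0], start, end: start + m[0].length });
    }
  }
  return found;
}
```

With the standard test numbers, `4111 1111 1111 1111` (Visa) and `378282246310005` (Amex) are reported, while `4111-1111-1111-1112` fails the checksum and is skipped.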
Built-in GLiNER labels: `person`, `organization`, `location`, `address`, `date of birth`, `medical record number`, `account number`, `passport number`.

Plus any labels you add via `custom_entities` in the config.
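One way the final GLiNER label list could be assembled is sketched below. Both conventions here are assumptions for illustration: labels are deduplicated case-insensitively (first spelling wins), and detections are reported under an uppercase, underscore-joined entity type so that `person` lines up with the `PERSON` keys used in `entityActions`. `buildLabels` and `toEntityType` are hypothetical.

```typescript
// Built-in labels as listed above.
const BUILT_IN_LABELS = [
  "person", "organization", "location", "address",
  "date of birth", "medical record number", "account number", "passport number",
];

// Merge built-ins, config-level custom_entities, and per-call custom_labels,
// dropping empty strings and case-insensitive duplicates (first spelling wins).
function buildLabels(customEntities: string[] = [], customLabels: string[] = []): string[] {
  const seen = new Set<string>();
  const labels: string[] = [];
  for (const raw of [...BUILT_IN_LABELS, ...customEntities, ...customLabels]) {
    const label = raw.trim();
    const key = label.toLowerCase();
    if (label && !seen.has(key)) {
      seen.add(key);
      labels.push(label);
    }
  }
  return labels;
}

// Assumed mapping from a GLiNER label to a config key: "project codename" -> "PROJECT_CODENAME".
function toEntityType(label: string): string {
  return label.trim().toUpperCase().replace(/\s+/g, "_");
}
```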
| Strategy | Input | Output |
|---|---|---|
| `token` | `Contact john@example.com` | `Contact [EMAIL_1]` |
| `mask` | `Contact john@example.com` | `Contact ****************` |
| `hash` | `Contact john@example.com` | `Contact [EMAIL_a1b2c3d4e5f6]` |
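The three strategies can be sketched as below. This reproduces the output shapes in the table but is not FogClaw's code: numbering tokens per label in order of appearance, reusing a token for a repeated value, and using the first 12 hex characters of a SHA-256 digest for `hash` are all assumptions (the `a1b2c3d4e5f6` in the table appears to be a placeholder rather than a real digest). `applyStrategy` is a hypothetical name; the library's own entry point is `redact`.

```typescript
import { createHash } from "node:crypto";

type Strategy = "token" | "mask" | "hash";

interface Span {
  label: string;
  start: number; // inclusive
  end: number; // exclusive
}

function replacementFor(
  value: string,
  label: string,
  strategy: Strategy,
  tokens: Map<string, string>,
  counters: Map<string, number>,
): string {
  switch (strategy) {
    case "mask":
      // Length-preserving mask.
      return "*".repeat(value.length);
    case "hash":
      // Assumption: first 12 hex chars of SHA-256; identical values hash identically.
      return `[${label}_${createHash("sha256").update(value).digest("hex").slice(0, 12)}]`;
    case "token": {
      // Assumption: per-label counter; a repeated value reuses its earlier token.
      const key = `${label}\u0000${value}`;
      const existing = tokens.get(key);
      if (existing) return existing;
      const n = (counters.get(label) ?? 0) + 1;
      counters.set(label, n);
      const token = `[${label}_${n}]`;
      tokens.set(key, token);
      return token;
    }
  }
}

// Spans are assumed non-overlapping (e.g. already merged).
function applyStrategy(text: string, spans: Span[], strategy: Strategy): string {
  const tokens = new Map<string, string>();
  const counters = new Map<string, number>();
  const ordered = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const s of ordered) {
    const value = text.slice(s.start, s.end);
    out += text.slice(cursor, s.start) + replacementFor(value, s.label, strategy, tokens, counters);
    cursor = s.end;
  }
  return out + text.slice(cursor);
}
```

Reusing tokens for repeated values keeps redacted text readable for the LLM (the same address stays `[EMAIL_1]` throughout), while `mask` preserves length and `hash` gives stable identifiers across messages.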
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `true` | Enable/disable the plugin |
| `guardrail_mode` | `string` | `"redact"` | Default action: `"redact"`, `"block"`, or `"warn"` |
| `redactStrategy` | `string` | `"token"` | How to redact: `"token"`, `"mask"`, or `"hash"` |
| `model` | `string` | `"onnx-community/gliner_large-v2.1"` | Hugging Face model path for GLiNER (or a local `.onnx` path for advanced setups) |
| `confidence_threshold` | `number` | `0.5` | Minimum confidence for GLiNER detections (0-1) |
| `entityConfidenceThresholds` | `object` | `{}` | Per-label confidence overrides, e.g. `{ "PERSON": 0.7, "ORGANIZATION": 0.85 }` |
| `custom_entities` | `string[]` | `[]` | Custom entity labels for zero-shot detection |
| `entityActions` | `object` | `{}` | Per-entity-type action overrides |
| `allowlist` | `object` | `{}` | Exception rules that skip enforcement via exact `values`, regex `patterns`, or per-label `entities` |
| `auditEnabled` | `boolean` | `true` | Emit structured audit logs for guardrail decisions |
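The options table maps onto a typed config with defaults roughly as sketched here. `FogClawConfigSketch`, `DEFAULTS`, and `withDefaults` are hypothetical (the library does export a `DEFAULT_CONFIG`, used in the library example, but its exact type is not documented in this README), and the validation rules are an assumption about how invalid values might be rejected.

```typescript
type Action = "redact" | "block" | "warn";
type Strategy = "token" | "mask" | "hash";

// Hypothetical typed view of the options table.
interface FogClawConfigSketch {
  enabled: boolean;
  guardrail_mode: Action;
  redactStrategy: Strategy;
  model: string;
  confidence_threshold: number;
  entityConfidenceThresholds: Record<string, number>;
  custom_entities: string[];
  entityActions: Record<string, Action>;
  allowlist: { values?: string[]; patterns?: string[]; entities?: Record<string, string[]> };
  auditEnabled: boolean;
}

// Defaults copied from the options table.
const DEFAULTS: FogClawConfigSketch = {
  enabled: true,
  guardrail_mode: "redact",
  redactStrategy: "token",
  model: "onnx-community/gliner_large-v2.1",
  confidence_threshold: 0.5,
  entityConfidenceThresholds: {},
  custom_entities: [],
  entityActions: {},
  allowlist: {},
  auditEnabled: true,
};

const ACTIONS: readonly string[] = ["redact", "block", "warn"];
const STRATEGIES: readonly string[] = ["token", "mask", "hash"];

// Shallow-merge user settings over defaults, then validate enum and range fields
// (JSON input is untyped at runtime, so the checks are not redundant).
function withDefaults(user: Partial<FogClawConfigSketch>): FogClawConfigSketch {
  const cfg = { ...DEFAULTS, ...user };
  if (!ACTIONS.includes(cfg.guardrail_mode)) {
    throw new Error(`invalid guardrail_mode: ${cfg.guardrail_mode}`);
  }
  if (!STRATEGIES.includes(cfg.redactStrategy)) {
    throw new Error(`invalid redactStrategy: ${cfg.redactStrategy}`);
  }
  const thresholds = [cfg.confidence_threshold, ...Object.values(cfg.entityConfidenceThresholds)];
  if (thresholds.some((t) => !(t >= 0 && t <= 1))) {
    throw new Error("confidence thresholds must be between 0 and 1");
  }
  for (const [label, action] of Object.entries(cfg.entityActions)) {
    if (!ACTIONS.includes(action)) throw new Error(`invalid action for ${label}: ${action}`);
  }
  return cfg;
}
```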
`fogclaw_scan`: scan text for PII and custom entities. Returns detected entities with their types, positions, and confidence scores.

Parameters:

- `text` (required): text to scan
- `custom_labels` (optional): additional entity labels for zero-shot detection
`fogclaw_preview`: preview what the guardrail would do with a message, as a dry run.

Parameters:

- `text` (required): text to simulate
- `strategy` (optional): `"token"`, `"mask"`, or `"hash"` (defaults to config)
- `custom_labels` (optional): additional entity labels for zero-shot detection

Response:

- `entities`: detected entities and their metadata
- `totalEntities`: total number of entities found
- `actionPlan`: counts and labels grouped by `blocked`, `warned`, and `redacted`
- `redactedText`: the message with only the `redact`-action entities replaced
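The preview response can be modeled as below. This is a hypothetical sketch: the three buckets and the rule that only `redact`-action entities are applied come from the description above, but the field names inside each bucket (`count`, `labels`), the token format, and `preview` itself are assumptions. Entities are assumed to arrive with their action already resolved and without overlaps.

```typescript
type Action = "redact" | "block" | "warn";

interface PlannedEntity {
  label: string;
  start: number; // inclusive
  end: number; // exclusive
  action: Action; // already resolved from entityActions / guardrail_mode
}

interface Bucket {
  count: number;
  labels: string[]; // unique labels in this bucket
}

interface PreviewSketch {
  totalEntities: number;
  actionPlan: { blocked: Bucket; warned: Bucket; redacted: Bucket };
  redactedText: string;
}

function preview(text: string, entities: PlannedEntity[]): PreviewSketch {
  const bucket = (a: Action): Bucket => {
    const hits = entities.filter((e) => e.action === a);
    return { count: hits.length, labels: [...new Set(hits.map((e) => e.label))] };
  };

  // Apply only "redact" entities, left to right, with per-label token numbering.
  // Blocked and warned entities are reported in the plan but left in the text.
  const counters = new Map<string, number>();
  let redactedText = "";
  let cursor = 0;
  const toRedact = entities.filter((e) => e.action === "redact").sort((a, b) => a.start - b.start);
  for (const e of toRedact) {
    const n = (counters.get(e.label) ?? 0) + 1;
    counters.set(e.label, n);
    redactedText += text.slice(cursor, e.start) + `[${e.label}_${n}]`;
    cursor = e.end;
  }
  redactedText += text.slice(cursor);

  return {
    totalEntities: entities.length,
    actionPlan: { blocked: bucket("block"), warned: bucket("warn"), redacted: bucket("redact") },
    redactedText,
  };
}
```

Keeping blocked entities visible in `redactedText` matches a dry run's purpose: the plan says the message would be blocked, and the text shows exactly what redaction alone would change.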
`fogclaw_redact`: scan and redact PII and custom entities from text. Returns sanitized text with the entities replaced.

Parameters:

- `text` (required): text to scan and redact
- `strategy` (optional): `"token"`, `"mask"`, or `"hash"` (defaults to config)
- `custom_labels` (optional): additional entity labels for zero-shot detection
FogClaw's core can also be used outside of OpenClaw:
```typescript
import { Scanner, redact, loadConfig, DEFAULT_CONFIG } from "@datafog/fogclaw";

// Top-level await requires an ES module context.
const scanner = new Scanner(DEFAULT_CONFIG);
// Loads GLiNER; falls back to regex-only mode if the model can't be loaded.
await scanner.initialize();

// Scan for entities
const result = await scanner.scan("Contact john@example.com or call 555-123-4567");
console.log(result.entities);
// [
//   { text: "john@example.com", label: "EMAIL", start: 8, end: 24, confidence: 1, source: "regex" },
//   { text: "555-123-4567", label: "PHONE", start: 33, end: 45, confidence: 1, source: "regex" }
// ]

// Redact
const redacted = redact(result.text, result.entities, "token");
console.log(redacted.redacted_text);
// "Contact [EMAIL_1] or call [PHONE_1]"
```

```bash
git clone https://github.com/DataFog/fogclaw.git
cd fogclaw
npm install
npm test        # run tests
npm run build   # compile TypeScript
npm run lint    # type-check without emitting
```

- Keep `api.logger` output free of raw sensitive values.
- Use allowlists and `auditEnabled` according to your governance requirements.
- Consider `block` actions for high-risk entity types in regulated environments.
MIT