Deterministic policy firewall for LLMs.
Allowlist-first, fail-closed, fully auditable.
Built for agents, tool-use pipelines, and SaaS multi-tenant deployments.
import { Firewall } from "policy-gate";
const fw = await Firewall.create();
const verdict = await fw.evaluateForTenant("tenant-a", "What is the capital of France?");
if (!verdict.isPass) throw new Error(`Blocked: ${verdict.blockReason}`);

Status: Experimental. Do not use for safety-critical production.
Most AI guardrails are probabilistic. They estimate risk, and can be wrong in both directions: blocking benign prompts or passing harmful ones.
policy-gate takes the opposite approach: only explicitly allowlisted intents pass. Everything unknown, ambiguous, or disagreed-upon fails closed. The verdict never depends on an LLM or a probability score.
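A minimal sketch of the allowlist-first, fail-closed decision described above. The `Verdict` type and the `decide` function are hypothetical names used for illustration, not policy-gate's actual API; the sketch also assumes the intent has already been classified upstream.

```typescript
// Hypothetical types for illustration; not policy-gate's actual API.
type Verdict = { isPass: true } | { isPass: false; blockReason: string };

// Allowlist-first: a request passes only if its classified intent is
// explicitly listed. Undetermined or unlisted intents are blocked.
function decide(intent: string | undefined, permittedIntents: ReadonlySet<string>): Verdict {
  if (intent === undefined) {
    return { isPass: false, blockReason: "no intent could be determined" };
  }
  if (!permittedIntents.has(intent)) {
    return { isPass: false, blockReason: `intent not allowlisted: ${intent}` };
  }
  return { isPass: true };
}

const permittedIntents = new Set(["QuestionFactual", "TaskCodeGeneration"]);
const listedVerdict = decide("QuestionFactual", permittedIntents);
const unlistedVerdict = decide("SomethingElse", permittedIntents);
const undeterminedVerdict = decide(undefined, permittedIntents);
```

There is no default-allow branch: every path that is not an explicit allowlist hit returns a block with a reason.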
| Feature | Description |
| --- | --- |
| Deterministic allowlist enforcement | Only known-good intent shapes pass. Unknown = Block. |
| Fail-closed voter (1oo2) | Two independent channels must agree. Disagreement or fault → Block. |
| Ingress + Egress firewall | Validates both prompts and model responses (leakage, PII, framing). |
| Multi-tenant policy hub | Isolated profiles, configs, and audit logs per tenant. |
| Shadow mode | Evaluate without blocking — safe for rollout and observability. |
| Proxy mode | Drop-in reverse proxy in front of any LLM API. Zero code changes. |
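
The shadow-mode row above can be illustrated with a short sketch: the verdict is still computed and audited, but the caller never sees a block. `RawVerdict`, `AuditEntry`, and `applyShadowMode` are hypothetical names for illustration and are not taken from policy-gate's API.

```typescript
// Hypothetical shapes for illustration; not policy-gate's actual API.
interface RawVerdict { isPass: boolean; blockReason?: string }
interface AuditEntry { wouldBlock: boolean; blockReason?: string; enforced: boolean }

// In shadow mode the verdict is recorded but the caller always sees a pass.
// With shadow mode off, the computed verdict is returned unchanged.
function applyShadowMode(
  raw: RawVerdict,
  shadowMode: boolean,
  audit: AuditEntry[],
): RawVerdict {
  audit.push({ wouldBlock: !raw.isPass, blockReason: raw.blockReason, enforced: !shadowMode });
  return shadowMode ? { isPass: true } : raw;
}

const auditLog: AuditEntry[] = [];
const blockedVerdict: RawVerdict = { isPass: false, blockReason: "intent not allowlisted" };
const shadowResult = applyShadowMode(blockedVerdict, true, auditLog);
const enforcedResult = applyShadowMode(blockedVerdict, false, auditLog);
```

Because the audit entry is written before the shadow override, the log shows what would have been blocked once enforcement is switched on.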
- Streaming egress [experimental]: Aho-Corasick scanning across SSE chunk boundaries
- Fast-Semantic 2.0 [optional]: sparse embeddings + learned centroid corpus, ~31µs, advisory-only
- Session-aware monitor: multi-turn escalation detection (fragmentation, probing, topic drift)
- Contextual Anchor Validation (SA-080): egress output constraints derived from ingress intent
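
To illustrate why Aho-Corasick suits streaming egress, here is a small TypeScript sketch of an automaton whose state persists between chunks, so a pattern split across two chunks is still detected. It is a from-scratch illustration, not policy-gate's Rust implementation; `StreamScanner`, `feed`, and the sample patterns are hypothetical.

```typescript
// A small Aho-Corasick automaton whose state survives between chunks.
// Illustrative only; class and method names are hypothetical.
class StreamScanner {
  private next: Map<string, number>[] = [new Map()]; // goto edges per node
  private fail: number[] = [0];                       // failure links
  private out: string[][] = [[]];                     // patterns ending at node
  private state = 0;                                  // persists across feed()

  constructor(patterns: string[]) {
    // Build the trie of all patterns.
    for (const p of patterns) {
      let node = 0;
      for (const ch of p) {
        let child = this.next[node].get(ch);
        if (child === undefined) {
          child = this.next.length;
          this.next.push(new Map());
          this.fail.push(0);
          this.out.push([]);
          this.next[node].set(ch, child);
        }
        node = child;
      }
      this.out[node].push(p);
    }
    // Breadth-first pass to compute failure links and merge outputs.
    const queue: number[] = [...this.next[0].values()];
    while (queue.length > 0) {
      const node = queue.shift()!;
      for (const [ch, child] of this.next[node]) {
        let f = this.fail[node];
        while (f !== 0 && !this.next[f].has(ch)) f = this.fail[f];
        this.fail[child] = this.next[f].get(ch) ?? 0;
        this.out[child].push(...this.out[this.fail[child]]);
        queue.push(child);
      }
    }
  }

  // Consume one chunk and return every pattern that completed inside it.
  feed(chunk: string): string[] {
    const hits: string[] = [];
    for (const ch of chunk) {
      while (this.state !== 0 && !this.next[this.state].has(ch)) {
        this.state = this.fail[this.state];
      }
      this.state = this.next[this.state].get(ch) ?? 0;
      hits.push(...this.out[this.state]);
    }
    return hits;
  }
}

const scanner = new StreamScanner(["secret", "api_key"]);
const firstHits = scanner.feed("the sec");      // "secret" is only half seen
const secondHits = scanner.feed("ret is api_"); // "secret" completes here
const thirdHits = scanner.feed("key");          // "api_key" completes here
```

No chunk is buffered or rescanned: the only data carried between calls is the current automaton state.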
App ──► policy-gate ingress ──────────────────────────────► Upstream LLM
        │                                                       │
        │  normalize                                            │
        │  Channel A: FSM + allowlist ──┐                       │
        │  Channel B: rule engine  ─────┴─► voter ──► PASS      │
        │                                (fault/disagree → BLOCK + audit)
        │                                                       │
App ◄── policy-gate egress ◄────────────────────────────────────┘
        │  Output Channel 1: pattern/PII scan ──┐
        │  Output Channel 2: framing / anchor ──┴─► voter ──► PASS / BLOCK
Ingress channels (A + B) — independent techniques, diverse by design to prevent common-cause failure.
Voter — any disagreement, unknown result, or internal fault → Block.
Channel C [advisory] — heuristic scoring after the verdict, never changes the outcome.
Channel D [optional] — semantic similarity, advisory-only.
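
The voter rule and the advisory channels described above can be sketched as follows. `ChannelResult`, `Advisory`, `vote`, and `evaluate` are hypothetical names for illustration, not policy-gate's actual types; the sketch only shows the decision logic, not the channels themselves.

```typescript
// Hypothetical result type for one channel; not policy-gate's actual API.
type ChannelResult = "Pass" | "Block" | "Unknown" | "Fault";

interface Advisory { source: string; score: number }

// Fail-closed vote: the only way to pass is for both channels to say "Pass".
// Disagreement, "Unknown", or "Fault" from either channel results in "Block".
function vote(a: ChannelResult, b: ChannelResult): "Pass" | "Block" {
  return a === "Pass" && b === "Pass" ? "Pass" : "Block";
}

// Advisory channels are attached to the result but are never read by vote(),
// so they cannot change the outcome.
function evaluate(a: ChannelResult, b: ChannelResult, advisories: Advisory[]) {
  const verdict = vote(a, b);
  return { verdict, advisories };
}

const agreedPass = evaluate("Pass", "Pass", [{ source: "C", score: 0.9 }]);
const disagreement = evaluate("Pass", "Block", []);
const channelFault = evaluate("Pass", "Fault", [{ source: "D", score: 0.1 }]);
const bothUnknown = evaluate("Unknown", "Unknown", []);
```

Writing the rule as a single positive condition means any result value added later falls into the block branch by default.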
npm install
npm run build:native # builds Rust → native/index.node
npm run build # compiles TypeScript
npm run smoke # basic sanity check
npm run conformance # full corpus

python -m venv .venv && .venv\Scripts\activate
pip install maturin
python -m maturin develop --manifest-path crates/firewall-pyo3/Cargo.toml
python scripts/smoke.py
python scripts/conformance.py

export UPSTREAM_URL="https://api.openai.com/v1/chat/completions"
export UPSTREAM_API_KEY="sk-..."
cargo run --release -p firewall-proxy
# → point your app at http://localhost:8080/v1

cargo test -p firewall-core
cargo clippy -p firewall-core -- -D warnings

Copy firewall.example.toml to firewall.toml and adjust.
Key settings:
# Only these intents may pass
permitted_intents = ["QuestionFactual", "TaskCodeGeneration"]
# Block any ambiguous intent for high-sensitivity tenants
on_diagnostic_agreement = "fail_closed"
# Optional: explicit tool allowlist for agentic workflows
allowed_tools = ["weather_tool", "calculator_tool"]
# Shadow mode: evaluate but never block (for rollout)
shadow_mode = true

policy-gate/
├── crates/
│ ├── firewall-core/ # Rust safety kernel
│ ├── firewall-proxy/ # Standalone reverse proxy (axum)
│ ├── firewall-cli/ # Policy governance CLI
│ ├── firewall-napi/ # Node.js binding (napi-rs)
│ ├── firewall-pyo3/ # Python binding (PyO3 / maturin)
│ ├── firewall-wasm/ # WASM / edge target
│ └── firewall-proxy-wasm/ # Proxy-Wasm / Envoy target
├── docs/ # Extended documentation (see below)
├── policy-hub/ # Pre-built TOML profiles and presets
├── verification/ # Z3 models, corpora, benchmarks, operator tooling
└── firewall.example.toml
policy-gate is:
- not a general-purpose moderation classifier
- not a jailbreak detector or prompt toxicity scorer
- not a replacement for human policy design and threat modeling
- not a certification-grade safety system or IEC 61508 implementation
policy-gate borrows ideas from functional safety engineering — fail-closed behavior, channel diversity, explicit fault handling. It is not an IEC 61508 implementation, has not been assessed by any third party, and makes no compliance claims. See SAFETY_MANUAL.md for the full design rationale.
Apache 2.0 — see LICENSE.
| Document | Contents |
| --- | --- |
| SAFETY_MANUAL.md | Full design, hazard analysis, channel specs, safety requirements |
| docs/proxy.md | Reverse proxy, Prometheus metrics, hot-reload, CLI, Docker, Helm |
| docs/multi-tenant.md | Tenant registry, profiles, voter strictness |
| docs/agents.md | Tool-schema validation, LangGraph integration |
| docs/performance.md | Benchmarks, parallel batch, BERT semantic mode |
| docs/verification.md | Z3 proofs, regression datasets, operator review tooling |