microsoft · anakrish · May 4, 2026 · May 1, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,125 @@
+<!-- Copyright (c) Microsoft Corporation. All rights reserved. -->
+<!-- Licensed under the MIT License. -->
+
+# Regorus — Copilot Instructions
+
+> If these instructions conflict with the actual codebase, the code is the
+> source of truth. Flag any discrepancy you notice.
+
+## Identity
+
+Regorus is a **multi-policy-language evaluation engine** written in Rust. Its
+primary language is [Rego](https://www.openpolicyagent.org/docs/latest/policy-language/)
+(Open Policy Agent), with extensible support for additional policy languages via
+`src/languages/`. It is used in **production at scale** where **correctness is
+security-critical** — a bug in policy evaluation can mean `allow` when the
+answer should be `deny`.
+
+**Key properties:**
+- 9 language bindings: C, C (no_std), C++, C#, Go, Java, Python, Ruby, WASM (via `bindings/ffi/`)
+- Core crate: `#![no_std]` + `extern crate alloc`; `#![forbid(unsafe_code)]`
+  (default Cargo features include `std` — the crate is no_std-*capable*, not no_std-only)
+- Two execution paths: tree-walking interpreter and **RVM** (bytecode VM)
+- ~53 deny lints in `src/lib.rs` — restricts panics, unchecked indexing, and unchecked arithmetic
+  (some modules like `value.rs` locally `#![allow(...)]` specific lints for performance)
+
+**Strategic direction** (aspirational — not all implemented yet):
+- **RVM is the preferred execution path** — new optimization work focuses there;
+  interpreter remains fully supported and is the default today
+- **Error migration** — `anyhow` → `thiserror` strongly typed errors (RVM leads)
+- **Formal verification** — Miri (active CI), Z3 and Verus (planned)
+- **Multi-policy-language** — extensible via `src/languages/`
+
+## Key Invariants
+
+These are the most important rules that are not obvious from the code alone:
+
+- **Undefined ≠ false** — Rego uses three-valued logic. Undefined propagates
+  silently; forgetting this causes wrong allow/deny decisions.
+- **Panics in FFI = permanent poisoning** — the engine uses `with_unwind_guard()`
+  and a process-global poisoned flag. Any panic across FFI makes *all* engine
+  instances in the process permanently unusable.
+- **Dual execution paths** — interpreter (tree-walking) and RVM (bytecode VM)
+  must produce identical results for all inputs. Both must be tested.
+  (Exception: some language extensions like Azure RBAC are interpreter-only.)
+- **Resource limits** — `enforce_limit()` must be called in accumulation loops
+  to bound memory/CPU from adversarial policies.
+- **Error migration** — new modules use `thiserror` enums; existing modules use
+  `anyhow`. Don't mix within a module.
+- **Feature gating** — new public modules need `#[cfg(feature = "...")]` gates.
+  Verify builds with `--all-features` and `--no-default-features`.
+
+## Essential Coding Rules
+
+**No panics — ever** (deny lints enforce this):
+```rust
+// Use typed errors for new code
+let v = map.get("key").ok_or(MyError::MissingKey("key"))?;
+// Or anyhow in existing modules
+let v = map.get("key").ok_or_else(|| anyhow!("missing key"))?;
+```
+
+**Prefer safe indexing** — use `.get()` + `?` or iterate where possible.
+`clippy::indexing_slicing` is denied crate-wide but locally allowed in some
+performance-critical modules (e.g., `value.rs`).
+
+**No unchecked arithmetic** — use `checked_add()`, `saturating_add()`, etc.
+
+**no_std discipline** (applies to `src/` core crate) — `use core::` and `alloc::`
+by default. Only `std::` behind `#[cfg(feature = "std")]`.
+
+**Unsafe forbidden** — `#![forbid(unsafe_code)]` in the core crate. Only FFI
+binding crates may use unsafe.
+
+**Error handling** — new modules: `thiserror` enums (see `src/rvm/vm/errors.rs`).
+Existing modules: `anyhow` is acceptable for consistency within the module.
+
+**Feature gating** — gate modules, registrations, and public API. Add `docsrs`
+annotation. Verify non-default combinations compile.
+
+## Build & Test
+
+```bash
+cargo xtask ci-debug          # Full debug CI suite
+cargo xtask ci-release        # Full release CI suite (superset)
+cargo xtask test-all-bindings # All 9 language binding smoke tests
+cargo xtask test-no-std       # Verify no_std builds (thumbv7m-none-eabi)
+cargo xtask fmt               # Format workspace + bindings
+cargo xtask clippy            # Lint workspace + bindings
+cargo test --test opa --features opa-testutil  # OPA conformance
+```
+
+Git hooks auto-installed by `build.rs`: pre-commit (build+format+clippy),
+pre-push (+ doc tests + no_std + OPA conformance).
+
+## Repository Layout
+
+```
+src/                    Core library (no_std, forbid(unsafe_code))
+  rvm/                  Rego Virtual Machine ← strategic focus
+  languages/            Policy language extensions
+  builtins/             Builtin functions (~23 modules)
+  value.rs              Value type (Null, Bool, Number, String, Array, Set, Object, Undefined)
+  interpreter.rs        Tree-walking interpreter
+  engine.rs             Engine API (public surface also includes lib.rs re-exports)
+bindings/               9 language bindings + ffi layer (c/, c-nostd/, cpp/, csharp/, go/, java/, python/, ruby/, wasm/)
+tests/                  Integration, conformance, domain-specific tests
+docs/                   Grammar, builtins, RVM docs
+xtask/                  Development automation CLI
+benches/                Criterion benchmarks
+```
+
+## Supply Chain Security
+
+- `dependency-audit.yml` — cargo-audit + cargo-deny across all Cargo.lock files
+- Dependabot — weekly updates for Cargo, Actions, Maven, NuGet, pip, bundler, Go
+- New GitHub Actions references use pinned commit SHAs where possible
+- `cargo fetch --locked` in CI for reproducible builds
+
+## When Making Changes
+
+1. **Consider all 9 binding targets** — API changes affect every language
+2. **Both execution paths** — features must work in interpreter AND RVM
+3. **Test Undefined propagation** — `Undefined ≠ false`, test both paths
+4. **Run `cargo xtask ci-debug`** before submitting
+5. **Update docs** — `docs/builtins.md`, `docs/rvm/` as needed
diff --git a/.github/copilot-setup-steps.yml b/.github/copilot-setup-steps.yml
@@ -0,0 +1,10 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+#
+# Environment setup for the Copilot coding agent.
+# This workflow prepares the VM so that Copilot can run skills and tools.
+
+steps:
+  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+    with:
+      fetch-depth: 0 # full history needed for git diff against main
diff --git a/.github/skills/code-review/SKILL.md b/.github/skills/code-review/SKILL.md
@@ -0,0 +1,198 @@
+---
+name: code-review
+description: >-
+  Fast multi-perspective code review for regorus. Use for everyday code reviews.
+  Reviews from 3 perspectives with calibrated severity and noise filtering.
+allowed-tools: shell
+---
+
+# Code Review Skill
+
+## What You're Protecting
+
+A bug in regorus can mean `allow` when the answer should be `deny`.
+Review this diff to find bugs that matter at that severity level.
+
+Key constraints (details in copilot-instructions.md):
+- **Undefined ≠ false** — silent wrong policy results
+- **Panics across FFI** → permanent engine poisoning (process-wide)
+- **9 binding targets** → any API change has 9x blast radius
+- **Dual execution paths** — interpreter and RVM must agree
+- **`enforce_limit()`** required in accumulation loops
+
+**Do not** run cargo, clippy, tests, or build commands. Diff-review only.
+
+## Step 1: Get the Diff
+
+```bash
+BASE=$(git merge-base upstream/main HEAD 2>/dev/null \
+    || git merge-base origin/main HEAD 2>/dev/null)
+if [ -z "$BASE" ]; then
+  echo "ERROR: Cannot find upstream/main or origin/main. Cannot determine review scope."
+  exit 1
+fi
+echo "Reviewing changes since: $BASE"
+git diff "$BASE"..HEAD --stat
+git diff "$BASE"..HEAD -- '*.rs' '*.toml' 'examples/'
+```
+
+If the diff is empty, stop and report: "No changes found to review."
+
+## Step 2: Triage and Inventory
+
+Classify the diff before reviewing:
+- **Trivial/mechanical**: renames, formatting, comments, dep version bumps, generated code
+  → Report "No material issues found" unless something catches your eye. Skip Step 3.
+- **Targeted change**: ≤300 changed lines in a focused area → Review with relevant perspectives.
+- **Large/cross-cutting**: >300 lines or multiple subsystems → Review all perspectives.
+
+**Quick inventory:** List every changed function/struct/pub item (one line each).
+At the end of Step 3, confirm you examined each one.
+
+## Step 3: Review — Three Passes
+
+**Your goal is breadth.** Cover the entire diff, don't fixate on one area.
+Report anything suspicious even if you're only 60% sure — better to include a
+Low finding than miss a Medium.
+
+### Pass 1: Line-by-line correctness
+
+Walk through every changed line. For each, ask:
+- What was the author's intent? Does the code achieve it for ALL inputs?
+- What happens with: empty, null, zero, max-size, wrong-type, nested, Undefined?
+- What happens on Windows? With non-ASCII? With empty string vs absent?
+- If output must follow a standard (SARIF, URI, JSON Schema): are all MUST
+  requirements met? Reserved chars escaped? Required fields present?
+- What does the most common real-world input to this function look like?
+  Does the code handle that correctly? What about the second and third most
+  common patterns?
+
+For suspicious code paths, trace a concrete value through them:
+```
+input = <concrete example>
+→ after line N: variable = <concrete value>
+→ after line M: result = <concrete value>
+→ expected: <what it should be>
+```
+Concrete traces strengthen Critical/High findings but are NOT required to
+report a finding. If something looks wrong, report it — even at Medium/Low
+confidence.
+
+Use `view` to read surrounding context for anything suspicious.
+
+### Pass 2: System-level consequences
+
+Step back from individual lines:
+- Does this new API freeze anything via semver? (pub fields, pub types, pub mods
+  without feature gates)
+- Could a caller misuse this API in a way the author didn't anticipate?
+- Resource consumption: is anything proportional to untrusted input without bounds?
+- Error handling: are errors propagated or silently swallowed? Appropriate types?
+- Does this interact badly with existing features? (feature flags, no_std, `arc`,
+  dual interpreter/RVM paths)
+- If touching `src/engine.rs`, `src/lib.rs`, or `bindings/`: do all 9 targets handle it?
+- If touching `Cargo.toml` or `#[cfg(feature)]`: feature gate correctness, no_std?
+
+### Pass 3: What's missing
+
+Scan the diff stat one final time:
+- Are there files or functions you haven't examined closely? Look now.
+- For each new public function: what happens with every `Value` variant?
+  (Null, Bool, Number, String, Array, Set, Object, Undefined)
+- What test cases would you write? Are the obvious ones present?
+- What does the code assume about inputs that isn't validated?
+- If control flow uses `break` in nested loops — does it exit the right level?
+
+### Edge-Case Exploration
+
+For each significant new function or data transformation:
+
+1. **Boundary inputs**: empty collections, zero/max integers, single vs many,
+   deeply nested
+2. **Type mismatches**: expected object with fields → gets string/array/Undefined?
+   Silent default? Error? Wrong output passed downstream?
+3. **Platform variance**: Unix assumptions? (path separators, encoding, locale).
+   Wrong output on Windows?
+4. **Composition**: How does this interact with other modules? Could a valid
+   combination produce unexpected behavior?
+5. **Specification conformance**: If output follows a standard, are all MUST/SHOULD
+   met? Reserved chars escaped? Required fields always present?
+
+Only report edge cases with concrete example input → wrong output.
+
+## Step 4: Design Considerations
+
+Skip if the diff is trivial/mechanical or <50 changed lines.
+
+Otherwise, briefly assess (2-3 sentences each, only if relevant):
+- Is there a fundamentally simpler way to achieve the same goal?
+- Does this duplicate existing infrastructure that could be reused?
+- Are there tradeoffs the author may not have considered?
+
+Only suggest alternatives you can concretely describe with clear benefit.
+
+## Step 5: Report
+
+### Findings (sorted by severity)
+
+For each finding:
+- **Severity**: Critical / High / Medium / Low
+- **Confidence**: High / Medium / Low
+- **Perspective**: which perspective found it
+- **Location**: file:line
+- **Issue**: one-sentence summary
+- **Trace**: concrete input → concrete intermediate values → concrete wrong output
+  (strengthens Critical/High but not required for Medium/Low)
+- **Evidence**: the specific code (max 5 lines) and why it's wrong
+- **Suggestion**: concrete fix (include code snippet when possible)
+
+**Confidence guide:**
+- **High**: you have a concrete trace showing wrong output
+- **Medium**: pattern match + plausible scenario but no full trace
+- **Low**: suspicious but cannot fully demonstrate the issue
+
+**Severity calibration — lean toward reporting, not filtering.**
+A separate review step can always downgrade. If you're unsure between two
+severity levels, pick the higher one.
+
+- **Critical**: Wrong policy result (allow/deny), panic reachable from FFI, security bypass.
+  Every Critical MUST include: who triggers it, what specific input, why guards fail.
+  If you can't construct a trigger path, downgrade to High.
+- **High**: Panic in non-FFI path, unbounded resource usage, API break, data loss/corruption
+- **Medium**: Logic error with limited blast radius, silent wrong output for edge-case inputs,
+  missing bound on trusted path, design issue with concrete consequence
+- **Low**: Minor inefficiency with measurable impact, missing validation, documentation gap
+
+**Do NOT report:**
+- Style preferences (naming, formatting) with no functional impact
+- Anything the compiler or ~53 deny lints would catch
+- "Consider using X" without explaining what goes wrong if you don't
+
+**0 findings is valid** — do not manufacture findings without evidence.
+
+**Calibration examples:**
+
+Good finding:
+> HIGH | src/eval.rs:42 | `items[idx]` where `idx` comes from untrusted input
+> via `parse_array()` at line 38. No bounds check between parse and use.
+> **Fix:** `items.get(idx).ok_or_else(|| anyhow!("index out of bounds"))?`
+
+Bad finding (reject):
+> "This unwrap could panic" — without verifying the value isn't guaranteed
+> `Some` by construction. Check first.
+
+Bad finding (reject):
+> "Consider using a more descriptive variable name."
+
+### Design Notes
+
+Observations from Step 4 (if applicable).
+
+### Coverage Check
+
+Confirm: every function/struct from your inventory was examined in at least
+one pass. If any were skipped, note them and briefly assess.
+
+### Summary
+
+X findings (N critical, N high, N medium, N low). One sentence overall assessment.