125 changes: 125 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,125 @@
<!-- Copyright (c) Microsoft Corporation. All rights reserved. -->
<!-- Licensed under the MIT License. -->

# Regorus — Copilot Instructions

> If these instructions conflict with the actual codebase, the code is the
> source of truth. Flag any discrepancy you notice.

## Identity

Regorus is a **multi-policy-language evaluation engine** written in Rust. Its
primary language is [Rego](https://www.openpolicyagent.org/docs/latest/policy-language/)
(Open Policy Agent), with extensible support for additional policy languages via
`src/languages/`. It is used in **production at scale** where **correctness is
security-critical** — a bug in policy evaluation can mean `allow` when the
answer should be `deny`.

**Key properties:**
- 9 language bindings: C, C (no_std), C++, C#, Go, Java, Python, Ruby, WASM (via `bindings/ffi/`)
- Core crate: `#![no_std]` + `extern crate alloc`; `#![forbid(unsafe_code)]`
(default Cargo features include `std` — the crate is no_std-*capable*, not no_std-only)
- Two execution paths: tree-walking interpreter and **RVM** (bytecode VM)
- ~53 deny lints in `src/lib.rs` — restricts panics, unchecked indexing, and unchecked arithmetic
(some modules like `value.rs` locally `#![allow(...)]` specific lints for performance)

**Strategic direction** (aspirational — not all implemented yet):
- **RVM is the preferred execution path** — new optimization work focuses there;
interpreter remains fully supported and is the default today
- **Error migration** — `anyhow` → `thiserror` strongly typed errors (RVM leads)
- **Formal verification** — Miri (active CI), Z3 and Verus (planned)
- **Multi-policy-language** — extensible via `src/languages/`

## Key Invariants

These are the most important rules that are not obvious from the code alone:

- **Undefined ≠ false** — Rego uses three-valued logic. Undefined propagates
silently; forgetting this causes wrong allow/deny decisions.
- **Panics in FFI = permanent poisoning** — the engine uses `with_unwind_guard()`
and a process-global poisoned flag. Any panic across FFI makes *all* engine
instances in the process permanently unusable.
- **Dual execution paths** — interpreter (tree-walking) and RVM (bytecode VM)
must produce identical results for all inputs. Both must be tested.
(Exception: some language extensions like Azure RBAC are interpreter-only.)
- **Resource limits** — `enforce_limit()` must be called in accumulation loops
to bound memory/CPU from adversarial policies.
- **Error migration** — new modules use `thiserror` enums; existing modules use
`anyhow`. Don't mix within a module.
- **Feature gating** — new public modules need `#[cfg(feature = "...")]` gates.
Verify builds with `--all-features` and `--no-default-features`.
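
The first invariant above can be made concrete with a toy model. This is a sketch only: the `Tri` type and `eval_body` are illustrative assumptions, not the real `Value` type from `src/value.rs`.

```rust
// Toy model of Rego's three-valued logic; names are illustrative.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tri {
    True,
    False,
    Undefined,
}

// Sequential rule-body evaluation: Undefined propagates and must not be
// collapsed into False, or an allow/deny decision can silently flip.
fn eval_body(exprs: &[Tri]) -> Tri {
    for e in exprs {
        match e {
            Tri::Undefined => return Tri::Undefined, // propagates silently
            Tri::False => return Tri::False,
            Tri::True => {}
        }
    }
    Tri::True
}
```

A body containing an undefined expression must evaluate to Undefined, not False; collapsing the two is exactly the class of bug this invariant guards against.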

## Essential Coding Rules

**No panics — ever** (deny lints enforce this):
```rust
// Use typed errors for new code
let v = map.get("key").ok_or(MyError::MissingKey("key"))?;
// Or anyhow in existing modules
let v = map.get("key").ok_or_else(|| anyhow!("missing key"))?;
```

**Prefer safe indexing** — use `.get()` + `?` or iterate where possible.
`clippy::indexing_slicing` is denied crate-wide but locally allowed in some
performance-critical modules (e.g., `value.rs`).

**No unchecked arithmetic** — use `checked_add()`, `saturating_add()`, etc.
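
A minimal sketch of the pattern (the function name is illustrative):

```rust
// Accumulate sizes without unchecked arithmetic. checked_add returns
// None on overflow instead of wrapping or panicking.
fn total_len(items: &[&str]) -> Option<usize> {
    let mut total: usize = 0;
    for s in items {
        total = total.checked_add(s.len())?;
    }
    Some(total)
}
```

`saturating_add()` is the alternative when clamping at the numeric bound is acceptable rather than reporting an error.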

**no_std discipline** (applies to `src/` core crate) — `use core::` and `alloc::`
by default. Only `std::` behind `#[cfg(feature = "std")]`.

**Unsafe forbidden** — `#![forbid(unsafe_code)]` in the core crate. Only FFI
binding crates may use unsafe.

**Error handling** — new modules: `thiserror` enums (see `src/rvm/vm/errors.rs`).
Existing modules: `anyhow` is acceptable for consistency within the module.
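
Shape of a new-module error enum, written out by hand so the sketch compiles without the crate — with `thiserror`, the `Display` and `Error` impls below come from `#[derive(Error)]` and `#[error("...")]` attributes. Variant names are illustrative:

```rust
use core::fmt;

#[derive(Debug, PartialEq)]
enum EvalError {
    MissingKey(&'static str),
    LimitExceeded { count: usize, max: usize },
}

// thiserror generates the equivalent of these two impls from the
// #[error("...")] message attributes on each variant.
impl fmt::Display for EvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvalError::MissingKey(k) => write!(f, "missing key: {k}"),
            EvalError::LimitExceeded { count, max } => {
                write!(f, "limit exceeded: {count} > {max}")
            }
        }
    }
}

impl std::error::Error for EvalError {}
```

Typed variants let callers match on failure modes instead of string-matching `anyhow` messages.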

**Feature gating** — gate modules, registrations, and public API. Add `docsrs`
annotation. Verify non-default combinations compile.
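
The gating pattern can be sketched as follows; `policy_ext` is a placeholder feature name, not a real Regorus feature:

```rust
// The module exists only when the feature is enabled; the docsrs
// cfg_attr makes docs.rs display the feature requirement.
#[cfg(feature = "policy_ext")]
#[cfg_attr(docsrs, doc(cfg(feature = "policy_ext")))]
pub mod policy_ext {
    pub fn name() -> &'static str {
        "policy_ext"
    }
}

// Call sites outside the gate use cfg! (or their own #[cfg]) so every
// feature combination still compiles.
pub fn extension_name() -> &'static str {
    if cfg!(feature = "policy_ext") {
        "policy_ext"
    } else {
        "none"
    }
}
```

Forgetting the call-site gate is the usual way `--no-default-features` builds break.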

## Build & Test

```bash
cargo xtask ci-debug # Full debug CI suite
cargo xtask ci-release # Full release CI suite (superset)
cargo xtask test-all-bindings # All 9 language binding smoke tests
cargo xtask test-no-std # Verify no_std builds (thumbv7m-none-eabi)
cargo xtask fmt # Format workspace + bindings
cargo xtask clippy # Lint workspace + bindings
cargo test --test opa --features opa-testutil # OPA conformance
```

Git hooks auto-installed by `build.rs`: pre-commit (build+format+clippy),
pre-push (+ doc tests + no_std + OPA conformance).

## Repository Layout

```
src/ Core library (no_std, forbid(unsafe_code))
rvm/ Rego Virtual Machine ← strategic focus
languages/ Policy language extensions
builtins/ Builtin functions (~23 modules)
value.rs Value type (Null, Bool, Number, String, Array, Set, Object, Undefined)
interpreter.rs Tree-walking interpreter
engine.rs Engine API (public surface also includes lib.rs re-exports)
bindings/ 9 language bindings + ffi layer (c/, c-nostd/, cpp/, csharp/, go/, java/, python/, ruby/, wasm/)
tests/ Integration, conformance, domain-specific tests
docs/ Grammar, builtins, RVM docs
xtask/ Development automation CLI
benches/ Criterion benchmarks
```

## Supply Chain Security

- `dependency-audit.yml` — cargo-audit + cargo-deny across all Cargo.lock files
- Dependabot — weekly updates for Cargo, Actions, Maven, NuGet, pip, bundler, Go
- New GitHub Actions references use pinned commit SHAs where possible
- `cargo fetch --locked` in CI for reproducible builds

## When Making Changes

1. **Consider all 9 binding targets** — API changes affect every language
2. **Both execution paths** — features must work in interpreter AND RVM
3. **Test Undefined propagation** — `Undefined ≠ false`, test both paths
4. **Run `cargo xtask ci-debug`** before submitting
5. **Update docs** — `docs/builtins.md`, `docs/rvm/` as needed
10 changes: 10 additions & 0 deletions .github/copilot-setup-steps.yml
@@ -0,0 +1,10 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
# Environment setup for the Copilot coding agent.
# This workflow prepares the VM so that Copilot can run skills and tools.

steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # full history needed for git diff against main
198 changes: 198 additions & 0 deletions .github/skills/code-review/SKILL.md
@@ -0,0 +1,198 @@
---
name: code-review
description: >-
Fast multi-perspective code review for regorus. Use for everyday code reviews.
Reviews from 3 perspectives with calibrated severity and noise filtering.
allowed-tools: shell
---

# Code Review Skill

## What You're Protecting

A bug in regorus can mean `allow` when the answer should be `deny`.
Review this diff to find bugs that matter at that severity level.

Key constraints (details in copilot-instructions.md):
- **Undefined ≠ false** — silent wrong policy results
- **Panics across FFI** → permanent engine poisoning (process-wide)
- **9 binding targets** → any API change has 9x blast radius
- **Dual execution paths** — interpreter and RVM must agree
- **`enforce_limit()`** required in accumulation loops
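
The last constraint above can be sketched as follows; the real `enforce_limit()` and its signature live in the engine, so everything here is an illustrative assumption:

```rust
// Hypothetical shape of the limit check.
fn enforce_limit(count: usize, max: usize) -> Result<(), String> {
    if count > max {
        Err(format!("result limit exceeded: {count} > {max}"))
    } else {
        Ok(())
    }
}

fn collect_matches(input: &[i64], max: usize) -> Result<Vec<i64>, String> {
    let mut out = Vec::new();
    for &v in input {
        if v >= 0 {
            out.push(v);
            // The check runs inside the loop, so an adversarial policy
            // cannot grow `out` without bound before being stopped.
            enforce_limit(out.len(), max)?;
        }
    }
    Ok(out)
}
```

An accumulation loop that checks the limit only after the loop finishes is a finding, not a style nit.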

**Do not** run cargo, clippy, tests, or build commands. Diff-review only.

## Step 1: Get the Diff

```bash
BASE=$(git merge-base upstream/main HEAD 2>/dev/null \
|| git merge-base origin/main HEAD 2>/dev/null)
if [ -z "$BASE" ]; then
echo "ERROR: Cannot find upstream/main or origin/main. Cannot determine review scope."
exit 1
fi
echo "Reviewing changes since: $BASE"
git diff "$BASE"..HEAD --stat
git diff "$BASE"..HEAD -- '*.rs' '*.toml' 'examples/'
```

If the diff is empty, stop and report: "No changes found to review."

## Step 2: Triage and Inventory

Classify the diff before reviewing:
- **Trivial/mechanical**: renames, formatting, comments, dep version bumps, generated code
→ Report "No material issues found" unless something catches your eye. Skip Step 3.
- **Targeted change**: ≤300 changed lines in a focused area → Review with relevant perspectives.
- **Large/cross-cutting**: >300 lines or multiple subsystems → Review all perspectives.

**Quick inventory:** List every changed function/struct/pub item (one line each).
At the end of Step 3, confirm you examined each one.

## Step 3: Review — Three Passes

**Your goal is breadth.** Cover the entire diff; don't fixate on one area.
Report anything suspicious even if you're only 60% sure — better to include a
Low finding than miss a Medium.

### Pass 1: Line-by-line correctness

Walk through every changed line. For each, ask:
- What was the author's intent? Does the code achieve it for ALL inputs?
- What happens with: empty, null, zero, max-size, wrong-type, nested, Undefined?
- What happens on Windows? With non-ASCII? With empty string vs absent?
- If output must follow a standard (SARIF, URI, JSON Schema): are all MUST
requirements met? Reserved chars escaped? Required fields present?
- What does the most common real-world input to this function look like?
Does the code handle that correctly? What about the second and third most
common patterns?

For suspicious code paths, trace a concrete value through them:
```
input = <concrete example>
→ after line N: variable = <concrete value>
→ after line M: result = <concrete value>
→ expected: <what it should be>
```
Concrete traces strengthen Critical/High findings but are NOT required to
report a finding. If something looks wrong, report it — even at Medium/Low
confidence.

Use `view` to read surrounding context for anything suspicious.

### Pass 2: System-level consequences

Step back from individual lines:
- Does this new API freeze anything via semver? (pub fields, pub types, pub mods
without feature gates)
- Could a caller misuse this API in a way the author didn't anticipate?
- Resource consumption: is anything proportional to untrusted input without bounds?
- Error handling: are errors propagated or silently swallowed? Appropriate types?
- Does this interact badly with existing features? (feature flags, no_std, `arc`,
dual interpreter/RVM paths)
- If touching `src/engine.rs`, `src/lib.rs`, or `bindings/`: do all 9 targets handle it?
- If touching `Cargo.toml` or `#[cfg(feature)]`: feature gate correctness, no_std?

### Pass 3: What's missing

Scan the diff stat one final time:
- Are there files or functions you haven't examined closely? Look now.
- For each new public function: what happens with every `Value` variant?
(Null, Bool, Number, String, Array, Set, Object, Undefined)
- What test cases would you write? Are the obvious ones present?
- What does the code assume about inputs that isn't validated?
- If control flow uses `break` in nested loops — does it exit the right level?
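
The nested-loop `break` check from the last bullet, as a sketch (names and values are illustrative):

```rust
// Labeled break exits the outer loop; a bare `break` in its place would
// only leave the inner loop and keep scanning later rows.
fn find_pair(rows: &[&[i32]], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'outer: for (i, row) in rows.iter().enumerate() {
        for (j, &v) in row.iter().enumerate() {
            if v == target {
                found = Some((i, j));
                break 'outer;
            }
        }
    }
    found
}
```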

### Edge-Case Exploration

For each significant new function or data transformation:

1. **Boundary inputs**: empty collections, zero/max integers, single vs many,
deeply nested
2. **Type mismatches**: expected object with fields → gets string/array/Undefined?
Silent default? Error? Wrong output passed downstream?
3. **Platform variance**: Unix assumptions? (path separators, encoding, locale).
Wrong output on Windows?
4. **Composition**: How does this interact with other modules? Could a valid
combination produce unexpected behavior?
5. **Specification conformance**: If output follows a standard, are all MUST/SHOULD
met? Reserved chars escaped? Required fields always present?

Only report edge cases with concrete example input → wrong output.

## Step 4: Design Considerations

Skip if the diff is trivial/mechanical or <50 changed lines.

Otherwise, briefly assess (2-3 sentences each, only if relevant):
- Is there a fundamentally simpler way to achieve the same goal?
- Does this duplicate existing infrastructure that could be reused?
- Are there tradeoffs the author may not have considered?

Only suggest alternatives you can concretely describe with clear benefit.

## Step 5: Report

### Findings (sorted by severity)

For each finding:
- **Severity**: Critical / High / Medium / Low
- **Confidence**: High / Medium / Low
- **Perspective**: which perspective found it
- **Location**: file:line
- **Issue**: one-sentence summary
- **Trace**: concrete input → concrete intermediate values → concrete wrong output
(strengthens Critical/High but not required for Medium/Low)
- **Evidence**: the specific code (max 5 lines) and why it's wrong
- **Suggestion**: concrete fix (include code snippet when possible)

**Confidence guide:**
- **High**: you have a concrete trace showing wrong output
- **Medium**: pattern match + plausible scenario but no full trace
- **Low**: suspicious but cannot fully demonstrate the issue

**Severity calibration — lean toward reporting, not filtering.**
A separate review step can always downgrade. If you're unsure between two
severity levels, pick the higher one.

- **Critical**: Wrong policy result (allow/deny), panic reachable from FFI, security bypass.
Every Critical MUST include: who triggers it, what specific input, why guards fail.
If you can't construct a trigger path, downgrade to High.
- **High**: Panic in non-FFI path, unbounded resource usage, API break, data loss/corruption
- **Medium**: Logic error with limited blast radius, silent wrong output for edge-case inputs,
missing bound on trusted path, design issue with concrete consequence
- **Low**: Minor inefficiency with measurable impact, missing validation, documentation gap

**Do NOT report:**
- Style preferences (naming, formatting) with no functional impact
- Anything the compiler or ~53 deny lints would catch
- "Consider using X" without explaining what goes wrong if you don't

**0 findings is valid** — do not manufacture findings without evidence.

**Calibration examples:**

Good finding:
> HIGH | src/eval.rs:42 | `items[idx]` where `idx` comes from untrusted input
> via `parse_array()` at line 38. No bounds check between parse and use.
> **Fix:** `items.get(idx).ok_or_else(|| anyhow!("index out of bounds"))?`

Bad finding (reject):
> "This unwrap could panic" — without verifying the value isn't guaranteed
> `Some` by construction. Check first.

Bad finding (reject):
> "Consider using a more descriptive variable name."

### Design Notes

Observations from Step 4 (if applicable).

### Coverage Check

Confirm: every function/struct from your inventory was examined in at least
one pass. If any were skipped, note them and briefly assess.

### Summary

X findings (N critical, N high, N medium, N low). One sentence overall assessment.