-
Notifications
You must be signed in to change notification settings - Fork 61
feat(copilot): add multi-agent code review skills #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| <!-- Copyright (c) Microsoft Corporation. All rights reserved. --> | ||
| <!-- Licensed under the MIT License. --> | ||
|
|
||
| # Regorus — Copilot Instructions | ||
|
|
||
| > If these instructions conflict with the actual codebase, the code is the | ||
| > source of truth. Flag any discrepancy you notice. | ||
|
|
||
| ## Identity | ||
|
|
||
| Regorus is a **multi-policy-language evaluation engine** written in Rust. Its | ||
| primary language is [Rego](https://www.openpolicyagent.org/docs/latest/policy-language/) | ||
| (Open Policy Agent), with extensible support for additional policy languages via | ||
| `src/languages/`. It is used in **production at scale** where **correctness is | ||
| security-critical** — a bug in policy evaluation can mean `allow` when the | ||
| answer should be `deny`. | ||
|
|
||
| **Key properties:** | ||
| - 9 language bindings: C, C (no_std), C++, C#, Go, Java, Python, Ruby, WASM (via `bindings/ffi/`) | ||
| - Core crate: `#![no_std]` + `extern crate alloc`; `#![forbid(unsafe_code)]` | ||
| (default Cargo features include `std` — the crate is no_std-*capable*, not no_std-only) | ||
| - Two execution paths: tree-walking interpreter and **RVM** (bytecode VM) | ||
| - ~53 deny lints in `src/lib.rs` — restricts panics, unchecked indexing, and unchecked arithmetic | ||
| (some modules like `value.rs` locally `#![allow(...)]` specific lints for performance) | ||
|
|
||
| **Strategic direction** (aspirational — not all implemented yet): | ||
| - **RVM is the preferred execution path** — new optimization work focuses there; | ||
| interpreter remains fully supported and is the default today | ||
| - **Error migration** — `anyhow` → `thiserror` strongly typed errors (RVM leads) | ||
| - **Formal verification** — Miri (active CI), Z3 and Verus (planned) | ||
| - **Multi-policy-language** — extensible via `src/languages/` | ||
|
|
||
| ## Key Invariants | ||
|
|
||
| These are the most important rules that are not obvious from the code alone: | ||
|
|
||
| - **Undefined ≠ false** — Rego uses three-valued logic. Undefined propagates | ||
| silently; forgetting this causes wrong allow/deny decisions. | ||
| - **Panics in FFI = permanent poisoning** — the engine uses `with_unwind_guard()` | ||
| and a process-global poisoned flag. Any panic across FFI makes *all* engine | ||
| instances in the process permanently unusable. | ||
| - **Dual execution paths** — interpreter (tree-walking) and RVM (bytecode VM) | ||
| must produce identical results for all inputs. Both must be tested. | ||
| (Exception: some language extensions like Azure RBAC are interpreter-only.) | ||
| - **Resource limits** — `enforce_limit()` must be called in accumulation loops | ||
| to bound memory/CPU from adversarial policies. | ||
| - **Error migration** — new modules use `thiserror` enums; existing modules use | ||
| `anyhow`. Don't mix within a module. | ||
| - **Feature gating** — new public modules need `#[cfg(feature = "...")]` gates. | ||
| Verify builds with `--all-features` and `--no-default-features`. | ||
|
|
||
| ## Essential Coding Rules | ||
|
|
||
| **No panics — ever** (deny lints enforce this): | ||
| ```rust | ||
| // Use typed errors for new code | ||
| let v = map.get("key").ok_or(MyError::MissingKey("key"))?; | ||
| // Or anyhow in existing modules | ||
| let v = map.get("key").ok_or_else(|| anyhow!("missing key"))?; | ||
| ``` | ||
|
|
||
| **Prefer safe indexing** — use `.get()` + `?` or iterate where possible. | ||
| `clippy::indexing_slicing` is denied crate-wide but locally allowed in some | ||
| performance-critical modules (e.g., `value.rs`). | ||
|
|
||
| **No unchecked arithmetic** — use `checked_add()`, `saturating_add()`, etc. | ||
|
|
||
| **no_std discipline** (applies to `src/` core crate) — `use core::` and `alloc::` | ||
| by default. Only `std::` behind `#[cfg(feature = "std")]`. | ||
|
|
||
| **Unsafe forbidden** — `#![forbid(unsafe_code)]` in the core crate. Only FFI | ||
| binding crates may use unsafe. | ||
|
|
||
| **Error handling** — new modules: `thiserror` enums (see `src/rvm/vm/errors.rs`). | ||
| Existing modules: `anyhow` is acceptable for consistency within the module. | ||
|
|
||
| **Feature gating** — gate modules, registrations, and public API. Add `docsrs` | ||
| annotation. Verify non-default combinations compile. | ||
|
|
||
| ## Build & Test | ||
|
|
||
| ```bash | ||
| cargo xtask ci-debug # Full debug CI suite | ||
| cargo xtask ci-release # Full release CI suite (superset) | ||
| cargo xtask test-all-bindings # All 9 language binding smoke tests | ||
| cargo xtask test-no-std # Verify no_std builds (thumbv7m-none-eabi) | ||
| cargo xtask fmt # Format workspace + bindings | ||
| cargo xtask clippy # Lint workspace + bindings | ||
| cargo test --test opa --features opa-testutil # OPA conformance | ||
| ``` | ||
|
|
||
| Git hooks auto-installed by `build.rs`: pre-commit (build+format+clippy), | ||
| pre-push (+ doc tests + no_std + OPA conformance). | ||
|
|
||
| ## Repository Layout | ||
|
|
||
| ``` | ||
| src/ Core library (no_std, forbid(unsafe_code)) | ||
| rvm/ Rego Virtual Machine ← strategic focus | ||
| languages/ Policy language extensions | ||
| builtins/ Builtin functions (~23 modules) | ||
| value.rs Value type (Null, Bool, Number, String, Array, Set, Object, Undefined) | ||
| interpreter.rs Tree-walking interpreter | ||
| engine.rs Engine API (public surface also includes lib.rs re-exports) | ||
| bindings/ 9 language bindings + ffi layer (c/, c-nostd/, cpp/, csharp/, go/, java/, python/, ruby/, wasm/) | ||
| tests/ Integration, conformance, domain-specific tests | ||
| docs/ Grammar, builtins, RVM docs | ||
| xtask/ Development automation CLI | ||
| benches/ Criterion benchmarks | ||
| ``` | ||
|
|
||
| ## Supply Chain Security | ||
|
|
||
| - `dependency-audit.yml` — cargo-audit + cargo-deny across all Cargo.lock files | ||
| - Dependabot — weekly updates for Cargo, Actions, Maven, NuGet, pip, bundler, Go | ||
| - New GitHub Actions references use pinned commit SHAs where possible | ||
| - `cargo fetch --locked` in CI for reproducible builds | ||
|
|
||
| ## When Making Changes | ||
|
|
||
| 1. **Consider all 9 binding targets** — API changes affect every language | ||
| 2. **Both execution paths** — features must work in interpreter AND RVM | ||
| 3. **Test Undefined propagation** — `Undefined ≠ false`, test both paths | ||
| 4. **Run `cargo xtask ci-debug`** before submitting | ||
| 5. **Update docs** — `docs/builtins.md`, `docs/rvm/` as needed |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| # Copyright (c) Microsoft Corporation. All rights reserved. | ||
| # Licensed under the MIT License. | ||
| # | ||
| # Environment setup for the Copilot coding agent. | ||
| # This workflow prepares the VM so that Copilot can run skills and tools. | ||
|
|
||
| steps: | ||
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| with: | ||
| fetch-depth: 0 # full history needed for git diff against main | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,198 @@ | ||
| --- | ||
| name: code-review | ||
| description: >- | ||
| Fast multi-perspective code review for regorus. Use for everyday code reviews. | ||
| Reviews from 3 perspectives with calibrated severity and noise filtering. | ||
| allowed-tools: shell | ||
|
anakrish marked this conversation as resolved.
anakrish marked this conversation as resolved.
|
||
| --- | ||
|
|
||
| # Code Review Skill | ||
|
|
||
| ## What You're Protecting | ||
|
|
||
| A bug in regorus can mean `allow` when the answer should be `deny`. | ||
| Review this diff to find bugs that matter at that severity level. | ||
|
|
||
| Key constraints (details in copilot-instructions.md): | ||
| - **Undefined ≠ false** — silent wrong policy results | ||
| - **Panics across FFI** → permanent engine poisoning (process-wide) | ||
| - **9 binding targets** → any API change has 9x blast radius | ||
| - **Dual execution paths** — interpreter and RVM must agree | ||
| - **`enforce_limit()`** required in accumulation loops | ||
|
|
||
| **Do not** run cargo, clippy, tests, or build commands. Diff-review only. | ||
|
|
||
| ## Step 1: Get the Diff | ||
|
|
||
| ```bash | ||
| BASE=$(git merge-base upstream/main HEAD 2>/dev/null \ | ||
| || git merge-base origin/main HEAD 2>/dev/null) | ||
| if [ -z "$BASE" ]; then | ||
| echo "ERROR: Cannot find upstream/main or origin/main. Cannot determine review scope." | ||
| exit 1 | ||
| fi | ||
| echo "Reviewing changes since: $BASE" | ||
| git diff "$BASE"..HEAD --stat | ||
| git diff "$BASE"..HEAD -- '*.rs' '*.toml' 'examples/' | ||
| ``` | ||
|
|
||
| If the diff is empty, stop and report: "No changes found to review." | ||
|
|
||
| ## Step 2: Triage and Inventory | ||
|
|
||
| Classify the diff before reviewing: | ||
| - **Trivial/mechanical**: renames, formatting, comments, dep version bumps, generated code | ||
| → Report "No material issues found" unless something catches your eye. Skip Step 3. | ||
| - **Targeted change**: ≤300 changed lines in a focused area → Review with relevant perspectives. | ||
| - **Large/cross-cutting**: >300 lines or multiple subsystems → Review all perspectives. | ||
|
|
||
| **Quick inventory:** List every changed function/struct/pub item (one line each). | ||
| At the end of Step 3, confirm you examined each one. | ||
|
|
||
| ## Step 3: Review — Three Passes | ||
|
|
||
| **Your goal is breadth.** Cover the entire diff, don't fixate on one area. | ||
| Report anything suspicious even if you're only 60% sure — better to include a | ||
| Low finding than miss a Medium. | ||
|
|
||
| ### Pass 1: Line-by-line correctness | ||
|
|
||
| Walk through every changed line. For each, ask: | ||
| - What was the author's intent? Does the code achieve it for ALL inputs? | ||
| - What happens with: empty, null, zero, max-size, wrong-type, nested, Undefined? | ||
| - What happens on Windows? With non-ASCII? With empty string vs absent? | ||
| - If output must follow a standard (SARIF, URI, JSON Schema): are all MUST | ||
| requirements met? Reserved chars escaped? Required fields present? | ||
| - What does the most common real-world input to this function look like? | ||
| Does the code handle that correctly? What about the second and third most | ||
| common patterns? | ||
|
|
||
| For suspicious code paths, trace a concrete value through them: | ||
| ``` | ||
| input = <concrete example> | ||
| → after line N: variable = <concrete value> | ||
| → after line M: result = <concrete value> | ||
| → expected: <what it should be> | ||
| ``` | ||
| Concrete traces strengthen Critical/High findings but are NOT required to | ||
| report a finding. If something looks wrong, report it — even at Medium/Low | ||
| confidence. | ||
|
anakrish marked this conversation as resolved.
|
||
|
|
||
| Use `view` to read surrounding context for anything suspicious. | ||
|
|
||
| ### Pass 2: System-level consequences | ||
|
|
||
| Step back from individual lines: | ||
| - Does this new API freeze anything via semver? (pub fields, pub types, pub mods | ||
| without feature gates) | ||
| - Could a caller misuse this API in a way the author didn't anticipate? | ||
| - Resource consumption: is anything proportional to untrusted input without bounds? | ||
| - Error handling: are errors propagated or silently swallowed? Appropriate types? | ||
| - Does this interact badly with existing features? (feature flags, no_std, `arc`, | ||
| dual interpreter/RVM paths) | ||
| - If touching `src/engine.rs`, `src/lib.rs`, or `bindings/`: do all 9 targets handle it? | ||
| - If touching `Cargo.toml` or `#[cfg(feature)]`: feature gate correctness, no_std? | ||
|
|
||
| ### Pass 3: What's missing | ||
|
|
||
| Scan the diff stat one final time: | ||
| - Are there files or functions you haven't examined closely? Look now. | ||
| - For each new public function: what happens with every `Value` variant? | ||
| (Null, Bool, Number, String, Array, Set, Object, Undefined) | ||
| - What test cases would you write? Are the obvious ones present? | ||
| - What does the code assume about inputs that isn't validated? | ||
| - If control flow uses `break` in nested loops — does it exit the right level? | ||
|
|
||
| ### Edge-Case Exploration | ||
|
|
||
| For each significant new function or data transformation: | ||
|
|
||
| 1. **Boundary inputs**: empty collections, zero/max integers, single vs many, | ||
| deeply nested | ||
| 2. **Type mismatches**: expected object with fields → gets string/array/Undefined? | ||
| Silent default? Error? Wrong output passed downstream? | ||
| 3. **Platform variance**: Unix assumptions? (path separators, encoding, locale). | ||
| Wrong output on Windows? | ||
| 4. **Composition**: How does this interact with other modules? Could a valid | ||
| combination produce unexpected behavior? | ||
| 5. **Specification conformance**: If output follows a standard, are all MUST/SHOULD | ||
| met? Reserved chars escaped? Required fields always present? | ||
|
|
||
| Only report edge cases with concrete example input → wrong output. | ||
|
anakrish marked this conversation as resolved.
|
||
|
|
||
| ## Step 4: Design Considerations | ||
|
|
||
| Skip if the diff is trivial/mechanical or <50 changed lines. | ||
|
|
||
| Otherwise, briefly assess (2-3 sentences each, only if relevant): | ||
| - Is there a fundamentally simpler way to achieve the same goal? | ||
| - Does this duplicate existing infrastructure that could be reused? | ||
| - Are there tradeoffs the author may not have considered? | ||
|
|
||
| Only suggest alternatives you can concretely describe with clear benefit. | ||
|
|
||
| ## Step 5: Report | ||
|
|
||
| ### Findings (sorted by severity) | ||
|
|
||
| For each finding: | ||
| - **Severity**: Critical / High / Medium / Low | ||
| - **Confidence**: High / Medium / Low | ||
| - **Perspective**: which perspective found it | ||
| - **Location**: file:line | ||
| - **Issue**: one-sentence summary | ||
| - **Trace**: concrete input → concrete intermediate values → concrete wrong output | ||
| (strengthens Critical/High but not required for Medium/Low) | ||
| - **Evidence**: the specific code (max 5 lines) and why it's wrong | ||
| - **Suggestion**: concrete fix (include code snippet when possible) | ||
|
|
||
| **Confidence guide:** | ||
| - **High**: you have a concrete trace showing wrong output | ||
| - **Medium**: pattern match + plausible scenario but no full trace | ||
| - **Low**: suspicious but cannot fully demonstrate the issue | ||
|
|
||
| **Severity calibration — lean toward reporting, not filtering.** | ||
| A separate review step can always downgrade. If you're unsure between two | ||
| severity levels, pick the higher one. | ||
|
|
||
| - **Critical**: Wrong policy result (allow/deny), panic reachable from FFI, security bypass. | ||
| Every Critical MUST include: who triggers it, what specific input, why guards fail. | ||
| If you can't construct a trigger path, downgrade to High. | ||
| - **High**: Panic in non-FFI path, unbounded resource usage, API break, data loss/corruption | ||
| - **Medium**: Logic error with limited blast radius, silent wrong output for edge-case inputs, | ||
| missing bound on trusted path, design issue with concrete consequence | ||
| - **Low**: Minor inefficiency with measurable impact, missing validation, documentation gap | ||
|
|
||
| **Do NOT report:** | ||
| - Style preferences (naming, formatting) with no functional impact | ||
| - Anything the compiler or ~53 deny lints would catch | ||
| - "Consider using X" without explaining what goes wrong if you don't | ||
|
|
||
| **0 findings is valid** — do not manufacture findings without evidence. | ||
|
|
||
| **Calibration examples:** | ||
|
|
||
| Good finding: | ||
| > HIGH | src/eval.rs:42 | `items[idx]` where `idx` comes from untrusted input | ||
| > via `parse_array()` at line 38. No bounds check between parse and use. | ||
| > **Fix:** `items.get(idx).ok_or_else(|| anyhow!("index out of bounds"))?` | ||
|
|
||
| Bad finding (reject): | ||
| > "This unwrap could panic" — without verifying the value isn't guaranteed | ||
| > `Some` by construction. Check first. | ||
|
|
||
| Bad finding (reject): | ||
| > "Consider using a more descriptive variable name." | ||
|
|
||
| ### Design Notes | ||
|
|
||
| Observations from Step 4 (if applicable). | ||
|
|
||
| ### Coverage Check | ||
|
|
||
| Confirm: every function/struct from your inventory was examined in at least | ||
| one pass. If any were skipped, note them and briefly assess. | ||
|
|
||
| ### Summary | ||
|
|
||
| X findings (N critical, N high, N medium, N low). One sentence overall assessment. | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.