feat: add architectural boundary and threat model assessors (ADR B.1, B.2)#503
Conversation
… B.2) Implement the final two assessors from the wg-agentic-sdlc alignment ADR: - ArchitecturalBoundaryAssessor (B.1): binary pass/fail check for linter-enforced import boundaries (ESLint, depguard, gomodguard, import-linter, flake8-tidy-imports, dependency-cruiser). Repos with <20 files return not_applicable. - ThreatModelAssessor (B.2): checks for THREAT_MODEL.md with partial credit scoring based on file existence (40 pts), content substance (10 pts), recognized sections from the 8-section schema (up to 48 pts), and threat table bonus (2 pts). Falls back to SECURITY.md threat model section for 25 pts partial credit. Weight rebalancing: test_execution 12%->11%, architecture_decisions 3%->2%, structured_logging 2%->1%, openapi_specs 3%->2%, freeing 4% for the two new attributes at 2% each. Total remains 100%. Closes #463 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughTwo new Tier 3 assessors are added: ChangesNew Tier 3 Assessors: Architectural Boundaries and Threat Model
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📈 Test Coverage Report
Coverage calculated from unit tests only |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/attributes.md`:
- Line 29: The document contains inconsistent attribute counts: line 29 states
"27 attributes" but an earlier overview sentence near the top still references
"25 attributes". Locate the overview sentence in the document that mentions 25
attributes and update it to 27 to align with the count stated on line 29,
ensuring the document is self-consistent throughout.
In `@src/agentready/assessors/security.py`:
- Around line 447-459: The exception handler in the try-except block that
catches OSError and UnicodeDecodeError when reading threat_model_path is
incorrectly returning a Finding with status="pass". Change the status from
"pass" to "fail" to properly indicate that the threat model file is unreadable
or malformed, and adjust the score accordingly to reflect the failure.
Additionally, update the error_message field from None to include the actual
exception details to provide debugging information, and update the evidence list
to clearly indicate that the threat model file could not be read rather than
just passing with partial credit.
In `@src/agentready/assessors/structure.py`:
- Around line 1533-1537: The _check_python_import_linter() method currently only
checks for the [importlinter] section in setup.cfg, missing flake8-tidy-imports
and banned-api configurations. Extend the setup.cfg check to also look for the
[flake8] section in addition to [importlinter], and when [flake8] is found,
append appropriate evidence indicating flake8-tidy-imports or banned-api is
configured. This will ensure repositories using these alternative import
boundary enforcement tools are correctly recognized and scored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 190f5433-57e7-424e-9f5c-d7be4b9a8f57
📒 Files selected for processing (7)
docs/attributes.mdsrc/agentready/assessors/__init__.pysrc/agentready/assessors/security.pysrc/agentready/assessors/structure.pysrc/agentready/data/default-weights.yamltests/unit/test_assessors_security.pytests/unit/test_assessors_structure.py
|
Review from Bill Murdock with assistance from Claude Code Overall this is a solid PR. Both assessors follow established patterns well, tests are thorough (31 new, all passing), weights sum to 1.00, and the full suite (1373 tests) shows zero regressions. A few items to address before merging. Bug
Issues to fix
Cosmetic (non-blocking)
|
- Fix __init__.py T3 comment: 14% -> 13% (matches actual weights) - Fix ThreatModelAssessor: unreadable files return status="fail" (score 40 is below pass threshold of 50) - Use path.is_file() instead of path.exists() in threat model file search to avoid matching directories - Add flake8-tidy-imports/banned-api check for setup.cfg in ArchitecturalBoundaryAssessor (was only checked in pyproject.toml) - Fix docs/attributes.md overview: 25 -> 27 attributes - Align default-weights.yaml comment spacing for architectural_boundaries - Replace os.path.join with pathlib.Path in ThreatModelAssessor, remove unused os import Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Follow-up from real-world testing (Bill Murdock with assistance from Claude Code) Tested both new assessors against apache/ranger, which has an extensive PMC-reviewed THREAT_MODEL.md. Two issues surfaced. 1. ThreatModelAssessor: section matching too rigid for real-world threat modelsRanger's THREAT_MODEL.md is arguably a gold standard (334 lines, 14 detailed sections, PMC-reviewed with maintainer sign-off), but scores only 56/100 with 1/8 sections matched. Two root causes: Section prefix handling: The regex Vocabulary gap: The word-matching requires all canonical words to appear in the heading. Real threat models use different terminology for the same concepts:
Only "open questions" (§14) matched. Consider adding synonym/alias support or a more flexible matching strategy for a follow-up. 2. ArchitecturalBoundaryAssessor: should return not_applicable for unsupported languagesThe assessor currently gates applicability only on file count (>=20), not on language. It checks tools for JS/TS (ESLint), Go (golangci), and Python (import-linter/flake8-tidy-imports), plus dependency-cruiser. Ranger is a Java project, so none of those tools are relevant, but it scores fail (0%) instead of not_applicable. Java repos could use ArchUnit or Checkstyle import controls for boundary enforcement, but the assessor doesn't check for those. Until Java (and Rust, Ruby, C#, etc.) tools are supported, repos in those languages should get |
…uage gating ThreatModelAssessor: real-world testing against apache/ranger showed the section matching was too rigid, matching only 1/8 sections on a gold-standard threat model. - Regex now strips section symbol prefixes (e.g., ## §7 Adversary model) - Added synonym/alias support for all 8 canonical sections so common heading variations are recognized (e.g., "Adversary model" matches "threats", "Out of scope" matches "deprioritized", "Trust boundaries" matches "entry points") - Threat table detection also matches "adversary model" headings ArchitecturalBoundaryAssessor: Java repos scored fail(0%) despite having no supported boundary tools to check. - Added language gating: repos whose detected languages have no overlap with supported languages (Python, JS, TS, Go) now return not_applicable instead of fail - Repos with empty language detection remain applicable (graceful fallback) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/agentready/assessors/structure.py (1)
1481-1497:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBoundary “enforcement” can be falsely detected from plain string presence.
Current checks treat any occurrence of rule/tool names as configured enforcement. This will pass on disabled rules (
"no-restricted-imports": "off"), comments, or unrelated text, producing incorrect 100% scores.This needs structured parsing/validation per config type (at least JSON/TOML where parseable) and an enabled-state check for ESLint rules.
Also applies to: 1508-1552
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/agentready/assessors/structure.py` around lines 1481 - 1497, The current implementation checks for boundary rule string presence in configuration files without validating whether those rules are actually enabled. You need to parse configuration files structurally (JSON for package.json and eslintConfig sections) and verify each matched rule has an enabled state (not "off" or 0). In the loop checking eslint_configs, after finding matched rules from boundary_rules, parse the config content to check the actual configuration value for each matched rule rather than just confirming string presence. Similarly, in the package.json eslintConfig section where you build the matched list, parse the eslintConfig JSON object and validate that each matched rule is not disabled before appending to tools_found and evidence. This ensures only actively enforced boundary rules are counted toward the assessment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/agentready/assessors/security.py`:
- Around line 566-575: The _match_canonical_section method checks canonical
sections before alias matches, causing headings like "Non-threats" to be
incorrectly classified as "threats" (substring match) instead of reaching the
intended "deprioritized" alias. Reverse the matching order by moving the loop
that iterates through self.SECTION_ALIASES and checks aliases to execute before
the loop that checks self.CANONICAL_SECTIONS with word matching. This ensures
more specific alias matches are found before generic substring matches in
canonical sections.
In `@src/agentready/assessors/structure.py`:
- Around line 1418-1423: The supported language check using
`_has_supported_language(repository)` at the beginning of the method returns
`not_applicable` before the dependency-cruiser detection logic can run, which
incorrectly blocks valid cross-language boundary tool detection. Reorder the
logic so that language-agnostic tool checks (like the dependency-cruiser check
that should occur around line 1431) are performed before the supported language
validation. This allows repositories with dependency-cruiser configuration to
pass detection regardless of their primary programming language, while still
applying language checks to language-specific boundary tools.
In `@tests/unit/test_assessors_security.py`:
- Around line 1113-1138: The test_synonym_matching method does not explicitly
verify the Non-threats to deprioritized alias mapping. Replace the current "##
Out of scope" section in the content string with "## Non-threats" to create a
regression test for this specific mapping, then add an assertion to verify that
"non-threats" appears in the matched evidence string to ensure this
collision-prone heading is properly recognized and prevents future scoring
regressions.
In `@tests/unit/test_assessors_structure.py`:
- Around line 1515-1534: Add a new test method after the existing test cases in
the test class that creates a Java-only repository (using _make_repo with
languages set to {"Java": 100} or similar) but also includes a
.dependency-cruiser.cjs configuration file in the repo directory. Instantiate
the ArchitecturalBoundaryAssessor and call assess on the repo, then assert that
the finding status is not "not_applicable" to verify that the presence of
dependency-cruiser configuration makes the repo applicable even when the primary
language is unsupported (Java). This test prevents regressions in the
cross-language boundary-tool applicability logic.
---
Outside diff comments:
In `@src/agentready/assessors/structure.py`:
- Around line 1481-1497: The current implementation checks for boundary rule
string presence in configuration files without validating whether those rules
are actually enabled. You need to parse configuration files structurally (JSON
for package.json and eslintConfig sections) and verify each matched rule has an
enabled state (not "off" or 0). In the loop checking eslint_configs, after
finding matched rules from boundary_rules, parse the config content to check the
actual configuration value for each matched rule rather than just confirming
string presence. Similarly, in the package.json eslintConfig section where you
build the matched list, parse the eslintConfig JSON object and validate that
each matched rule is not disabled before appending to tools_found and evidence.
This ensures only actively enforced boundary rules are counted toward the
assessment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: a5232be4-d4cc-47c5-b50b-0399dddba39b
📒 Files selected for processing (5)
docs/attributes.mdsrc/agentready/assessors/security.pysrc/agentready/assessors/structure.pytests/unit/test_assessors_security.pytests/unit/test_assessors_structure.py
| def _match_canonical_section(self, heading_lower: str) -> str | None: | ||
| for canonical in self.CANONICAL_SECTIONS: | ||
| canonical_words = canonical.split() | ||
| if all(word in heading_lower for word in canonical_words): | ||
| return canonical | ||
| for canonical, aliases in self.SECTION_ALIASES.items(): | ||
| for alias in aliases: | ||
| if alias in heading_lower: | ||
| return canonical | ||
| return None |
There was a problem hiding this comment.
Alias-specific headings can be misclassified due to match order.
On Line 569, canonical substring matching runs before aliases. A heading like Non-threats gets classified as threats (because "threats" is a substring), so the intended deprioritized alias on Line 403 is never reached. This skews section scoring.
Proposed fix
def _match_canonical_section(self, heading_lower: str) -> str | None:
- for canonical in self.CANONICAL_SECTIONS:
- canonical_words = canonical.split()
- if all(word in heading_lower for word in canonical_words):
- return canonical
for canonical, aliases in self.SECTION_ALIASES.items():
for alias in aliases:
if alias in heading_lower:
return canonical
+ for canonical in self.CANONICAL_SECTIONS:
+ canonical_words = canonical.split()
+ if all(word in heading_lower for word in canonical_words):
+ return canonical
return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/agentready/assessors/security.py` around lines 566 - 575, The
_match_canonical_section method checks canonical sections before alias matches,
causing headings like "Non-threats" to be incorrectly classified as "threats"
(substring match) instead of reaching the intended "deprioritized" alias.
Reverse the matching order by moving the loop that iterates through
self.SECTION_ALIASES and checks aliases to execute before the loop that checks
self.CANONICAL_SECTIONS with word matching. This ensures more specific alias
matches are found before generic substring matches in canonical sections.
| if not self._has_supported_language(repository): | ||
| langs = ", ".join(sorted(repository.languages.keys())) | ||
| return Finding.not_applicable( | ||
| self.attribute, | ||
| reason=f"No supported languages detected ({langs}); boundary checks cover Python, JavaScript, TypeScript, Go", | ||
| ) |
There was a problem hiding this comment.
Unsupported-language short-circuit blocks valid dependency-cruiser detection.
On Line 1418, not_applicable is returned before Line 1431 runs. That means repos in unsupported languages can never pass even if .dependency-cruiser.* is present, which conflicts with the assessor’s own “any language / cross-language” boundary-tool path.
Proposed fix
def assess(self, repository: Repository) -> Finding:
if repository.total_files < self.FILE_THRESHOLD:
return Finding.not_applicable(
self.attribute,
reason=f"Repository has {repository.total_files} files (boundary rules relevant for >={self.FILE_THRESHOLD})",
)
+ # Cross-language boundary tool: allow pass even when language set is unsupported
+ tools_found = []
+ evidence = []
+ self._check_dependency_cruiser(repository, tools_found, evidence)
+ if tools_found:
+ return Finding(
+ attribute=self.attribute,
+ status="pass",
+ score=100.0,
+ measured_value=f"boundary tools: {', '.join(tools_found)}",
+ threshold="at least one import boundary tool configured",
+ evidence=evidence,
+ remediation=None,
+ error_message=None,
+ )
+
if not self._has_supported_language(repository):
langs = ", ".join(sorted(repository.languages.keys()))
return Finding.not_applicable(
self.attribute,
reason=f"No supported languages detected ({langs}); boundary checks cover Python, JavaScript, TypeScript, Go",
)
- tools_found = []
- evidence = []
-
self._check_eslint(repository, tools_found, evidence)
self._check_go_boundary_tools(repository, tools_found, evidence)
self._check_python_import_linter(repository, tools_found, evidence)
- self._check_dependency_cruiser(repository, tools_found, evidence)Also applies to: 1431-1431, 1554-1569
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/agentready/assessors/structure.py` around lines 1418 - 1423, The
supported language check using `_has_supported_language(repository)` at the
beginning of the method returns `not_applicable` before the dependency-cruiser
detection logic can run, which incorrectly blocks valid cross-language boundary
tool detection. Reorder the logic so that language-agnostic tool checks (like
the dependency-cruiser check that should occur around line 1431) are performed
before the supported language validation. This allows repositories with
dependency-cruiser configuration to pass detection regardless of their primary
programming language, while still applying language checks to language-specific
boundary tools.
| def test_synonym_matching(self, tmp_path): | ||
| """Alias headings match their canonical equivalents.""" | ||
| content = ( | ||
| "# Threat Model\n\n" | ||
| "## Scope and intended use\n" | ||
| "A REST API.\n\n" | ||
| "## Trust boundaries and data flow\n" | ||
| "External users access via /api.\n\n" | ||
| "## Out of scope\n" | ||
| "Local attacks.\n\n" | ||
| "## Adversary model\n" | ||
| "Remote unauthenticated attackers.\n" | ||
| ) | ||
| (tmp_path / "THREAT_MODEL.md").write_text(content) | ||
| repo = self._make_repo(tmp_path) | ||
| assessor = ThreatModelAssessor() | ||
| finding = assessor.assess(repo) | ||
| sections_evidence = [ | ||
| e for e in finding.evidence if "recognized sections" in e.lower() | ||
| ] | ||
| assert len(sections_evidence) == 1 | ||
| matched = sections_evidence[0] | ||
| assert "system context" in matched | ||
| assert "entry points" in matched | ||
| assert "deprioritized" in matched | ||
| assert "threats" in matched |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add a regression test for Non-threats → deprioritized mapping.
The synonym tests are good, but they don’t cover the non-threats alias explicitly. Adding that case would lock behavior for a known collision-prone heading and prevent scoring regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test_assessors_security.py` around lines 1113 - 1138, The
test_synonym_matching method does not explicitly verify the Non-threats to
deprioritized alias mapping. Replace the current "## Out of scope" section in
the content string with "## Non-threats" to create a regression test for this
specific mapping, then add an assertion to verify that "non-threats" appears in
the matched evidence string to ensure this collision-prone heading is properly
recognized and prevents future scoring regressions.
| def test_java_repo_not_applicable(self, tmp_path): | ||
| """Java-only repo gets not_applicable (unsupported language).""" | ||
| repo = self._make_repo(tmp_path, languages={"Java": 90, "XML": 10}) | ||
| assessor = ArchitecturalBoundaryAssessor() | ||
| finding = assessor.assess(repo) | ||
| assert finding.status == "not_applicable" | ||
|
|
||
| def test_mixed_language_with_supported(self, tmp_path): | ||
| """Repo with Java and Python is still applicable.""" | ||
| repo = self._make_repo(tmp_path, languages={"Java": 60, "Python": 40}) | ||
| assessor = ArchitecturalBoundaryAssessor() | ||
| finding = assessor.assess(repo) | ||
| assert finding.status != "not_applicable" | ||
|
|
||
| def test_no_languages_defaults_applicable(self, tmp_path): | ||
| """Repo with empty languages dict remains applicable.""" | ||
| repo = self._make_repo(tmp_path, languages={}) | ||
| assessor = ArchitecturalBoundaryAssessor() | ||
| finding = assessor.assess(repo) | ||
| assert finding.status != "not_applicable" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Add an edge-case test: Java-only repo with .dependency-cruiser.cjs should not be not_applicable.
The new gating tests cover unsupported languages, but they miss the cross-language boundary-tool path. A targeted case here would prevent regressions in applicability logic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test_assessors_structure.py` around lines 1515 - 1534, Add a new
test method after the existing test cases in the test class that creates a
Java-only repository (using _make_repo with languages set to {"Java": 100} or
similar) but also includes a .dependency-cruiser.cjs configuration file in the
repo directory. Instantiate the ArchitecturalBoundaryAssessor and call assess on
the repo, then assert that the finding status is not "not_applicable" to verify
that the presence of dependency-cruiser configuration makes the repo applicable
even when the primary language is unsupported (Java). This test prevents
regressions in the cross-language boundary-tool applicability logic.
|
Status update from Bill Murdock with assistance from Claude Code All review feedback has been addressed across two fix commits ( Issues resolvedRound 1 (
Round 2 (
Testing performedUnit tests: Full suite passes, 1380 passed, 21 skipped, 0 failures. The three commits added a total of 38 new tests covering section symbol prefixes, synonym matching, Ranger-style threat models, adversary model table detection, Java-only repo language gating, mixed-language applicability, and empty-language fallback. Weight verification: Confirmed programmatically that all weights sum to exactly 1.00 (T1: 58%, T2: 27%, T3: 13%, T4: 2%, 27 attributes total). Real-world testing against apache/ranger:
Real-world testing against yjs/yjs:
Follow-upFiled #505 to audit remediation instructions across all assessors for alignment with scoring logic (not blocking this PR). |
|
🎉 This PR is included in version 2.48.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Summary
ArchitecturalBoundaryAssessor(B.1): binary pass/fail check for linter-enforced import boundaries across JS/TS (ESLint), Go (depguard/gomodguard), Python (import-linter/flake8-tidy-imports), and general (dependency-cruiser). Repos with <20 files return not_applicable.ThreatModelAssessor(B.2): partial-credit scoring for THREAT_MODEL.md based on existence, content substance, 8-section schema coverage, and threat table structure. Falls back to SECURITY.md for partial credit.Closes #463
Test plan
agentready assess .runs end-to-end showing both new attributes🤖 Generated with Claude Code
Summary by CodeRabbit
THREAT_MODEL.md) with section and threats-table checks.