This document defines how agents should interpret Chisel responses: security posture, status codes, trust ordering, and heuristic limits. It complements AGENT_PLAYBOOK.md (workflows) and ZERO_DEPS.md (dependency policy).
- Default Chisel is stdlib-only for core analysis—no bundled third-party Python dependencies in the core install. That reduces supply-chain risk for the analysis runtime.
- Git is used via
subprocess(thegitbinary), not a vendored library. - Parsing uses
ast(Python) and regex extractors (other languages), not tree-sitter or LSP in the default package. Accuracy trades off against install weight and audit surface. - Optional tree-sitter/LSP-style extractors can be registered in the user's environment (
register_extractor,CHISEL_BOOTSTRAP) without changing the core wheel—seeCUSTOM_EXTRACTORS.md.
Agents should not assume semantic parity with IDE-grade analysis; treat results as strong heuristics plus git facts where applicable.
When a tool returns a dict, check keys in this order before trusting long lists or scores:
status—no_data|no_changes|no_edges|git_error|stale_db(and tool-specific)error— e.g.not_a_git_repoondiff_impact/ git failuresmessage,hint_meta— onrisk_map/ triage (uniform components, thresholds)- List payloads (
files,test_gaps, etc.)
Empty [] without a status usually means “nothing matched,” not “error.” git_error is never an empty list.
| Value | Meaning |
|---|---|
direct |
Test–code edge in DB |
co_change |
Via git co-change partner |
import_graph |
Via static import graph reachability |
static_require |
Test file import/require resolved to this source path (no DB edge) |
hybrid |
Both DB impact and static scan agree on the same test |
fallback |
Stem/name similarity only |
working_tree |
Untracked file stem match |
Trust: direct > hybrid > import_graph / co_change > static_require > fallback / working_tree. Always run tests in the real environment before release.
Chisel's JS/TS extractor detects require() patterns that cannot be statically resolved:
dep_type |
require_type |
Confidence | Example |
|---|---|---|---|
import |
(static) | 1.0 | require('./foo') |
tainted_import |
variable |
1.0 | const M = './foo'; require(M) — path was statically traced |
dynamic_import |
variable |
0.3 | require(variable) — unknown variable |
dynamic_import |
template |
0.4 | require(`./${x}`) |
dynamic_import |
concat |
0.2 | require('./' + name) |
dynamic_import |
conditional |
0.3 | require(cond ? './a' : './b') |
eval_import |
eval |
0.0 | new Function('require', code) |
confidence indicates how reliably the actual module path is known. Edge weights blend proximity * sqrt(confidence) so low-confidence requires contribute less to impact scores. tainted_import is a special case where variable assignment was tracked (const MODULE = './plugins/auth') so the path is known — it gets full confidence.
Each per-file entry in risk_map output includes:
| Field | Type | Description |
|---|---|---|
hidden_risk_factor |
float | Additive risk component (0–0.15) from dynamic_import/eval_import edge density: min(dynamic_edge_count/20, 1.0) * 0.15 |
new_file_boost |
float | Additive 0.5 for files with zero churn and zero test coverage, so new/untracked files surface in rankings |
shadow_edge_count |
int | Edges that are not call-type (import + dynamic types) |
dynamic_edge_count |
int | Edges with edge_type of dynamic_import or eval_import |
breakdown.hidden_risk |
float | Same as hidden_risk_factor (for breakdown consistency) |
breakdown.new_file_boost |
float | Same as new_file_boost (for breakdown consistency) |
hidden_risk_factor and new_file_boost are added to the risk score after the 6-component formula. Files with 20+ dynamic edges reach the maximum 0.15 hidden-risk uplift. Use dynamic_edge_count to assess how many deps are actually unknown. Use working_tree=true to include untracked files in risk_map scoring.
test_instability in risk_map and failure-rate boosting in suggest_tests are only populated when agents call record_result after running tests. If you never record results:
test_instabilitywill be uniformly0.0for all files.suggest_testscannot boost historically flaky tests.
Best practice: After every test run (local or CI), call record_result(test_id, passed, duration_ms?) for each test. This makes Chisel's prioritization more accurate over time.
When CHISEL_SHARDS is configured:
- Query tools (
risk_map,triage,diff_impact,impact,test_gaps,stale_tests,suggest_tests,stats) transparently aggregate results across all shard databases. - Write tools (
analyze,update,record_result) automatically route to the shard that owns the file path. tool_analyzeandtool_updateaccept an optionalshardparameter to target a specific shard.tool_start_jobalso acceptsshardto run background analyze/update on a single shard.
Agents do not need to change call patterns — sharding is transparent to read operations.
Several read-only tools now support auto_update=True:
diff_impactsuggest_testsrisk_maptest_gapstriage
When enabled, Chisel checks if the DB is stale (changed files missing). If ≤50 files changed and no background job is running, it performs a lightweight update() inline before returning results. For risk_map and triage, _meta.auto_update_performed indicates whether the refresh happened. If the cap is exceeded, the tool falls back to its normal stale-DB behavior.
- MCP wraps results with
next_stepssuggestions when using HTTP/stdio servers—use them for follow-up tool calls. - Long
analyzeruns should usestart_job+job_statusor a terminal to avoid client timeouts. - For repos with >300 code files,
analyzewithforce=Truenow auto-queues a background job and returnsstatus: "auto_queued". - CLI
chisel run -- <test-command>runs tests and auto-records results. Currently supports pytest and Jest.
Tighter integration (custom transports, UI, or bundled system tools) is a host concern. This repo stays MCP + CLI-friendly and stdlib-first as above.