Releases: IronAdamant/Chisel

v0.14.0

11 Jun 06:17

Added

  • Claude Code skill (skills/SKILL.md): an installable agent skill distilling the day-to-day Chisel protocol (analyze → diff_impact → chisel run → triage), v0.13 behavior notes, source trust ordering, and empty-result remedies. Copy to ~/.claude/skills/chisel/SKILL.md; mirrors the Wikifier skill pattern.

Changed

  • Agent-facing docs refreshed for v0.13: docs/AGENT_PLAYBOOK.md gains v0.13 behavior notes (gitignore-aware scanning + CHISEL_INCLUDE_IGNORED, real CLI exit codes, edge_rebuild_skipped) and links the skill; README features list now documents gitignore-aware scanning.

v0.13.0

11 Jun 04:33

Fixed

  • gitignore-aware scanning (resolves Logged_issues/2026-06-11_directory-scoped-analyze-walks-all-tests.md): the engine code scan and test discovery walked everything under the project root, including gitignored trees (vendored deps, build output, bulk fixture dirs) — on a repo with a 10k-file ignored fixture tree, test discovery found 3,895 test files and updates ran 18+ minutes. Both walks now filter through git ls-files --cached --others --exclude-standard (tracked + untracked-but-not-ignored, so working-tree analysis still sees new files) and never traverse ignored trees. Set CHISEL_INCLUDE_IGNORED=1 to disable; non-git projects are unfiltered as before.
  • CLI exit codes: the console script ran sys.exit(main()), and main() returns the result dict — so every successful chisel command exited 1 and dumped the raw dict to stderr, breaking chisel analyze && ... scripting. The new cli_entry entry point converts results to real exit codes (status: error|git_error → 1, everything else → 0; chisel run passes through the test runner's code).
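
The exit-code mapping described above can be sketched roughly as follows. This is an illustration only, not Chisel's actual cli_entry: the helper name exit_code_for and the exit_code key used for the chisel run pass-through are assumptions.

```python
def exit_code_for(result: dict) -> int:
    """Map a result dict to a process exit code (hypothetical helper)."""
    # Assumed key: a `chisel run` result carrying the test runner's own code.
    if "exit_code" in result:
        return int(result["exit_code"])
    # Per the release note: status error|git_error -> 1, everything else -> 0.
    return 1 if result.get("status") in {"error", "git_error"} else 0
```

A console script would then call sys.exit(exit_code_for(main())) instead of sys.exit(main()), so a successful result dict no longer turns into exit status 1.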

Changed

  • build_test_edges is ~76× faster with byte-identical output (A/B-verified against the previous algorithm on the same input: 14,667 edges, 17.6s → 0.23s). Match results are memoized per module path and dependency extraction per test file; previously every dep of every test unit scanned every code unit (33M _matches_import_path calls on this repo alone — the dominant cost of analyze/update, and the real culprit behind multi-minute updates on monorepos).
  • update() skips edge rebuilding when nothing changed: with no changed files and no new commits, the discover/rebuild phases are skipped and the result includes edge_rebuild_skipped: true. Per-function git log -L churn is now restricted to changed files (plus files touched by new commits) instead of every function in the project. Net on this repo: no-op update ~18s → 0.14s; single-file update ~19s → 1.4s; full analyze 19.4s → 2.3s.
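
The memoization idea behind the build_test_edges speedup can be sketched as below. This is a simplified model, not the real algorithm: the function name build_edges, the input shapes, and the use of plain path equality in place of _matches_import_path are all assumptions.

```python
from functools import lru_cache


def build_edges(test_deps: dict, code_units: dict) -> set:
    """test_deps: test id -> imported module paths; code_units: unit id -> module path."""

    @lru_cache(maxsize=None)
    def units_for(module_path: str) -> tuple:
        # Scans the code units once per distinct module path; every later
        # test importing the same path reuses the cached result.
        return tuple(sorted(u for u, p in code_units.items() if p == module_path))

    edges = set()
    for test_id, deps in test_deps.items():
        for dep in deps:
            for unit in units_for(dep):
                edges.add((test_id, unit))
    return edges
```

With the cache in place the number of full scans depends on the number of distinct imported paths rather than on tests × deps × code units, while the resulting edge set is unchanged.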

v0.12.0

11 Jun 04:01

Added

  • cancel_job MCP schema: the tool existed in the dispatch table but had no _TOOL_SCHEMAS entry, so it was invisible in GET /tools and stdio tool listings (callable only if an agent already knew it existed). All 26 tools are now advertised; a regression test enforces schema/dispatch parity.
  • shard parameter exposed end-to-end: analyze, update, and start_job now accept shard through the MCP schemas + dispatch table, and the CLI gains --shard on analyze, update, and start-job. Previously the engine supported it but neither interface forwarded it, despite docs/AGENT_PLAYBOOK.md documenting the usage. start_job also validates unknown shard keys up front and returns a clean error.
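
A schema/dispatch parity check of the kind mentioned above might look like this. The helper name parity_gaps and the dict shapes are assumptions; the actual regression test is not shown in these notes.

```python
def parity_gaps(schemas: dict, dispatch: dict) -> dict:
    """Report tools present in one table but not the other (hypothetical helper)."""
    return {
        "missing_schema": sorted(set(dispatch) - set(schemas)),
        "missing_dispatch": sorted(set(schemas) - set(dispatch)),
    }
```

A test would assert that both lists are empty, which would have caught the unadvertised cancel_job entry.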

Fixed

  • chisel run was completely broken (two independent bugs, found by dogfooding): (1) its positional argument was named command, which collided with the subparser dest — main() then crashed with TypeError: unhashable type: 'list' on every invocation; (2) _PYTEST_RESULT_RE anchored the status at end-of-line, but real pytest -v output appends a progress suffix (PASSED [ 16%]), so even a direct call never parsed real output. Both fixed; chisel run -- pytest tests/ now records results end-to-end, with regression tests through main() and against real-format output.
  • Cross-thread shard leak: _with_shard() mutated shared engine attributes (storage/lock/impact/process lock), so a background job analyzing one shard could silently redirect concurrent tool calls on other threads (HTTP MCP server, parallel agents) to the wrong shard DB. The override is now thread-local via properties; a regression test exercises two threads concurrently.
  • Unknown-shard errors crashed: the error path used sorted(self._shard_engines) over keys containing None, raising TypeError instead of returning the error dict.
  • Swift Testing @Test detection was dead code: a framework-name mismatch (swift_test vs the actual xctest) made the Swift regex unreachable — Swift @Test functions were only matched by accident via the Java regex. Now routed correctly, with support for @Test("description", ...) arguments, stacked attributes (@MainActor, @available(...)), and same-line @Test func f() declarations.
  • Java/Kotlin parameterized annotations hid test methods: @ValueSource(ints = {1, 2, 3}), @CsvSource(...), and @Test(expected = ...) broke the annotation regex, so common JUnit 5 parameterized tests were not detected.
  • C# stacked attributes hid test methods: [Fact] followed by [Trait("a", "b")] (standard xUnit categorization) was not matched.
  • test_gaps could erase all gaps in barrel/re-export-heavy projects: when static-import resolution claimed every gap file was covered, the filter emptied the list and reported a false "all code units have test coverage". If filtering would drop everything, the original gaps are now preserved.
  • suggest_tests working-tree source labeling: a constant ternary labeled every stem-matched suggestion working_tree; DB-known tests matched by stem now correctly report source: "fallback".
  • Storage dir validation: passing :memory: (or a file: URI) as CHISEL_STORAGE_DIR / storage_dir silently created a literal :memory: directory on disk. These are now rejected with a clear ValueError (multi-process coordination requires an on-disk lock file).
  • _quantize_gap clamps to 1.0 to guard against float rounding pushing coverage gap above the documented bound.
  • README: replaced 23 typographic smart quotes — the MCP config JSON example was not copy-paste valid.
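
For the chisel run parsing fix above, a pattern that tolerates the pytest -v progress suffix could look like the sketch below. The pattern and the parse_result_line helper are assumptions for illustration, not the actual _PYTEST_RESULT_RE; the status list is deliberately short.

```python
import re

# Status followed by an optional "[ 16%]"-style progress suffix before end-of-line.
RESULT_RE = re.compile(
    r"^(?P<test_id>\S+::\S+)\s+(?P<status>PASSED|FAILED|ERROR|SKIPPED)"
    r"(?:\s+\[\s*\d+%\])?\s*$"
)


def parse_result_line(line: str):
    """Return (test_id, passed) for a pytest -v result line, else None."""
    match = RESULT_RE.match(line)
    if match is None:
        return None
    return match.group("test_id"), match.group("status") == "PASSED"
```

Anchoring the status directly at end-of-line, as the old pattern did, rejects every line that carries the progress suffix.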

Changed

  • next_steps: the diff_impact hint for record_result now includes concrete test_id/passed arguments instead of an empty args dict (record_result has required fields).

Removed

  • Dead code: Storage._ensure_main_conn(), ImpactAnalyzer._import_graph_undirected_neighbors(), the unreachable swift_test branches, and a redundant duplicated conditional in _suggest_tests_impl.

v0.11.0 — Phase 15/16 MCP gap closures (file locks, record_result visibility, uniform risk warnings, working_tree coupling)

15 May 04:07

v0.11.0 — Phase 15/16 MCP gap closures for LLM agents

Added

  • File lock integration: Advisory locks (acquire_file_lock etc.) are now visible in analyze/update (active_file_locks, locked_files_sample) and risk_map/triage (locked_by on files + active_locks/lock_claims in _meta). Fully wired to the main DB and working-tree flows.
  • Observable record_result: suggest_tests now returns failure_rate and failure_boost on every item so the effect of recording flaky tests is directly visible.
  • Uniform risk detection: _meta in risk_map and triage now includes uniform_risk_groups, max_identical_risk_files, and warnings when multiple files share identical risk scores (the exact gap reported by the Phase 15 CrossMCP probes).
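
Uniform risk detection can be pictured as grouping files by identical score. The sketch below is an assumption about the shape of the idea only: the function name, the rounding precision, and the output keys are hypothetical and do not describe the real _meta payload.

```python
from collections import defaultdict


def uniform_risk_groups(scores: dict, min_size: int = 2) -> list:
    """Group file paths that share the same risk score (hypothetical helper)."""
    groups = defaultdict(list)
    for path, score in scores.items():
        # Round so float noise does not split otherwise identical scores.
        groups[round(score, 4)].append(path)
    return [
        {"risk": risk, "files": sorted(files)}
        for risk, files in sorted(groups.items(), reverse=True)
        if len(files) >= min_size
    ]
```

A large group at a single score is the signal for a warning: the ranking carries no information within that group.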

Changed

  • coupling supports working_tree=true: On-disk static import extraction for untracked files → immediate non-zero import_coupling / import_partners (huge win during active refactoring).
  • stale_tests now accepts working_tree (MCP parity).
  • Job system robustness: Guard against thread launch failures after the in-progress flag is set (reduces "background thread not returning" errors).

Fixed

  • Multiple failure surfaces in parallel-agent and heavy working-tree scenarios, as stress-tested by the full 25-tool Phase 15 validation matrix in RecipeLab_alt (ChiselJobLockFlakinessOrchestrator, 200+ file moves, BaseImporter, RouteLoader split, etc.).

Grok Build edited this iteration while closing the chisel_open.md findings (renamed to chisel_closed.md).

Full details: CHANGELOG.md

v0.10.0 — Dynamic-require edges, symbol-collision fix, cycle SCC

29 Apr 10:30

Fixes 5 issues surfaced by external MCP testing in chisel_open.md, plus a substantial new feature for dynamic plugin loaders.

Bug fixes

  • suggest_tests symbol-collision — the name-only edge fallback (tier 4) leaked across files, so a bare dispatch() call edged to every file defining a dispatch symbol. Now scoped to files the test actually imports. Eliminates loadPlugin / dispatch / listFunctions reproductions.
  • diff_impact stale_db over-trigger — tracked diff files are now filtered by indexed code extensions, so .md / .json / .db edits no longer flag stale_db.
  • Cycle false-positives: _find_circular_dependencies was being fed undirected neighbors, which collapsed bow-tie DAGs into giant fake cycles. The new get_imported_files_batch (directed, hard edges only) is now used for cycle detection.
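
To see why undirected neighbors produce fake cycles, the sketch below runs a textbook Tarjan SCC over a small bow-tie DAG, first with directed edges and then with every edge mirrored. The graph and helper names are made up for illustration; this is not Chisel's _find_circular_dependencies.

```python
def cyclic_components(graph: dict) -> list:
    """Tarjan's SCC; returns only components that actually contain a cycle."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            comp = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                comp.append(member)
                if member == node:
                    break
            if len(comp) > 1 or node in graph.get(node, ()):
                result.append(sorted(comp))

    for node in list(graph):
        if node not in index:
            visit(node)
    return result


def undirected(graph: dict) -> dict:
    """Mirror every edge, which is what feeding undirected neighbors amounts to."""
    out = {}
    for src, dsts in graph.items():
        for dst in dsts:
            out.setdefault(src, set()).add(dst)
            out.setdefault(dst, set()).add(src)
    return out


# Bow-tie DAG: two importers funnel through "c", which fans out again.
bow_tie = {"a": {"c"}, "b": {"c"}, "c": {"d", "e"}}
directed_cycles = cyclic_components(bow_tie)
undirected_cycles = cyclic_components(undirected(bow_tie))
```

With directed edges no cycle is reported; with mirrored edges all five files collapse into one component.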

New feature: dynamic-require edge resolution

require() patterns the static parser can partially see are now first-class import edges, scaled by confidence:

  • Hard imports (import / from … import) → confidence 1.0
  • Tainted imports (const P = './x'; require(P)) → confidence 1.0 (variable taint already resolved)
  • Dynamic imports (template literal, string concat, path.join(__dirname, 'plugins', name)) → confidence 0.2–0.4

Schema migration: import_edges gains a confidence REAL NOT NULL DEFAULT 1.0 column (existing rows default to 1.0, fully backward compatible).
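
A minimal sketch of such a migration with the standard sqlite3 module, assuming a two-column import_edges table (the source and target column names are hypothetical, as is the helper name):

```python
import sqlite3


def add_confidence_column(conn: sqlite3.Connection) -> None:
    """Add import_edges.confidence if it is missing (idempotent)."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(import_edges)")}
    if "confidence" not in columns:
        conn.execute(
            "ALTER TABLE import_edges "
            "ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0"
        )
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE import_edges (source TEXT, target TEXT)")  # assumed columns
conn.execute("INSERT INTO import_edges VALUES ('a.js', 'b.js')")
conn.commit()
add_confidence_column(conn)
add_confidence_column(conn)  # second call is a no-op
rows = conn.execute("SELECT source, target, confidence FROM import_edges").fetchall()
```

SQLite allows adding a NOT NULL column only when a non-NULL default is supplied, which is why pre-existing rows read back as 1.0.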

The closure traversal in get_impacted_tests accumulates _IMPORT_HOP_DECAY × edge_confidence per hop, so tests of dynamically-loaded plugins now surface — at proportionally lower relevance and tagged via dynamic require() in the reason text.
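
One way to read the per-hop accumulation is as a product along the path, sketched below. The decay value of 0.5, the function name, and the multiplicative interpretation are assumptions; the notes do not state the actual value of _IMPORT_HOP_DECAY or the exact combination rule.

```python
from collections import deque

IMPORT_HOP_DECAY = 0.5  # assumed value, not taken from the release notes


def reachable_scores(edges: dict, start: str) -> dict:
    """edges: file -> list of (neighbor, confidence). Returns best score per file."""
    best = {start: 1.0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt, confidence in edges.get(node, []):
            # Each hop multiplies in the decay and the edge's confidence.
            score = best[node] * IMPORT_HOP_DECAY * confidence
            if score > best.get(nxt, 0.0):
                best[nxt] = score
                queue.append(nxt)
    return best
```

Under these assumptions a plugin reached through one hard edge and one dynamic edge of confidence 0.3 scores 0.5 × 0.5 × 0.3 = 0.075, so it still surfaces but ranks well below anything reached through hard edges alone.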

Safety guards:

  • Cycle detection uses min_confidence=1.0 — soft edges can't create false cycles.
  • risk_map proximity adjustment uses hard edges only — soft edges don't understate coverage_gap.
  • Fan-out cap of 50 files per dynamic-require dep — pattern still surfaces via unknown_require_count and hidden_risk_factor even when edges aren't emitted.

New regex: _JS_REQUIRE_PATH_JOIN_RE captures require(path.join(__dirname, '<dir>', var)) — previously invisible.
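
A pattern in this spirit, written here with Python's re module, might look like the following. It is a sketch only: the real _JS_REQUIRE_PATH_JOIN_RE is not shown in these notes, and the group names dir and var are assumptions.

```python
import re

# Matches require(path.join(__dirname, '<dir>', var)) with a literal directory
# and a bare identifier as the final segment.
REQUIRE_PATH_JOIN_RE = re.compile(
    r"""require\(\s*path\.join\(\s*__dirname\s*,\s*['"](?P<dir>[^'"]+)['"]"""
    r"""\s*,\s*(?P<var>[A-Za-z_$][\w$]*)\s*\)\s*\)"""
)

source = "const plugin = require(path.join(__dirname, 'plugins', name));"
match = REQUIRE_PATH_JOIN_RE.search(source)
captured = (match.group("dir"), match.group("var")) if match else None
```

The captured directory is what lets a resolver emit soft edges to the files under it, subject to the fan-out cap described above.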

Agent UX

  • suggest_tests schema now documents source confidence ranking: hybrid > direct > import_graph > co_change > static_require > working_tree > fallback.
  • dispatch_tool appends a warn_eval_used next-step when suggest_tests is queried on a JS/TS file containing eval(...) or new Function(...) — agents now see when the import graph is necessarily incomplete for that file.
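
An agent consuming suggest_tests output could apply the documented ranking as a sort key, as in this sketch. The ordering comes from the schema note above; the helper name and the choice to place unknown sources last are assumptions.

```python
SOURCE_RANK = [
    "hybrid", "direct", "import_graph", "co_change",
    "static_require", "working_tree", "fallback",
]
_RANK = {name: position for position, name in enumerate(SOURCE_RANK)}


def sort_by_trust(suggestions: list) -> list:
    """Order suggestions from most to least trusted source (stable sort)."""
    return sorted(
        suggestions,
        key=lambda item: _RANK.get(item.get("source"), len(SOURCE_RANK)),
    )
```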

Tests

795 passed (was 777 before this release). +18 regression tests covering symbol collision, diff_impact extension filter, directed-SCC cycle detection, dynamic-require resolution (5 forms), fan-out cap, proximity-vs-soft-edges, and eval-warning surfacing.

v0.9.2 — CI lint fixes

17 Apr 01:58

Fixed

Resolved 29 ruff errors that were blocking the CI matrix. No behavior changes.

  • chisel/engine.py: Moved JobCancelledError class and logger assignment below the import block (was triggering 12× E402). Removed unused StaticImportIndex import (F401). Removed unused lang = self.mapper.detect_framework(tf) local (F841).
  • chisel/mcp_server.py: Removed unused from socketserver import ThreadingMixIn (F401).
  • examples/extractors/lsp_symbol_extractor.py: Removed unused import uuid (F401).
  • examples/extractors/swift_syntax_extractor.py: Added # noqa: F403 / F405 on the star import and the symbols it provides (intentional — optional third-party dep).
  • tests/test_language_frameworks.py: Removed unused import os (F401).

ruff check . now passes; 777 tests pass locally.

Install

```bash
pip install chisel-test-impact==0.9.2
```

v0.9.1 — silent-swallow logging, auto_update hints, doc parity

17 Apr 01:58

Fixed

  • Silent exception swallowing:
    • engine._load_shard_config now logs a warning when .chisel/shards.toml fails to parse instead of silently returning an empty config.
    • storage._execute / _executemany emit a warning when SQLITE_BUSY persists past the retry cap (in addition to existing debug logs).
    • static_test_imports.StaticImportIndex debug-logs the path and error when an untracked test file can't be read, instead of silently skipping.
  • _try_auto_update race window: _scan_code_files() now runs inside the exclusive lock held by _try_auto_update, so concurrent writers can't add files between the scan and the hash-based change check.
  • auto_update skip reasons surfaced to agents: When auto_update=True is skipped (bg job running or >50 files changed), the response includes an explicit auto_update_skip_reason field and a reason-specific hint. Applied to suggest_tests and diff_impact stale-DB envelopes; test_gaps logs a warning. risk_map and triage already exposed this via _meta.

Changed

  • Documentation parity sweep: Tool count (22/24 → 26 = 20 functional + 6 file-lock), CLI subcommand count (17/18 → 28), and SQLite table count (10/13 → 17) corrected across CLAUDE.md, README.md, CONTRIBUTING.md, ARCHITECTURE.md, COMPLETE_PROJECT_DOCUMENTATION.md, and wiki-local/spec-project.md. ARCHITECTURE.md tool table gained optimize_storage and cancel_job and fixed the file-lock tool names (acquire_lock → acquire_file_lock, etc.) to match schemas.py.

v0.9.0 — Monorepo SQLite sharding, auto-fallback, chisel run

17 Apr 01:58

Highlights

Added

  • Monorepo SQLite sharding: Large repos can shard analysis data across multiple SQLite databases. Set CHISEL_SHARDS=frontend,backend (or create .chisel/shards.toml) to split data by top-level directory. All query tools auto-aggregate; write tools route to the correct shard by file path.
  • analyze auto-fallback to background job: With force=True on repos with >300 code files, analyze automatically queues a background job and returns {status: "auto_queued", job_id: ..., kind: "analyze"} to avoid MCP timeouts.
  • exclude_new_file_boost parameter: risk_map and triage accept exclude_new_file_boost=True to suppress the 0.5 new-file boost for stable long-term audits.
  • auto_update parameter for read-only tools: diff_impact, suggest_tests, risk_map, test_gaps, and triage accept auto_update=True — Chisel does a lightweight inline update() if the DB is stale. Capped at 50 changed files; skipped when a bg job is running.
  • chisel run CLI subcommand: chisel run -- <test-command> runs tests and calls record_result for each detected test. Supports pytest and Jest out of the box; Go and Rust scaffolded.
  • Extractor plugin examples: examples/extractors/tree_sitter_js_extractor.py, swift_syntax_extractor.py, lsp_symbol_extractor.py, plus docs/EXTRACTOR_ECOSYSTEM.md.
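
Routing a write to a shard by top-level directory, as described in the sharding bullet, could be sketched like this. The helper names are hypothetical, and returning None for files outside every shard is an assumption; the notes do not say where such files are stored.

```python
from pathlib import PurePosixPath
from typing import Optional


def parse_shards(env_value: str) -> list:
    """Parse a CHISEL_SHARDS-style value such as 'frontend,backend'."""
    return [name.strip() for name in env_value.split(",") if name.strip()]


def shard_for(file_path: str, shards: list) -> Optional[str]:
    """Pick the shard whose name matches the path's top-level directory."""
    parts = PurePosixPath(file_path).parts
    top_level = parts[0] if parts else None
    return top_level if top_level in shards else None
```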

Changed

  • Documentation: README, docs/LLM_CONTRACT.md, docs/AGENT_PLAYBOOK.md, docs/CUSTOM_EXTRACTORS.md, and ARCHITECTURE updated for sharding, auto-fallback, and the extractor ecosystem.

See CHANGELOG.md for the full list.

v0.8.3 — version sync fix

17 Apr 01:57

Fixed

  • Version sync: chisel.__version__ was out of sync with pyproject.toml in the 0.8.2 release, causing CI failures. Realigned to 0.8.3.

v0.8.2 — optimize_storage, job cancellation, framework fixtures

17 Apr 01:57

Highlights

Added

  • optimize_storage MCP tool: Runs PRAGMA optimize and conditional VACUUM when the WAL grows large.
  • Incremental import graph rebuilds: _rebuild_import_edges() now only rebuilds edges for changed files (O(all_files) → O(changed_files)). Keeps 1k+ file monorepo updates under 3 seconds.
  • Directory-scoped suggest_tests: accepts a directory parameter and aggregates suggestions for all code files under that path.
  • Background job cancellation & events: cancel_job tool, cancel_requested_at flag, JobCancelledError, and a job_events table. analyze()/update() check for cancellation at phase boundaries.
  • Framework fixture test suite (tests/test_language_frameworks.py): C#, Java, Rust, Swift, and Go module-aware import resolution.
  • risk_map working_tree parameter: includes untracked code files in risk scoring with a new_file_boost of 0.5 so new files surface.
  • test_gaps working-tree elevation: gaps from untracked files sort to the top of the list.
  • suggest_tests/diff_impact directory-aware stem matching: same-directory tests strongly preferred over fuzzy substring matches.
  • diff_impact stale-DB detection: returns {status: "stale_db", ...} when changed files aren't in the DB.
  • suggest_tests auto-fallback: self-healing for newly tracked/created files with no edges.
  • start_job/job_status progress tracking: jobs report progress_pct (0–100).
  • Heuristic edge backfill during analyze/update: filename-based edges auto-created for test files with no DB edges.
  • Project fingerprint: stored in meta.project_fingerprint; warns on cross-project DB reuse.
  • MCP timeout hints: tool_analyze/tool_update recommend start_job for large repos.
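
The optimize_storage behavior from the first bullet (PRAGMA optimize plus a conditional VACUUM) can be approximated with the standard sqlite3 module as below. The 64 MiB threshold, the function signature, and the returned keys are assumptions, not the tool's real contract.

```python
import os
import sqlite3


def optimize_storage(db_path: str, wal_threshold_bytes: int = 64 * 1024 * 1024) -> dict:
    """Run PRAGMA optimize, and VACUUM only if the WAL file is large (sketch)."""
    wal_path = db_path + "-wal"
    wal_bytes = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA optimize")
        vacuumed = wal_bytes > wal_threshold_bytes
        if vacuumed:
            # VACUUM must run outside a transaction; none is open here.
            conn.execute("VACUUM")
    finally:
        conn.close()
    return {"wal_bytes": wal_bytes, "vacuumed": vacuumed}
```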

Changed

  • Risk formula: new_file_boost (0.0/0.5) added. Files with no history and no tests score ~0.75 instead of ~0.25.
  • Coupling formula: import-graph coupling is now first-class — max(cochange, import, 0.5*cochange + 0.5*import).
  • risk_map defaults: coverage_mode="line", proximity_adjustment=True.
  • Single-author co-change threshold halving: solo-dev commit patterns now surface coupling signal.
  • Coverage gap granularity: 4 → 20 quantization steps (0.05 increments).
  • Risk reweighting threshold: triggers on 2+ uniform components or any zero-valued uniform component.
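
Two of the formulas above are small enough to write out directly. The coupling expression is taken from the bullet as stated; the quantization helper is a sketch in which round-to-nearest and the clamp to [0, 1] are assumptions about how the 0.05 steps are applied.

```python
def coupling(cochange: float, import_coupling: float) -> float:
    """Coupling formula as stated: max(cochange, import, 0.5*cochange + 0.5*import)."""
    return max(cochange, import_coupling, 0.5 * cochange + 0.5 * import_coupling)


def quantize_gap(gap: float, steps: int = 20) -> float:
    """Snap a coverage gap to 1/steps increments (0.05 for 20 steps)."""
    # Rounding mode is assumed; the clamp keeps float noise inside [0, 1].
    return min(1.0, max(0.0, round(gap * steps) / steps))
```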

Fixed

  • SQLite concurrency stability in storage.py (restored with self._conn as conn: wrappers).
  • risk_map crash with working_tree=true (KeyError: 'heuristic').
  • suggest_tests/diff_impact timeouts under working-tree load (StaticImportIndex caching).
  • storage.py read-only transaction error on SELECT queries.
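
The with self._conn as conn: wrappers mentioned in the first fix rely on sqlite3's connection context manager, which commits when the block succeeds and rolls back when it raises. A minimal demonstration with a made-up results table (table, columns, and helper name are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (test_id TEXT PRIMARY KEY, passed INTEGER)")


def record_many(conn: sqlite3.Connection, rows: list) -> None:
    with conn:  # commit on success, rollback if any statement raises
        for test_id, passed in rows:
            conn.execute("INSERT INTO results VALUES (?, ?)", (test_id, int(passed)))


record_many(conn, [("t1", True)])
try:
    # The duplicate key on "t1" aborts the batch, so "t2" is rolled back too.
    record_many(conn, [("t2", True), ("t1", False)])
except sqlite3.IntegrityError:
    pass
stored = conn.execute("SELECT test_id, passed FROM results ORDER BY test_id").fetchall()
```

Note that the context manager manages the transaction only; it does not close the connection.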

See CHANGELOG.md for the full list.