feat: add --fresh flag to parse for forced reparse #38

Draft
joshbouncesecurity wants to merge 3 commits into knostic:master from joshbouncesecurity:feat/issue16-19-parse-fresh

@joshbouncesecurity
Contributor

Summary

Adds --fresh to openant parse to force a full reparse from scratch without manually deleting dataset.json. Useful when parser improvements are deployed and the existing dataset needs to be regenerated.

The JS parser also now logs a hint pointing at --fresh when existing units are reused, so users discover the flag when they need it.

Addresses item 19 from #16 (does not close the issue).

Test plan

  • openant parse <repo> reuses existing units when dataset.json already exists (default).
  • openant parse <repo> --fresh deletes the cached dataset and reparses from scratch.
  • After running with --fresh, the JS parser hint about reused units no longer fires for that run.

The parse step's unit generator merges new units into an existing
dataset.json, preserving old units as-is. This means changes to the
parser (e.g., improved call graph resolution) don't take effect for
previously-parsed units unless the dataset is deleted manually.
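
The merge behavior described above can be illustrated with a minimal sketch. The dataset.json schema is not shown in this PR, so the unit IDs, the `calls` field, and the `merge_units` helper below are hypothetical and exist only to show why parser improvements do not reach units that are already in the dataset.

```python
from typing import Dict

Unit = Dict[str, object]


def merge_units(existing: Dict[str, Unit], parsed: Dict[str, Unit]) -> Dict[str, Unit]:
    """Hypothetical merge: units already in the dataset are preserved as-is."""
    merged = dict(parsed)
    # Existing entries overwrite freshly parsed ones that share the same ID,
    # so only units that were not in the dataset before are actually added.
    merged.update(existing)
    return merged


# Old dataset produced by an earlier parser version (illustrative data).
old = {"app.py:main": {"calls": []}}
# An improved parser would now resolve the call to helper().
new = {"app.py:main": {"calls": ["helper"]}, "app.py:helper": {"calls": []}}

merged = merge_units(old, new)
print(merged["app.py:main"])  # still the stale unit from the old dataset
```

Under this assumption, the stale `app.py:main` entry survives the merge, which is the situation --fresh is meant to resolve by starting from an empty dataset.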

Add --fresh flag to parse (and ensure scan --fresh also clears the
dataset) so users can force a full reparse when needed.

- Go CLI: add --fresh flag to parse command, pass through to Python
- Python CLI: add --fresh arg to parse subparser
- parser_adapter: delete existing dataset.json when fresh=True
- scanner: include dataset.json in fresh cleanup alongside checkpoints
- unit_generator: add stderr note when existing units are reused

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
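
A minimal sketch of what the Python CLI change listed above might look like, assuming the CLI is built on `argparse`. The `build_parser` function and the `repo` positional name are assumptions, not taken from the diff; the help string matches the text reported later in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser layout; only the --fresh flag mirrors this PR."""
    parser = argparse.ArgumentParser(prog="openant")
    subparsers = parser.add_subparsers(dest="command", required=True)

    parse_cmd = subparsers.add_parser("parse", help="Parse a repository")
    parse_cmd.add_argument("repo")
    parse_cmd.add_argument("--output")
    # store_true keeps the default behavior (reuse existing units) unchanged.
    parse_cmd.add_argument(
        "--fresh",
        action="store_true",
        help="Delete existing dataset.json and reparse from scratch "
        "(other artifacts preserved)",
    )
    return parser


args = build_parser().parse_args(["parse", "my-repo", "--fresh"])
print(args.fresh)
```

Because the flag uses `store_true`, omitting it leaves `args.fresh` as `False`, so existing invocations keep reusing the cached dataset.
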
- Extract buildParsePyArgs from runParse so the helper is the source of
  truth (tests no longer keep a parallel copy with 'keep in sync')
- Replace exists()+remove() with try/except FileNotFoundError to avoid
  TOCTOU race when two --fresh parses run concurrently
- Clarify --fresh help text and docstring: only dataset.json is deleted;
  other artifacts in the output dir are preserved
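
The exists()+remove() replacement described above can be sketched as follows. The `delete_dataset_if_fresh` name and its return value are assumptions made for illustration; the actual code in parser_adapter may be structured differently.

```python
import os
import tempfile


def delete_dataset_if_fresh(output_dir: str, fresh: bool) -> bool:
    """Hypothetical helper: remove only dataset.json, return True if deleted."""
    if not fresh:
        return False
    dataset_path = os.path.join(output_dir, "dataset.json")
    try:
        # No exists() check first: another process could delete the file
        # between the check and the remove (a TOCTOU race).
        os.remove(dataset_path)
    except FileNotFoundError:
        # Already gone (never created, or removed by a concurrent run).
        return False
    return True


with tempfile.TemporaryDirectory() as out_dir:
    open(os.path.join(out_dir, "dataset.json"), "w").close()
    open(os.path.join(out_dir, "other.txt"), "w").close()  # stand-in for another artifact

    first = delete_dataset_if_fresh(out_dir, fresh=True)   # file existed
    second = delete_dataset_if_fresh(out_dir, fresh=True)  # file already gone
    other_kept = os.path.exists(os.path.join(out_dir, "other.txt"))
```

The second call stands in for a concurrent --fresh run that loses the race: it finds nothing to delete and returns without raising, and the other file in the output directory is left untouched.
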
@joshbouncesecurity
Contributor Author

Manual verification

  • openant parse <repo> twice in a row: second run is fast (cached dataset.json reused).
  • openant parse <repo> --fresh: deletes existing dataset.json, runs full reparse (longer).
  • openant parse <missing-repo> --fresh: doesn't crash when dataset.json doesn't pre-exist (no-op).
  • openant parse --help: --fresh listed; help text mentions "only deletes dataset.json; other artifacts in the output dir are preserved".
  • Race: two openant parse --fresh simultaneously on the same repo: no FileNotFoundError from racing os.remove (catch added).

@joshbouncesecurity
Contributor Author

Local test results

Built the Go CLI from this branch and exercised --fresh end-to-end on Windows using libs/openant-core/tests/fixtures/sample_python_repo.

Commands run:

go build -o openant.exe ./
./openant.exe parse <fixture> --output _out          # run 1: fresh output dir
./openant.exe parse <fixture> --output _out --fresh  # run 2: --fresh on existing dataset
./openant.exe parse <fixture> --output _empty --fresh  # run 3: --fresh with no pre-existing dataset

Outcome:

  • --fresh listed in parse --help with description "Delete existing dataset.json and reparse from scratch (other artifacts preserved)" ✅
  • Run with --fresh against existing dataset prints [Parser] --fresh: deleted existing dataset.json and rebuilds (mtime advanced) ✅
  • Run with --fresh and no pre-existing dataset.json: no crash, no "deleted" message, parse runs cleanly ✅
  • Race-condition catch and JS-parser hint: not exercised in this manual pass (covered by automated tests in the diff).
