
feat: clean up batch JSONL files after processing (spec 550) #684

Merged
Muizzkolapo merged 1 commit into main from fix/550-batch-jsonl-cleanup on Jun 14, 2026


@Muizzkolapo

Owner

Summary

  • Delete input JSONL files immediately after a successful provider upload: the provider has the data, so the local file is ephemeral
  • Delete result JSONL files immediately after parsing: results flow into memory and then SQLite, so the file is only a write-through cache
  • Delete remaining .jsonl/.json batch artifacts in target/{action}/batch/ during --fresh cleanup
  • The CLI batch retrieve output file is NOT deleted, since that is the user-requested artifact
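The first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in batch_base.py: the function names `submit_and_cleanup` and `read_and_cleanup` and the `upload` callable are hypothetical stand-ins, and the assumption is that the local file should only be removed once the upload or parse has succeeded.

```python
import json
from pathlib import Path
from typing import Callable


def submit_and_cleanup(input_path: Path, upload: Callable[[Path], str]) -> str:
    """Upload a batch input file, then delete the local copy on success.

    `upload` is a hypothetical stand-in for the provider upload call.
    If it raises, the unlink below is never reached and the file stays on disk.
    """
    batch_id = upload(input_path)
    input_path.unlink(missing_ok=True)  # the provider now holds the data
    return batch_id


def read_and_cleanup(result_path: Path) -> list[dict]:
    """Parse a result JSONL file into memory, then delete the file."""
    with result_path.open(encoding="utf-8") as fh:
        # One JSON object per non-blank line
        records = [json.loads(line) for line in fh if line.strip()]
    result_path.unlink(missing_ok=True)  # results are in memory; file is no longer needed
    return records
```

Deleting only after the call returns means a failed upload or a malformed result file leaves the artifact in place for inspection or retry.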

Why

Batch JSONL files accumulate across runs. A 50-action workflow running daily produces ~100 files/run. After 30 days: ~3000 orphaned files. No code reads these files after the initial upload/retrieval call completes.

Files changed (2 production, 2 test)

| File | Change |
|---|---|
| llm/providers/batch_base.py | Delete input JSONL after _submit_to_provider_api, delete result JSONL after _read_jsonl_file |
| workflow/coordinator.py | _clear_for_fresh_run globs and deletes .jsonl/.json in target/{action}/batch/ |
| tests/unit/llm/test_batch_jsonl_cleanup.py | 4 new tests: input cleanup, result cleanup, no-output-dir path |
| tests/unit/workflow/test_fresh_run_cleanup.py | 5 new tests: JSONL deleted, JSON deleted, non-JSON preserved, no-dir safe, multi-action |
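The --fresh cleanup in coordinator.py could look roughly like the sketch below. It is an illustration under assumptions, not the actual _clear_for_fresh_run: the function name `clear_batch_artifacts`, its signature, and the returned count are hypothetical. Only the target/{action}/batch/ layout and the .jsonl/.json patterns come from this PR.

```python
from pathlib import Path


def clear_batch_artifacts(target_dir: Path, actions: list[str]) -> int:
    """Delete .jsonl/.json files under target/{action}/batch/ for each action.

    Returns the number of files removed. Hypothetical helper for illustration.
    """
    removed = 0
    for action in actions:
        batch_dir = target_dir / action / "batch"
        if not batch_dir.is_dir():
            continue  # an action with no batch directory is skipped safely
        for pattern in ("*.jsonl", "*.json"):
            for path in batch_dir.glob(pattern):
                if path.is_file():
                    path.unlink()
                    removed += 1
    return removed
```

Matching on the two extensions rather than clearing the whole directory leaves any non-JSON files in place, which is the behavior the "non-JSON preserved" test describes.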

Verification

  • pytest → 7488 passed, 2 skipped (+9 new tests)
  • ruff check → all checks passed
  • ruff format --check → 954 files already formatted

Batch JSONL files (input payloads and raw results) were written to disk
during provider upload/retrieval but never cleaned up, accumulating
across runs.

- Delete input JSONL immediately after successful provider upload
  (batch_base.py submit_batch)
- Delete result JSONL immediately after parsing results back into
  memory (batch_base.py retrieve_results)
- Delete remaining .jsonl/.json artifacts in target/{action}/batch/
  during --fresh cleanup (coordinator.py _clear_for_fresh_run)

The CLI `batch retrieve` command's output file is NOT deleted — that's
the user-requested artifact from the command.
@Muizzkolapo Muizzkolapo merged commit b623c6e into main Jun 14, 2026
5 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 14, 2026