
feat(storage): move target_data source of truth from SQLite to filesystem #677

Merged
Muizzkolapo merged 5 commits into main from feat/540-target-data-filesystem-source-of-truth on Jun 9, 2026


@Muizzkolapo (Owner)

Summary

  • Move target_data source of truth from SQLite data column to filesystem (agent_io/target/)
  • DB now stores only metadata (action_name, relative_path, record_count, timestamps); the data column is written as "[]" to satisfy its NOT NULL constraint without a schema migration
  • All readers (_read_target_raw, preview_target, data scanners, smoke tests) switched to filesystem reads
  • --fresh now deletes target JSON files alongside DB rows
  • Write order swapped: filesystem first, DB metadata second, so the DB row acts as the commit signal
  • VersionOutputCorrelator bypass path now routes through write_target for filesystem writes
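The new write order can be sketched as follows. This is a minimal illustration, not the project's write_target: the simplified table layout (no timestamp columns), the function signature, and the temp-file-then-rename approach are all assumptions. It shows the file landing on disk atomically before the metadata row is committed, the "[]" placeholder in the NOT NULL data column, the ValueError when target_dir is None, and a byte count taken from the serialized payload rather than from stat().

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical, simplified schema: the real table also tracks timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS target_data (
    action_name   TEXT NOT NULL,
    relative_path TEXT NOT NULL,
    record_count  INTEGER NOT NULL,
    data          TEXT NOT NULL
)
"""


def write_target(conn: sqlite3.Connection, target_dir: Optional[Path],
                 action_name: str, relative_path: str, records: list) -> int:
    """Hypothetical sketch: write records to disk, then commit a metadata row.

    Returns the byte count of the serialized payload.
    """
    if target_dir is None:
        # Fail loudly instead of silently skipping the filesystem write.
        raise ValueError("target_dir is required")
    path = Path(target_dir) / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(records)

    # Step 1: atomic filesystem write (temp file in the same dir, then rename).
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(payload)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

    # Step 2: the DB row is the commit signal; "[]" only satisfies NOT NULL.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO target_data (action_name, relative_path, record_count, data)"
            " VALUES (?, ?, ?, ?)",
            (action_name, relative_path, len(records), "[]"),
        )
    # Byte count comes from the serialized data, not from stat().
    return len(payload.encode("utf-8"))
```

With this ordering, a crash between the two steps leaves at worst an orphaned file with no row, so anything that discovers targets through the DB never sees a half-written one.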

Verification

  • 7455 tests pass, 2 skipped
  • ruff check and ruff format --check clean
  • 12 files changed across production code, tests, and smoke tests

…stem

The target_data table stored complete output JSON arrays as TEXT blobs,
causing 1GB+ databases for content-heavy workflows. The same data was
already written to agent_io/target/ on every call, so the DB copy was
redundant.

This change makes the filesystem the source of truth:
- write_target writes data to filesystem, stores only metadata in DB
- _read_target_raw reads from filesystem instead of DB
- preview_target reads from filesystem
- --fresh now cleans up target JSON files alongside DB rows
- data_scanners and smoke tests read from filesystem

The DB data column now contains "[]" (satisfies NOT NULL, eliminates
blob storage). No schema migration needed.
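On the read side, the DB now only tells a reader where to look. The sketch below is a hypothetical stand-in for _read_target_raw: the function name read_target_records, the query, and the choice to concatenate every file recorded for an action are assumptions made for illustration. The data column is never selected; records come only from the JSON files.

```python
import json
import sqlite3
from pathlib import Path


def read_target_records(conn: sqlite3.Connection, target_dir: Path,
                        action_name: str) -> list:
    """Hypothetical reader: the DB supplies relative paths, data comes from disk."""
    rows = conn.execute(
        "SELECT relative_path FROM target_data WHERE action_name = ? ORDER BY rowid",
        (action_name,),
    ).fetchall()
    records: list = []
    for (relative_path,) in rows:
        # The "[]" placeholder in the data column is ignored entirely.
        text = (Path(target_dir) / relative_path).read_text(encoding="utf-8")
        records.extend(json.loads(text))
    return records
```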

- Remove is_file() pre-checks before read_text() in _read_target_raw,
  preview_target, and data_scanners; handle read errors directly instead
- Remove unused ensure_directory_exists import from writer.py
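Dropping the is_file() pre-check in favor of handling errors at read time avoids a check-then-read race (the file can vanish between the two calls) and saves an extra stat. The helper below is hypothetical: the name preview_file, returning None for a missing file, and raising ValueError on malformed JSON are assumptions about what "handle errors directly" could look like, not the behavior of preview_target itself.

```python
import json
from pathlib import Path
from typing import Optional


def preview_file(path: Path, limit: int = 5) -> Optional[list]:
    """Hypothetical preview: no is_file() pre-check; errors are handled on read."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        # Assumed policy: a missing target file yields no preview.
        return None
    try:
        records = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"corrupt target file {path}: {exc}") from exc
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON array in {path}")
    return records[:limit]
```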
…ir, dead code

- Add assert_path_contained in write_target and _read_target_raw to
  prevent path traversal via absolute relative_path
- Fail loudly (ValueError) in write_target when target_dir is None
  instead of silently skipping the filesystem write
- Change preview_target target_dir=None from continue to break
- Remove 18 dead Path(tmpdir) expressions in integration tests
- Add comment explaining intentional double-write in FileWriter
- Add target_dir to test_delete_target.py fixtures
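The traversal risk comes from how pathlib joins paths: if relative_path is absolute, Path(target_dir) / relative_path discards target_dir entirely, so "/etc/passwd" would escape the target directory. The check below is a hypothetical stand-in for assert_path_contained (its real signature is not shown in this PR); it resolves both paths and rejects anything outside the base. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def assert_contained(base: Path, candidate: Path) -> Path:
    """Hypothetical containment check: resolve both paths, reject escapes."""
    base_resolved = Path(base).resolve()
    resolved = Path(candidate).resolve()
    # Catches absolute paths (base / "/x" == "/x") and ".." segments alike.
    if not resolved.is_relative_to(base_resolved):
        raise ValueError(f"path {candidate} escapes {base}")
    return resolved
```

Resolving before comparing matters because a purely string-based prefix check would accept "target/../../etc/passwd".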

Replace the silent break in preview_target with an early ValueError when
target_dir is None, matching write_target and _read_target_raw. Remove the
now-unreachable None check in the loop.

- Remove redundant atomic_json_write from FileWriter; the backend now owns
  the filesystem write exclusively. The byte count is computed from the
  serialized data instead of via stat.
- Change --fresh cleanup from glob("*.json") to rglob("*.json") with
  batch/ exclusion, so nested target files are properly cleaned.
- Align test output_directory with backend's target_dir/action so
  file_path and backend write path are always consistent.
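The glob-to-rglob change is the difference between matching only files directly under target_dir and matching them at any depth. The sketch below is hypothetical: the function name clean_target_files is invented, and it assumes the excluded batch/ directory sits directly under target_dir, which the PR does not spell out. DB row deletion is left out.

```python
from pathlib import Path


def clean_target_files(target_dir: Path) -> list:
    """Hypothetical --fresh cleanup: delete nested *.json files, skipping batch/."""
    target_dir = Path(target_dir)
    removed = []
    # sorted() materializes the matches before any files are deleted.
    for path in sorted(target_dir.rglob("*.json")):
        rel = path.relative_to(target_dir)
        # Assumption: only a top-level batch/ directory is excluded.
        if rel.parts[0] == "batch":
            continue
        if path.is_file():  # skip directories that happen to end in .json
            path.unlink()
            removed.append(rel)
    return removed
```

A plain glob("*.json") would have deleted only top.json in the layout used by the test, leaving act/a.json and act/sub/b.json behind.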
Muizzkolapo merged commit 52ed783 into main on Jun 9, 2026
5 checks passed
github-actions (bot) locked and limited conversation to collaborators on Jun 9, 2026