feat(storage): move target_data source of truth from SQLite to filesystem #677
Merged
Muizzkolapo merged 5 commits on Jun 9, 2026
…stem

The target_data table stored complete output JSON arrays as TEXT blobs, causing 1GB+ databases for content-heavy workflows. The same data was already written to agent_io/target/ on every call, so the DB copy was redundant.

This change makes the filesystem the source of truth:
- write_target writes data to filesystem, stores only metadata in DB
- _read_target_raw reads from filesystem instead of DB
- preview_target reads from filesystem
- --fresh now cleans up target JSON files alongside DB rows
- data_scanners and smoke tests read from filesystem

The DB data column now contains "[]" (satisfies NOT NULL, eliminates blob storage). No schema migration needed.
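The split described above (payload on disk, metadata plus a `"[]"` placeholder in the DB) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `_sketch` function names, the `make_db` helper, and the `relative_path` / `record_count` columns are hypothetical, and only the `target_data` table name, the `data` column, and the `"[]"` placeholder come from the commit message.

```python
import json
import sqlite3
from pathlib import Path


def make_db():
    """Create an in-memory DB with a hypothetical target_data schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE target_data ("
        "relative_path TEXT PRIMARY KEY, "
        "data TEXT NOT NULL, "
        "record_count INTEGER NOT NULL)"
    )
    return conn


def write_target_sketch(conn, target_dir, relative_path, records):
    """Write the payload to disk and store only metadata in the DB."""
    if target_dir is None:
        # Fail loudly rather than silently skipping the filesystem write.
        raise ValueError("target_dir is required for filesystem writes")
    path = Path(target_dir) / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records), encoding="utf-8")
    # "[]" satisfies NOT NULL on the data column without storing the blob.
    conn.execute(
        "INSERT OR REPLACE INTO target_data "
        "(relative_path, data, record_count) VALUES (?, ?, ?)",
        (relative_path, "[]", len(records)),
    )
    conn.commit()


def read_target_sketch(target_dir, relative_path):
    """Read the payload from disk; the DB copy is only a placeholder."""
    path = Path(target_dir) / relative_path
    return json.loads(path.read_text(encoding="utf-8"))
```

In this sketch a missing file surfaces as `FileNotFoundError` from `read_text()`, which is in the spirit of handling read errors directly rather than pre-checking with `is_file()`.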
- Remove is_file() pre-checks before read_text() in _read_target_raw, preview_target, and data_scanners; handle errors directly
- Remove unused ensure_directory_exists import from writer.py
…ir, dead code
- Add assert_path_contained in write_target and _read_target_raw to prevent path traversal via absolute relative_path
- Fail loudly (ValueError) in write_target when target_dir is None instead of silently skipping the filesystem write
- Change preview_target target_dir=None from continue to break
- Remove 18 dead Path(tmpdir) expressions in integration tests
- Add comment explaining intentional double-write in FileWriter
- Add target_dir to test_delete_target.py fixtures
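For readers unfamiliar with the traversal issue, here is a rough sketch of what a containment check of this kind might look like. The real `assert_path_contained` signature and behavior are not shown in this PR; `assert_contained_sketch` below is a hypothetical stand-in and requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def assert_contained_sketch(base_dir, relative_path):
    """Return the resolved path, or raise if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards the base entirely, which is
    # why an absolute "relative_path" is a traversal vector.
    candidate = (base / relative_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{relative_path!r} escapes {base}")
    return candidate
```

Resolving both sides before comparing means `..` segments and absolute inputs are rejected by the same check.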
Replace silent break with early ValueError, matching write_target and _read_target_raw. Remove now-unreachable None check in loop.
- Remove redundant atomic_json_write from FileWriter — backend now owns
the filesystem write exclusively. Byte count computed from serialized
data instead of stat.
- Change --fresh cleanup from glob("*.json") to rglob("*.json") with
batch/ exclusion, so nested target files are properly cleaned.
- Align test output_directory with backend's target_dir/action so
file_path and backend write path are always consistent.
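The recursive cleanup and the byte-count change can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes "batch/ exclusion" means skipping any file that sits under a directory named `batch`; the actual rule in the codebase may differ.

```python
import json
from pathlib import Path


def clean_target_json_sketch(target_dir):
    """Delete *.json files recursively, skipping anything under batch/."""
    root = Path(target_dir)
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for path in sorted(root.rglob("*.json")):
        parent_parts = path.relative_to(root).parts[:-1]
        if "batch" in parent_parts:
            continue  # assumption: batch/ holds files that must survive
        path.unlink()
        removed.append(path)
    return removed


def serialized_size_sketch(records):
    """Byte count taken from the serialized payload, not from stat()."""
    return len(json.dumps(records).encode("utf-8"))
```

A plain `glob("*.json")` only matches files directly inside `target_dir`, so a file such as `action/out.json` would be left behind; `rglob` walks the subdirectories as well.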
Summary
- Source of truth moved from the `data` column to the filesystem (`agent_io/target/`)
- `data` column written as `"[]"` to satisfy NOT NULL without schema migration
- Read paths (`_read_target_raw`, `preview_target`, data scanners, smoke tests) switched to filesystem reads
- `--fresh` now deletes target JSON files alongside DB rows
- `VersionOutputCorrelator` bypass path now routes through `write_target` for filesystem writes

Verification
- `ruff check` and `ruff format --check` clean