feat(assets): register output files as assets after prompt execution #12812
luke-mino-altherr wants to merge 1 commit into master
Conversation
Force-pushed from b27f861 to b9599b9
Add `ingest_existing_file()` to `services/ingest.py` as a public wrapper for registering on-disk files (stat, BLAKE3 hash, MIME detection, path-based tag derivation). After each prompt execution in the main loop, iterate `history_result['outputs']` and register files with type `'output'` as assets. This runs while the asset seeder is paused, gated behind `asset_seeder.is_disabled()`, and stores `prompt_id` in `user_metadata` for provenance tracking.

Ingest uses a two-phase approach: insert a stub record (hash=NULL) first for instant visibility, then defer hashing to the background seeder's enrich phase to avoid blocking the prompt worker thread. When multiple enrich scans are enqueued while the seeder is busy, roots are now unioned and `compute_hashes` uses sticky-true (OR) logic so no queued work is silently dropped.

Extract a `_reset_to_idle` helper in the asset seeder to deduplicate the state-reset pattern shared by `_run_scan` and `mark_missing_outside_prefixes`.

Separate history parsing from output file registration: move the generic file registration logic into `register_output_files()` in `app/assets/services/ingest.py`, keeping only the ComfyUI history-format parsing (`_collect_output_absolute_paths`) in `main.py`.
Force-pushed from b9599b9 to 42eda2b
📝 Walkthrough

This pull request enhances the asset seeding system with a pending enrichment queue mechanism and extends file ingestion capabilities. The seeder now queues enrichment requests when busy and automatically processes them after scan completion. New helper functions enable registration of existing files and bulk output file processing. A helper function collects output file paths from execution history, registers them, and triggers enrichment with hash computation. Comprehensive tests validate state reset, enqueue behavior, pending-queue merging, and thread safety of the new functionality.

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
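The queue-and-merge behavior described above can be sketched as a small self-contained class. The name `EnrichQueue` and its attributes are illustrative only; the PR implements this logic inside the asset seeder itself, with different internals:

```python
import threading


class EnrichQueue:
    """Illustrative sketch: merge enrich requests that arrive while a scan runs.

    Roots are unioned and compute_hashes uses sticky-true (OR) logic,
    so no queued request is silently dropped.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy = False
        self._pending_roots: set[str] = set()
        self._pending_compute_hashes = False

    def request_enrich(self, roots, compute_hashes=False):
        """Return a job to run now, or None if the request was queued."""
        with self._lock:
            if self._busy:
                # Seeder is mid-scan: merge into the pending request
                # instead of dropping it.
                self._pending_roots |= set(roots)
                self._pending_compute_hashes |= compute_hashes  # sticky-true
                return None
            self._busy = True
            return set(roots), compute_hashes

    def finish_scan(self):
        """Drain the merged pending request (if any) after a scan completes."""
        with self._lock:
            if self._pending_roots or self._pending_compute_hashes:
                job = (self._pending_roots, self._pending_compute_hashes)
                self._pending_roots = set()
                self._pending_compute_hashes = False
                return job  # caller immediately starts the next scan
            self._busy = False
            return None
```

Two requests queued during a scan (say `{"/b"}` with hashing and `{"/c"}` without) drain as one merged job `({"/b", "/c"}, True)`, which is the "no queued work is silently dropped" property the tests exercise.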
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/assets/services/ingest.py`:
- Around line 153-184: ingest_existing_file currently ignores the user_metadata
parameter (it sets spec["metadata"]=None) which drops prompt_id; update
ingest_existing_file so spec["metadata"] preserves the passed user_metadata
(convert it to the expected ExtractedMetadata form if needed) before calling
batch_insert_seed_assets: i.e., set spec["metadata"] = user_metadata if it's
already the right type, otherwise call the conversion method (e.g.,
user_metadata.to_user_metadata() or equivalent) so batch_insert_seed_assets can
extract prompt_id correctly. Ensure you reference the ingest_existing_file
function, the spec["metadata"] field, and
batch_insert_seed_assets/.to_user_metadata() handling when making the change.
In `@main.py`:
- Around line 234-249: The function _collect_output_absolute_paths should guard
against folder_paths.get_directory_by_type("output") returning None: capture
base_dir = folder_paths.get_directory_by_type("output") and if base_dir is None
either raise a clear exception (e.g., ValueError) or return an empty list to
avoid calling os.path.join(None, ...); update the logic in
_collect_output_absolute_paths to check base_dir before the loop and use
base_dir with os.path.join only when non-None so filenames are not turned into
"None/..." paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50c7b5df-d9de-4f07-8ab5-1c92f13bc666
📒 Files selected for processing (5)
- app/assets/seeder.py
- app/assets/services/__init__.py
- app/assets/services/ingest.py
- main.py
- tests/test_asset_seeder.py
```python
def ingest_existing_file(
    abs_path: str,
    user_metadata: UserMetadata = None,
    extra_tags: Sequence[str] = (),
    owner_id: str = "",
) -> None:
    """Register an existing on-disk file as an asset stub.

    Inserts a stub record (hash=NULL) for immediate UX visibility.
    The caller is responsible for triggering background enrichment
    (hash computation, metadata extraction) via the asset seeder.
    """
    size_bytes, mtime_ns = get_size_and_mtime_ns(abs_path)
    mime_type = mimetypes.guess_type(abs_path, strict=False)[0]
    name, path_tags = get_name_and_tags_from_asset_path(abs_path)
    tags = list(dict.fromkeys(path_tags + list(extra_tags)))

    spec = {
        "abs_path": abs_path,
        "size_bytes": size_bytes,
        "mtime_ns": mtime_ns,
        "info_name": name,
        "tags": tags,
        "fname": os.path.basename(abs_path),
        "metadata": None,
        "hash": None,
        "mime_type": mime_type,
    }
    with create_session() as session:
        batch_insert_seed_assets(session, [spec], owner_id=owner_id)
        session.commit()
```
❓ Verification inconclusive
Script executed:

```shell
#!/bin/bash
# Check how batch_insert_seed_assets uses the metadata field from specs
rg -n -A 10 'def batch_insert_seed_assets' app/assets/services/bulk_ingest.py
# Check if metadata is extracted from specs
rg -n '"metadata"' app/assets/services/bulk_ingest.py
```

Repository: Comfy-Org/ComfyUI
Exit code: 0
stdout:

```
98:def batch_insert_seed_assets(
99-    session: Session,
100-    specs: list[SeedAssetSpec],
101-    owner_id: str = "",
102-) -> BulkInsertResult:
103-    """Seed assets from filesystem specs in batch.
104-
105-    Each spec is a dict with keys:
106-    - abs_path: str
107-    - size_bytes: int
108-    - mtime_ns: int
152:        extracted_metadata = spec.get("metadata")
```
Script executed:

```shell
#!/bin/bash
# Get more context about what happens with extracted_metadata
rg -n -A 20 'extracted_metadata = spec.get' app/assets/services/bulk_ingest.py
# Also check the full function signature and docstring of batch_insert_seed_assets
rg -n -B 5 -A 30 'def batch_insert_seed_assets' app/assets/services/bulk_ingest.py | head -60
# Check if metadata field was part of the original SeedAssetSpec or is new
rg -n 'class SeedAssetSpec' app/assets/services/bulk_ingest.py
rg -n -A 15 'class SeedAssetSpec' app/assets/services/bulk_ingest.py
```

Repository: Comfy-Org/ComfyUI

Exit code: 0
stdout:
```
152:        extracted_metadata = spec.get("metadata")
153-        if extracted_metadata:
154-            user_metadata: dict[str, Any] | None = extracted_metadata.to_user_metadata()
155-        elif spec["fname"]:
156-            user_metadata = {"filename": spec["fname"]}
157-        else:
158-            user_metadata = None
159-
160-        reference_rows.append(
161-            {
162-                "id": reference_id,
163-                "asset_id": asset_id,
164-                "file_path": absolute_path,
165-                "mtime_ns": spec["mtime_ns"],
166-                "owner_id": owner_id,
167-                "name": spec["info_name"],
168-                "preview_id": None,
169-                "user_metadata": user_metadata,
170-                "created_at": current_time,
171-                "updated_at": current_time,
172-                "last_access_time": current_time,
93-    inserted_refs: int
94-    won_paths: int
95-    lost_paths: int
96-
97-
98:def batch_insert_seed_assets(
99-    session: Session,
100-    specs: list[SeedAssetSpec],
101-    owner_id: str = "",
102-) -> BulkInsertResult:
103-    """Seed assets from filesystem specs in batch.
104-
105-    Each spec is a dict with keys:
106-    - abs_path: str
107-    - size_bytes: int
108-    - mtime_ns: int
109-    - info_name: str
110-    - tags: list[str]
111-    - fname: Optional[str]
112-
113-    This function orchestrates:
114-    1. Insert seed Assets (hash=NULL)
115-    2. Claim references with ON CONFLICT DO NOTHING on file_path
116-    3. Query to find winners (paths where our asset_id was inserted)
117-    4. Delete Assets for losers (path already claimed by another asset)
118-    5. Insert tags and metadata for successfully inserted references
119-
120-    Returns:
121-        BulkInsertResult with inserted_refs, won_paths, lost_paths
122-    """
123-    if not specs:
124-        return BulkInsertResult(inserted_refs=0, won_paths=0, lost_paths=0)
125-
126-    current_time = get_utc_now()
127-    asset_rows: list[AssetRow] = []
128-    reference_rows: list[ReferenceRow] = []
28:class SeedAssetSpec(TypedDict):
28:class SeedAssetSpec(TypedDict):
29-    """Spec for seeding an asset from filesystem."""
30-
31-    abs_path: str
32-    size_bytes: int
33-    mtime_ns: int
34-    info_name: str
35-    tags: list[str]
36-    fname: str
37-    metadata: ExtractedMetadata | None
38-    hash: str | None
39-    mime_type: str | None
40-
41-
42-class AssetRow(TypedDict):
43-    """Row data for inserting an Asset."""
```
**`user_metadata` parameter is accepted but never used, causing `prompt_id` data to be discarded.**

The `ingest_existing_file` function accepts `user_metadata` as a parameter (line 155), which is passed from `register_output_files` with `prompt_id`, but the spec dict unconditionally sets `"metadata": None` (line 177). The `user_metadata` is therefore silently discarded instead of being preserved through to `batch_insert_seed_assets`, which extracts it at line 152 of `bulk_ingest.py`.

`batch_insert_seed_assets` handles the metadata field by calling `.to_user_metadata()` if present, or falling back to the filename. The current implementation bypasses this and loses the `prompt_id` information that should be stored.

Note: the spec's `metadata` field expects `ExtractedMetadata | None`, so ensure the `user_metadata` integration properly handles any type conversion needed.
```python
def _collect_output_absolute_paths(history_result: dict) -> list[str]:
    """Extract absolute file paths for output items from a history result."""
    paths = []
    base_dir = folder_paths.get_directory_by_type("output")
    for node_output in history_result.get("outputs", {}).values():
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue
                filename = item.get("filename")
                if not filename:
                    continue
                paths.append(os.path.join(base_dir, item.get("subfolder", ""), filename))
    return paths
```
**Consider handling potential None from `get_directory_by_type`.**

`folder_paths.get_directory_by_type("output")` returns `str | None`. While `"output"` should always resolve to a valid directory, if it ever returns None, `os.path.join(None, ...)` raises a `TypeError`, so registration would fail for every output file.
🛡️ Defensive fix suggestion

```diff
 def _collect_output_absolute_paths(history_result: dict) -> list[str]:
     """Extract absolute file paths for output items from a history result."""
     paths = []
     base_dir = folder_paths.get_directory_by_type("output")
+    if base_dir is None:
+        return paths
     for node_output in history_result.get("outputs", {}).values():
```
Summary
Register output files as assets immediately after prompt execution, replacing the need for a filesystem sweep to discover new outputs.
Changes

- `app/assets/services/ingest.py` — Add `ingest_existing_file()`, a public wrapper that handles stat, BLAKE3 hash, MIME detection, path-based tag derivation, and `_ingest_file_from_path`. Add `register_output_files()` to batch-register a list of absolute paths as output assets with provenance metadata.
- `app/assets/services/__init__.py` — Export `ingest_existing_file` and `register_output_files`.
- `app/assets/seeder.py` — Add `start_enrich()` to trigger background hash computation. Add `_reset_to_idle()` helper to deduplicate state reset logic. Merge pending enrich requests (union roots, sticky-true `compute_hashes`) so no queued work is silently dropped.
- `main.py` — Add `_collect_output_absolute_paths()` to parse `history_result['outputs']` and collect file paths. Call `register_output_files()` after each prompt execution, then trigger `start_enrich()` for background hashing.
- `tests/test_asset_seeder.py` — Test coverage for enqueue/merge/drain behavior and thread safety of the enrich flow.

How it works
After each prompt execution, the executor's `history_result['outputs']` contains a dict of `node_id → ui_data` with every file each output node wrote. We parse that structure and register each file as an asset, with no filesystem sweep needed.

- Only registers files with `type == "output"` (skips temp/preview files)
- Stores `prompt_id` in `user_metadata` for provenance tracking
- Runs only while the asset seeder is paused, gated behind `asset_seeder.is_disabled()`
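The parse described above can be exercised standalone against a hand-built history dict. The function body mirrors `_collect_output_absolute_paths` from the diff, but takes `base_dir` as a parameter so the example does not need `folder_paths`; the node id `"9"` and filenames are made up for illustration:

```python
import os


def collect_output_absolute_paths(history_result: dict, base_dir: str) -> list[str]:
    """Walk history_result['outputs'] (node_id -> ui_data) and keep only
    items tagged type == 'output', skipping temp/preview files."""
    paths = []
    for node_output in history_result.get("outputs", {}).values():
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue
                filename = item.get("filename")
                if not filename:
                    continue
                paths.append(os.path.join(base_dir, item.get("subfolder", ""), filename))
    return paths


# A hand-built history entry for one SaveImage-style output node:
history = {
    "outputs": {
        "9": {
            "images": [
                {"filename": "img_00001.png", "subfolder": "", "type": "output"},
                {"filename": "preview.png", "subfolder": "", "type": "temp"},
            ]
        }
    }
}

paths = collect_output_absolute_paths(history, "/outputs")
assert paths == [os.path.join("/outputs", "img_00001.png")]  # temp file skipped
```

In the PR the returned paths are then handed to `register_output_files()` and `start_enrich()` for background hashing.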