feat(assets): register output files as assets after prompt execution #12812

Open
luke-mino-altherr wants to merge 1 commit into master from luke-mino-altherr/register-output-assets

Conversation

luke-mino-altherr (Contributor) commented Mar 6, 2026

Summary

Register output files as assets immediately after prompt execution, replacing the need for a filesystem sweep to discover new outputs.

Changes

  • app/assets/services/ingest.py — Add ingest_existing_file(), a public wrapper that handles stat, BLAKE3 hash, MIME detection, path-based tag derivation, and _ingest_file_from_path. Add register_output_files() to batch-register a list of absolute paths as output assets with provenance metadata.
  • app/assets/services/__init__.py — Export ingest_existing_file and register_output_files.
  • app/assets/seeder.py — Add start_enrich() to trigger background hash computation. Add _reset_to_idle() helper to deduplicate state reset logic. Merge pending enrich requests (union roots, sticky-true compute_hashes) so no queued work is silently dropped.
  • main.py — Add _collect_output_absolute_paths() to parse history_result['outputs'] and collect file paths. Call register_output_files() after each prompt execution, then trigger start_enrich() for background hashing.
  • tests/test_asset_seeder.py — Test coverage for enqueue/merge/drain behavior and thread safety of the enrich flow.

How it works

After each prompt execution, the executor's history_result['outputs'] contains a dict of node_id → ui_data with every file each output node wrote. We parse that structure and register each file as an asset — no filesystem sweep needed.

  • Only registers files with type == "output" (skips temp/preview files)
  • Stores prompt_id in user_metadata for provenance tracking
  • Runs while the asset seeder is paused, gated behind asset_seeder.is_disabled()
  • Two-phase ingest: inserts a stub record (hash=NULL) for instant visibility, then defers BLAKE3 hashing to the background seeder enrich phase
  • Deduplicates by BLAKE3 hash (same content → same Asset, new AssetReference)
  • Derives tags automatically from file path
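The parsing step above can be sketched as follows. This is an illustrative standalone version of the traversal the PR describes (the real helper is `_collect_output_absolute_paths` in main.py, which reads the base directory from `folder_paths`); the function name and `base_dir` parameter here are for demonstration only.

```python
import os

def collect_output_paths(history_result: dict, base_dir: str) -> list[str]:
    """Collect absolute paths of history items with type == 'output'.

    Skips temp/preview entries and items without a filename, mirroring
    the filtering described in the PR summary.
    """
    paths = []
    # outputs is a dict of node_id -> ui_data; each ui_data value may be a
    # list of file dicts ({"filename", "subfolder", "type"}).
    for node_output in history_result.get("outputs", {}).values():
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue
                filename = item.get("filename")
                if not filename:
                    continue
                paths.append(os.path.join(base_dir, item.get("subfolder", ""), filename))
    return paths

history = {"outputs": {"9": {"images": [
    {"filename": "img_0001.png", "subfolder": "", "type": "output"},
    {"filename": "preview.png", "subfolder": "", "type": "temp"},
]}}}
print(collect_output_paths(history, "/outputs"))  # → ['/outputs/img_0001.png']
```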

luke-mino-altherr force-pushed the luke-mino-altherr/register-output-assets branch 3 times, most recently from b27f861 to b9599b9 on March 10, 2026 at 00:06
Add ingest_existing_file() to services/ingest.py as a public wrapper for
registering on-disk files (stat, BLAKE3 hash, MIME detection, path-based
tag derivation).

After each prompt execution in the main loop, iterate
history_result['outputs'] and register files with type 'output' as
assets. Runs while the asset seeder is paused, gated behind
asset_seeder.is_disabled(). Stores prompt_id in user_metadata for
provenance tracking.

Ingest uses a two-phase approach: insert a stub record (hash=NULL) first
for instant visibility, then defer hashing to the background seeder
enrich phase to avoid blocking the prompt worker thread.

When multiple enrich scans are enqueued while the seeder is busy, roots
are now unioned and compute_hashes uses sticky-true (OR) logic so no
queued work is silently dropped.
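A minimal sketch of that merge policy, under the assumption that the pending request is held as a small dict; the real seeder keeps this state internally and these names are illustrative:

```python
def merge_pending(pending, roots, compute_hashes):
    """Merge an enrich request into the pending one: union the roots,
    and let compute_hashes be sticky-true (OR) so a queued hash request
    is never dropped by a later non-hashing request."""
    if pending is None:
        return {"roots": set(roots), "compute_hashes": compute_hashes}
    pending["roots"] |= set(roots)
    pending["compute_hashes"] = pending["compute_hashes"] or compute_hashes
    return pending

p = merge_pending(None, {"/outputs"}, False)
p = merge_pending(p, {"/inputs"}, True)   # requests hashing
p = merge_pending(p, {"/outputs"}, False) # hashing request must survive
print(p["roots"], p["compute_hashes"])
```

When the seeder finishes its current scan, it drains this single merged request rather than a queue, so bursts of enqueues while busy collapse into one follow-up pass.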

Extract _reset_to_idle helper in the asset seeder to deduplicate the
state reset pattern shared by _run_scan and mark_missing_outside_prefixes.

Separate history parsing from output file registration: move generic
file registration logic into register_output_files() in
app/assets/services/ingest.py, keeping only the ComfyUI history format
parsing (_collect_output_absolute_paths) in main.py.
luke-mino-altherr force-pushed the luke-mino-altherr/register-output-assets branch from b9599b9 to 42eda2b on March 10, 2026 at 00:09
luke-mino-altherr marked this pull request as ready for review on March 10, 2026 at 00:10
coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

This pull request enhances the asset seeding system with a pending enrichment queue mechanism and extends file ingestion capabilities. The seeder now queues enrichment requests when busy and automatically processes them after scan completion. New helper functions enable registration of existing files and bulk output file processing. A helper function collects output file paths from execution history, registers them, and triggers enrichment with hash computation. Comprehensive tests validate state reset, enqueue behavior, pending queue merging, and thread-safety of the new functionality.

🚥 Pre-merge checks — 3 passed

  • Title check ✅ Passed — The title clearly summarizes the main change: registering output files as assets after prompt execution, which aligns with the core objective across multiple files.
  • Description check ✅ Passed — The description is comprehensive and directly related to the changeset, covering the key files modified, implementation approach, and how the new feature works.
  • Docstring coverage ✅ Passed — Docstring coverage is 80.56%, which meets the required threshold of 80.00%.


coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/assets/services/ingest.py`:
- Around line 153-184: ingest_existing_file currently ignores the user_metadata
parameter (it sets spec["metadata"]=None) which drops prompt_id; update
ingest_existing_file so spec["metadata"] preserves the passed user_metadata
(convert it to the expected ExtractedMetadata form if needed) before calling
batch_insert_seed_assets: i.e., set spec["metadata"] = user_metadata if it's
already the right type, otherwise call the conversion method (e.g.,
user_metadata.to_user_metadata() or equivalent) so batch_insert_seed_assets can
extract prompt_id correctly. Ensure you reference the ingest_existing_file
function, the spec["metadata"] field, and
batch_insert_seed_assets/.to_user_metadata() handling when making the change.

In `@main.py`:
- Around line 234-249: The function _collect_output_absolute_paths should guard
against folder_paths.get_directory_by_type("output") returning None: capture
base_dir = folder_paths.get_directory_by_type("output") and if base_dir is None
either raise a clear exception (e.g., ValueError) or return an empty list to
avoid calling os.path.join(None, ...); update the logic in
_collect_output_absolute_paths to check base_dir before the loop and use
base_dir with os.path.join only when non-None so filenames are not turned into
"None/..." paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50c7b5df-d9de-4f07-8ab5-1c92f13bc666

📥 Commits

Reviewing files that changed from the base of the PR and between 06f85e2 and 42eda2b.

📒 Files selected for processing (5)
  • app/assets/seeder.py
  • app/assets/services/__init__.py
  • app/assets/services/ingest.py
  • main.py
  • tests/test_asset_seeder.py

Comment on lines +153 to +184
def ingest_existing_file(
    abs_path: str,
    user_metadata: UserMetadata = None,
    extra_tags: Sequence[str] = (),
    owner_id: str = "",
) -> None:
    """Register an existing on-disk file as an asset stub.

    Inserts a stub record (hash=NULL) for immediate UX visibility.
    The caller is responsible for triggering background enrichment
    (hash computation, metadata extraction) via the asset seeder.
    """
    size_bytes, mtime_ns = get_size_and_mtime_ns(abs_path)
    mime_type = mimetypes.guess_type(abs_path, strict=False)[0]
    name, path_tags = get_name_and_tags_from_asset_path(abs_path)
    tags = list(dict.fromkeys(path_tags + list(extra_tags)))

    spec = {
        "abs_path": abs_path,
        "size_bytes": size_bytes,
        "mtime_ns": mtime_ns,
        "info_name": name,
        "tags": tags,
        "fname": os.path.basename(abs_path),
        "metadata": None,
        "hash": None,
        "mime_type": mime_type,
    }
    with create_session() as session:
        batch_insert_seed_assets(session, [spec], owner_id=owner_id)
        session.commit()
⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

#!/bin/bash
# Check how batch_insert_seed_assets uses the metadata field from specs
rg -n -A 10 'def batch_insert_seed_assets' app/assets/services/bulk_ingest.py

# Check if metadata is extracted from specs
rg -n '"metadata"' app/assets/services/bulk_ingest.py

Repository: Comfy-Org/ComfyUI
Exit code: 0

stdout:

98:def batch_insert_seed_assets(
99-    session: Session,
100-    specs: list[SeedAssetSpec],
101-    owner_id: str = "",
102-) -> BulkInsertResult:
103-    """Seed assets from filesystem specs in batch.
104-
105-    Each spec is a dict with keys:
106-      - abs_path: str
107-      - size_bytes: int
108-      - mtime_ns: int
152:        extracted_metadata = spec.get("metadata")

Script executed:

#!/bin/bash
# Get more context about what happens with extracted_metadata
rg -n -A 20 'extracted_metadata = spec.get' app/assets/services/bulk_ingest.py

# Also check the full function signature and docstring of batch_insert_seed_assets
rg -n -B 5 -A 30 'def batch_insert_seed_assets' app/assets/services/bulk_ingest.py | head -60

# Check if metadata field was part of the original SeedAssetSpec or is new
rg -n 'class SeedAssetSpec' app/assets/services/bulk_ingest.py
rg -n -A 15 'class SeedAssetSpec' app/assets/services/bulk_ingest.py

Repository: Comfy-Org/ComfyUI
Exit code: 0

stdout:

152:        extracted_metadata = spec.get("metadata")
153-        if extracted_metadata:
154-            user_metadata: dict[str, Any] | None = extracted_metadata.to_user_metadata()
155-        elif spec["fname"]:
156-            user_metadata = {"filename": spec["fname"]}
157-        else:
158-            user_metadata = None
159-
160-        reference_rows.append(
161-            {
162-                "id": reference_id,
163-                "asset_id": asset_id,
164-                "file_path": absolute_path,
165-                "mtime_ns": spec["mtime_ns"],
166-                "owner_id": owner_id,
167-                "name": spec["info_name"],
168-                "preview_id": None,
169-                "user_metadata": user_metadata,
170-                "created_at": current_time,
171-                "updated_at": current_time,
172-                "last_access_time": current_time,
93-    inserted_refs: int
94-    won_paths: int
95-    lost_paths: int
96-
97-
98:def batch_insert_seed_assets(
99-    session: Session,
100-    specs: list[SeedAssetSpec],
101-    owner_id: str = "",
102-) -> BulkInsertResult:
103-    """Seed assets from filesystem specs in batch.
104-
105-    Each spec is a dict with keys:
106-      - abs_path: str
107-      - size_bytes: int
108-      - mtime_ns: int
109-      - info_name: str
110-      - tags: list[str]
111-      - fname: Optional[str]
112-
113-    This function orchestrates:
114-    1. Insert seed Assets (hash=NULL)
115-    2. Claim references with ON CONFLICT DO NOTHING on file_path
116-    3. Query to find winners (paths where our asset_id was inserted)
117-    4. Delete Assets for losers (path already claimed by another asset)
118-    5. Insert tags and metadata for successfully inserted references
119-
120-    Returns:
121-        BulkInsertResult with inserted_refs, won_paths, lost_paths
122-    """
123-    if not specs:
124-        return BulkInsertResult(inserted_refs=0, won_paths=0, lost_paths=0)
125-
126-    current_time = get_utc_now()
127-    asset_rows: list[AssetRow] = []
128-    reference_rows: list[ReferenceRow] = []
28:class SeedAssetSpec(TypedDict):
28:class SeedAssetSpec(TypedDict):
29-    """Spec for seeding an asset from filesystem."""
30-
31-    abs_path: str
32-    size_bytes: int
33-    mtime_ns: int
34-    info_name: str
35-    tags: list[str]
36-    fname: str
37-    metadata: ExtractedMetadata | None
38-    hash: str | None
39-    mime_type: str | None
40-
41-
42-class AssetRow(TypedDict):
43-    """Row data for inserting an Asset."""

user_metadata parameter is accepted but never used, causing prompt_id data to be discarded.

The ingest_existing_file function accepts user_metadata as a parameter (line 155) which is passed from register_output_files with prompt_id, but the spec dict unconditionally sets "metadata": None (line 177). This causes the user_metadata to be silently discarded instead of being preserved through to batch_insert_seed_assets, which extracts it at line 152 of bulk_ingest.py.

The batch_insert_seed_assets function handles the metadata field by calling .to_user_metadata() if present, or falling back to the filename. The current implementation bypasses this and loses the prompt_id information that should be stored.

Note: The spec's metadata field expects ExtractedMetadata | None type, so ensure the user_metadata integration properly handles any type conversion needed.
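The fallback behavior quoted from bulk_ingest.py above can be reproduced in isolation to show why `"metadata": None` loses the prompt_id. The `FakeExtracted` class below is a stand-in for the real `ExtractedMetadata` type (assumed here to expose only `.to_user_metadata()`); only the branching logic mirrors the snippet shown in the rg output.

```python
def derive_user_metadata(extracted_metadata, fname):
    """Mirror of the fallback in batch_insert_seed_assets: prefer the
    spec's metadata, else fall back to the filename, else None."""
    if extracted_metadata:
        return extracted_metadata.to_user_metadata()
    elif fname:
        return {"filename": fname}
    return None

class FakeExtracted:
    """Hypothetical stand-in for ExtractedMetadata."""
    def __init__(self, data):
        self.data = data
    def to_user_metadata(self):
        return dict(self.data)

# With spec["metadata"] = None (current code), prompt_id never reaches
# the reference row -- only the filename fallback does:
print(derive_user_metadata(None, "img.png"))  # → {'filename': 'img.png'}
# Preserving the passed metadata keeps provenance:
print(derive_user_metadata(FakeExtracted({"prompt_id": "abc"}), "img.png"))  # → {'prompt_id': 'abc'}
```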

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/assets/services/ingest.py` around lines 153 - 184, ingest_existing_file
currently ignores the user_metadata parameter (it sets spec["metadata"]=None)
which drops prompt_id; update ingest_existing_file so spec["metadata"] preserves
the passed user_metadata (convert it to the expected ExtractedMetadata form if
needed) before calling batch_insert_seed_assets: i.e., set spec["metadata"] =
user_metadata if it's already the right type, otherwise call the conversion
method (e.g., user_metadata.to_user_metadata() or equivalent) so
batch_insert_seed_assets can extract prompt_id correctly. Ensure you reference
the ingest_existing_file function, the spec["metadata"] field, and
batch_insert_seed_assets/.to_user_metadata() handling when making the change.

Comment on lines +234 to +249
def _collect_output_absolute_paths(history_result: dict) -> list[str]:
    """Extract absolute file paths for output items from a history result."""
    paths = []
    base_dir = folder_paths.get_directory_by_type("output")
    for node_output in history_result.get("outputs", {}).values():
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue
                filename = item.get("filename")
                if not filename:
                    continue
                paths.append(os.path.join(base_dir, item.get("subfolder", ""), filename))
    return paths
⚠️ Potential issue | 🟡 Minor

Consider handling potential None from get_directory_by_type.

folder_paths.get_directory_by_type("output") returns str | None. While "output" should always resolve to a valid directory, if it ever returns None, os.path.join(None, ...) produces a path like "None/subfolder/filename", which would silently fail later during file registration.

🛡️ Defensive fix suggestion
 def _collect_output_absolute_paths(history_result: dict) -> list[str]:
     """Extract absolute file paths for output items from a history result."""
     paths = []
     base_dir = folder_paths.get_directory_by_type("output")
+    if base_dir is None:
+        return paths
     for node_output in history_result.get("outputs", {}).values():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 234 - 249, The function _collect_output_absolute_paths
should guard against folder_paths.get_directory_by_type("output") returning
None: capture base_dir = folder_paths.get_directory_by_type("output") and if
base_dir is None either raise a clear exception (e.g., ValueError) or return an
empty list to avoid calling os.path.join(None, ...); update the logic in
_collect_output_absolute_paths to check base_dir before the loop and use
base_dir with os.path.join only when non-None so filenames are not turned into
"None/..." paths.
