Skip to content

[dagster-dlt] resolve table names for dynamic tables that use functions#33666

Open
cmpadden wants to merge 1 commit intomasterfrom
colton/dlt-resolve-21090
Open

[dagster-dlt] resolve table names for dynamic tables that use functions#33666
cmpadden wants to merge 1 commit intomasterfrom
colton/dlt-resolve-21090

Conversation

@cmpadden
Copy link
Copy Markdown
Contributor

Summary & Motivation

Resolves #21090.

Fixes incorrect runtime metadata in dagster-dlt when a dlt.resource uses a callable table_name and materializes multiple realized tables.

DagsterDltResource.extract_resource_metadata() assumed each dlt resource mapped to a single physical table and derived that table name with str(resource.table_name). That breaks for resources defined like @dlt.resource(table_name=some_fn), because:

  • str(resource.table_name) becomes a function repr, not a real table name.
  • row counts, jobs, and schema lookups fail to match the actual tables dlt created.
  • materializations can end up with empty dagster/column_schema and no usable table metadata.

This was inconsistent with the translator path, which already falls back to resource.name for callable table_name.

This PR updates metadata extraction to use realized dlt schema information instead of assuming a single static table name.

  • Resolve root tables for a resource from schema.data_tables() using table["resource"] == resource.name.
  • Discover descendant tables by following parent relationships.
  • Aggregate root-table runtime metadata such as jobs and rows_loaded.
  • Only emit singular dagster/table_name and dagster/column_schema when there is exactly one realized root table.
  • For multi-table dynamic resources, expose per-table schema metadata instead of emitting misleading singular table metadata.

Test Plan

Introduced unit test.

Changelog

  • [dagster-dlt] resolved an issue where dynamic dlt table names resulted in improper asset names

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR fixes incorrect runtime metadata in dagster-dlt when a dlt.resource uses a callable table_name that routes rows to multiple physical tables. Previously, extract_resource_metadata called str(resource.table_name) which produced a function repr, causing row counts, job lookups, and column schema extraction to silently fail. The fix introduces two new private helpers—_resolve_root_table_names (queries schema.data_tables() by resource.name) and _resolve_child_table_names (BFS over parent relationships)—and threads them through extract_resource_metadata to handle both single-root (unchanged behaviour) and multi-root (aggregate rows_loaded, per-table schema keys, no misleading singular dagster/table_name or dagster/column_schema) cases.

Key changes:

  • _resolve_root_table_names returns realized table names from the dlt schema; falls back to the original resource.name-based lookup for callable table_name when the schema is empty.
  • rows_loaded is now summed across all root tables; jobs is filtered by membership in root_table_names.
  • For multi-root resources, per-table schema metadata is exposed as top-level metadata keys instead of the single dagster/column_schema key.
  • New unit test exercises the full dynamic-table-name path and checks absence of the previously misleading singular metadata keys.

Confidence Score: 4/5

  • PR is safe to merge; the core fix is correct and well-tested, with a few minor defensive and coverage improvements remaining.
  • The logic correctly handles both single-root (backward-compatible) and multi-root (new) cases. The three open suggestions from previous review threads—filtering child tables in _resolve_root_table_names, the table_name loop-variable shadow, and missing beta/beta__values test assertions—are all non-blocking P2s. A new minor style issue (pop(0) vs deque.popleft()) is also P2. No correctness or data-loss bugs were found.
  • No files require special attention beyond the minor suggestions noted above.

Important Files Changed

Filename Overview
python_modules/libraries/dagster-dlt/dagster_dlt/resource.py Core fix looks correct: resolves callable table_name by querying schema.data_tables() with resource name matching; handles single vs multi-root resources properly; rows_loaded and jobs are aggregated across all root tables. Minor: _resolve_child_table_names uses list.pop(0) (O(n)) instead of deque.popleft(); _resolve_root_table_names does not guard against child tables that may carry resource== resource.name (defensive parent check missing).
python_modules/libraries/dagster-dlt/dagster_dlt_tests/test_asset_decorator.py New test correctly exercises the dynamic table_name path and verifies that dagster/table_name and dagster/column_schema are absent for multi-root resources. Only alpha and alpha__values are asserted; beta and beta__values assertions are missing, leaving the second root table path untested.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[extract_resource_metadata called] --> B[_resolve_root_table_names]
    B --> C{Any table in schema\nwith resource==resource.name?}
    C -- Yes --> D[Return matching table names\ne.g. alpha, beta]
    C -- No --> E{table_name callable?}
    E -- Yes --> F[Fall back to resource.name]
    E -- No --> G[Normalize str table_name]
    F --> H[Return normalized fallback name]
    G --> H
    D --> I{len root_table_names == 1?}
    H --> I
    I -- Yes, single root --> J[primary_table_name = root_table_names 0]
    I -- No, multi-root --> K[primary_table_name = None]
    J --> L[supplemental = child tables only]
    K --> M[supplemental = root tables + child tables]
    L --> N[Emit dagster/column_schema and\ndagster/table_name for primary]
    M --> O[No dagster/column_schema or\ndagster/table_name emitted]
    N --> P[Emit per-child table schema keys]
    O --> Q[Emit per-root and per-child table schema keys]
    P --> R[Aggregate rows_loaded across root tables]
    Q --> R
    R --> S[Filter jobs by root_table_names membership]
    S --> T[Return base_metadata]
Loading

Reviews (3): Last reviewed commit: "[dagster-dlt] resolve table names for dy..." | Re-trigger Greptile

@cmpadden
Copy link
Copy Markdown
Contributor Author

Followed up on the Greptile notes.

Addressed:

  • Added a defensive not table.get("parent") guard when resolving root tables in DagsterDltResource._resolve_root_table_names.
  • Renamed the qualified destination-path local for readability.
  • Expanded the regression test to assert the beta / beta__values side of the multi-root dynamic-table case as well.

Validation:

  • make ruff passes.
  • python -m pytest python_modules/libraries/dagster-dlt/dagster_dlt_tests/test_asset_decorator.py -q passes.
  • A targeted pyright run on the changed files passes: pyright python_modules/libraries/dagster-dlt/dagster_dlt/resource.py python_modules/libraries/dagster-dlt/dagster_dlt_tests/test_asset_decorator.py.

make pyright is still failing locally, but the remaining errors are unrelated pre-existing repo-wide issues, plus a local pyright version mismatch (1.1.379 installed vs 1.1.389 pinned by the repo).

@cmpadden cmpadden force-pushed the colton/dlt-resolve-21090 branch from 7e400e7 to 39ecb1e Compare March 24, 2026 19:42
Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@cmpadden
Copy link
Copy Markdown
Contributor Author

@greptileai re-review.

@cmpadden cmpadden force-pushed the colton/dlt-resolve-21090 branch 3 times, most recently from a371610 to 54ebe04 Compare March 25, 2026 14:31
@cmpadden
Copy link
Copy Markdown
Contributor Author

@greptileai re-review.

@cmpadden cmpadden force-pushed the colton/dlt-resolve-21090 branch from 54ebe04 to 0f5924e Compare March 31, 2026 01:29
@cmpadden cmpadden force-pushed the colton/dlt-resolve-21090 branch from 0f5924e to 42a3e2c Compare March 31, 2026 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[dagster-embedded-elt][dlt] One dlt asset produces several tables, making dbt assets disconnected from the dlt asset

1 participant