@tripleaceme tripleaceme commented Jan 28, 2026

Description

This PR implements the feature requested in issue #10476 to automatically retrieve column descriptions from the database for sources during dbt docs generate, using them as fallback when YAML documentation is unavailable.

Problem

Previously, dbt required manual maintenance of column descriptions in YAML files for source tables. If database columns had comments/descriptions but weren't documented in YAML, those database comments would not appear in the generated documentation. This created unnecessary duplication of effort, especially in organizations where multiple teams manage different data sources.

Solution

This enhancement allows dbt docs generate to automatically use database column comments as fallback descriptions when YAML documentation is empty for source columns.

Priority Order:

  1. YAML description (if present) - Takes priority, allowing users to override database comments
  2. Database comment (if YAML is empty) - Used as fallback
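The priority order above can be sketched as a small helper. This is only an illustration of the fallback rule: the function name `resolve_description` is hypothetical and not taken from the PR, and treating a whitespace-only YAML description as empty is an assumption.

```python
from typing import Optional


def resolve_description(
    yaml_description: Optional[str], db_comment: Optional[str]
) -> str:
    """Pick the description to show for a source column.

    Hypothetical helper: a non-empty YAML description wins, otherwise the
    database comment is used, otherwise the result is an empty string.
    """
    # Assumption: whitespace-only YAML descriptions count as "empty".
    if yaml_description and yaml_description.strip():
        return yaml_description
    return db_comment or ""


if __name__ == "__main__":
    print(resolve_description("Surrogate key", "Primary key"))
    print(resolve_description("", "Primary key"))
```

With this rule, a column documented in YAML keeps its YAML text even when the database also carries a comment, which is what lets users override database comments.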

Changes

Core Implementation

  • File: core/dbt/task/docs/generate.py
  • Added _enrich_source_columns_with_descriptions() method to GenerateTask
  • Enriches source catalog entries after make_unique_id_map()
  • Uses dataclasses.replace() to update column metadata immutably
  • No schema changes required (uses existing comment field)
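As a rough sketch of the immutable update described in the list above, the snippet below uses `dataclasses.replace()` on a frozen dataclass. The body of `_enrich_source_columns_with_descriptions()` is not shown in this conversation, so everything here is an assumption: `ColumnMetadata` is a simplified stand-in rather than dbt's real class, and `enrich_columns` and its arguments are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ColumnMetadata:
    """Simplified stand-in for a catalog column entry (not dbt's class)."""

    name: str
    type: str
    index: int
    comment: Optional[str] = None  # comment as read from the database


def enrich_columns(
    columns: Dict[str, ColumnMetadata],
    yaml_descriptions: Dict[str, str],
) -> Dict[str, ColumnMetadata]:
    """Return new column entries; the input objects are never mutated.

    Assumed rule: a non-empty YAML description overrides the database
    comment, otherwise the database comment is kept as the fallback.
    """
    enriched: Dict[str, ColumnMetadata] = {}
    for name, column in columns.items():
        yaml_description = (yaml_descriptions.get(name) or "").strip()
        if yaml_description:
            # replace() builds a copy with only the comment field changed.
            enriched[name] = replace(column, comment=yaml_description)
        else:
            enriched[name] = column
    return enriched


if __name__ == "__main__":
    catalog_columns = {
        "id": ColumnMetadata("id", "integer", 1, "Primary key"),
        "email": ColumnMetadata("email", "text", 2, "Contact email"),
    }
    result = enrich_columns(catalog_columns, {"id": "Surrogate key"})
    print(result["id"].comment, "|", result["email"].comment)
```

Because `replace()` returns a copy, the original catalog entries stay untouched, which fits the "no schema changes" point: only the value of the existing comment field differs in the enriched copy.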

Testing

  • File: tests/functional/docs/test_generate.py
  • New test class: TestGenerateSourceColumnDescriptions
  • Validates all scenarios:
    • ✅ YAML descriptions take priority over DB comments
    • ✅ DB comments used as fallback when YAML is empty
    • ✅ DB comments used for undocumented columns

Benefits

  1. Reduced Maintenance: No need to duplicate database comments in YAML files
  2. Automatic Documentation: Leverage existing database metadata
  3. Override Capability: YAML descriptions still take priority when provided
  4. No Breaking Changes: Purely additive feature
  5. Cross-Team Friendly: Perfect for organizations where multiple teams manage data sources

Example

Before this change:

# Database has: id column with comment "Primary key"
sources:
  - name: my_source
    tables:
      - name: my_table
        columns:
          - name: id
            # No description

Result: id column has no description in docs

After this change:

# Database has: id column with comment "Primary key"
sources:
  - name: my_source
    tables:
      - name: my_table
        columns:
          - name: id
            # No description

Result: id column shows "Primary key" from database comment in docs ✅

Use Cases

  1. BigQuery with Multiple Teams: Data engineering teams document columns at the database level, analytics teams automatically see those descriptions in dbt docs
  2. Gradual Documentation: Teams can start using dbt sources with database comments, then gradually add YAML descriptions where more context is needed
  3. External Data Sources: When working with external data sources, database comments from the source system are automatically available

Checklist

  • Implementation complete
  • Tests added and passing
  • No breaking changes
  • Works across all adapters supporting column comments
  • Documentation updated (IMPLEMENTATION_SUMMARY.md)

Related Issues

Closes #10476

Additional Notes

This implementation is adapter-agnostic and works with any database that supports column comments (PostgreSQL, BigQuery, Snowflake, Redshift, etc.). The feature only affects source columns, not model or seed columns.

@tripleaceme tripleaceme requested a review from a team as a code owner January 28, 2026 15:49

cla-bot bot commented Jan 28, 2026

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @tripleaceme

@github-actions

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@github-actions github-actions bot added the community This PR is from a community member label Jan 28, 2026
… is unavailable

Implement automatic retrieval of column descriptions from the database for sources
during dbt docs generate, using them as fallback when YAML documentation is empty.

This enhancement reduces documentation maintenance burden by allowing teams to
leverage existing database column comments instead of duplicating them in YAML files.

Priority order:
1. YAML description (if present) - Takes priority
2. Database comment (if YAML empty) - Used as fallback

Implementation details:
- Add _enrich_source_columns_with_descriptions() method to GenerateTask
- Enrich source catalog entries after make_unique_id_map()
- Use dataclasses.replace() to update column metadata immutably
- No schema changes required (uses existing comment field)

Testing:
- Add comprehensive integration test in TestGenerateSourceColumnDescriptions
- Verify YAML descriptions take priority over DB comments
- Verify DB comments used as fallback when YAML is empty
- Verify DB comments used for undocumented columns

Benefits:
- Reduced maintenance for multi-team organizations
- Automatic documentation from database metadata
- No breaking changes to existing behavior
- Works across all adapters supporting column comments

Closes dbt-labs#10476
@tripleaceme tripleaceme force-pushed the feature/source-db-column-descriptions branch from b9a9159 to 30d13b3 Compare January 28, 2026 15:55
@tripleaceme tripleaceme force-pushed the feature/source-db-column-descriptions branch from 46dfeae to 96d1c48 Compare January 31, 2026 21:15
@cla-bot cla-bot bot added the cla:yes label Jan 31, 2026
Labels

cla:yes community This PR is from a community member

Development

Successfully merging this pull request may close these issues.

[Feature] docs generate - retrieve column descriptions from DB when available - for Sources
