feat: Use database column comments for source documentation when YAML is unavailable #12399
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR. CLA has not been signed by users: @tripleaceme
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
… is unavailable

Implement automatic retrieval of column descriptions from the database for sources during dbt docs generate, using them as fallback when YAML documentation is empty. This enhancement reduces documentation maintenance burden by allowing teams to leverage existing database column comments instead of duplicating them in YAML files.

Priority order:
1. YAML description (if present) - Takes priority
2. Database comment (if YAML empty) - Used as fallback

Implementation details:
- Add _enrich_source_columns_with_descriptions() method to GenerateTask
- Enrich source catalog entries after make_unique_id_map()
- Use dataclasses.replace() to update column metadata immutably
- No schema changes required (uses existing comment field)

Testing:
- Add comprehensive integration test in TestGenerateSourceColumnDescriptions
- Verify YAML descriptions take priority over DB comments
- Verify DB comments used as fallback when YAML is empty
- Verify DB comments used for undocumented columns

Benefits:
- Reduced maintenance for multi-team organizations
- Automatic documentation from database metadata
- No breaking changes to existing behavior
- Works across all adapters supporting column comments

Closes dbt-labs#10476
Force-pushed from b9a9159 to 30d13b3
Force-pushed from 46dfeae to 96d1c48
Description
This PR implements the feature requested in issue #10476 to automatically retrieve column descriptions from the database for sources during dbt docs generate, using them as fallback when YAML documentation is unavailable.
Problem
Previously, dbt required manual maintenance of column descriptions in YAML files for source tables. If database columns had comments/descriptions but weren't documented in YAML, those database comments would not appear in the generated documentation. This created unnecessary duplication of effort, especially in organizations where multiple teams manage different data sources.
Solution
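This enhancement allows dbt docs generate to automatically use database column comments as fallback descriptions when YAML documentation is empty for source columns.
Priority Order:
1. YAML description (if present) - Takes priority
2. Database comment (if YAML empty) - Used as fallback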
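The priority rule above (YAML first, database comment as fallback) can be sketched in a few lines of Python. This is an illustration only, not the code from this PR: the function name resolve_description is hypothetical, and treating a whitespace-only YAML description as empty is an assumption.

```python
from typing import Optional


def resolve_description(yaml_description: Optional[str], db_comment: Optional[str]) -> str:
    """Pick the description to show in the docs for one source column.

    Hypothetical helper: a non-empty YAML description wins, otherwise the
    database comment is used, otherwise the result is an empty string.
    """
    # Assumption: whitespace-only YAML text counts as "empty".
    if yaml_description and yaml_description.strip():
        return yaml_description
    return db_comment or ""
```

With this rule, a column documented in both places keeps its YAML text, while a column documented only in the database picks up the database comment.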
Changes
Core Implementation
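core/dbt/task/docs/generate.py
- Add _enrich_source_columns_with_descriptions() method to GenerateTask
- Enrich source catalog entries after make_unique_id_map()
- Use dataclasses.replace() to update column metadata immutably
- No schema changes required (uses existing comment field)
Testing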
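tests/functional/docs/test_generate.py
- Add comprehensive integration test in TestGenerateSourceColumnDescriptions
- Verify YAML descriptions take priority over DB comments
- Verify DB comments used as fallback when YAML is empty
- Verify DB comments used for undocumented columns
Benefits
- Reduced maintenance for multi-team organizations
- Automatic documentation from database metadata
- No breaking changes to existing behavior
- Works across all adapters supporting column comments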
Example
Before this change:
Result: the id column has no description in docs
After this change:
Result: the id column shows "Primary key" from database comment in docs ✅
Use Cases
Checklist
Related Issues
Closes #10476
Additional Notes
This implementation is adapter-agnostic and works with any database that supports column comments (PostgreSQL, BigQuery, Snowflake, Redshift, etc.). The feature only affects source columns, not model or seed columns.
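As a rough illustration of the source-only scope and of the immutable update via dataclasses.replace(), here is a simplified sketch. It is not the PR's implementation: the Column class is a stand-in for dbt's real catalog column metadata, enrich_sources and the dictionary shapes are hypothetical, and identifying sources by a "source." unique_id prefix is an assumption made for the example.

```python
import dataclasses
from typing import Dict, Optional


@dataclasses.dataclass(frozen=True)
class Column:
    """Simplified stand-in for a catalog column entry (hypothetical)."""
    name: str
    comment: Optional[str] = None  # database comment, if any


Catalog = Dict[str, Dict[str, Column]]          # unique_id -> column name -> Column
Descriptions = Dict[str, Dict[str, str]]        # unique_id -> column name -> YAML text


def enrich_sources(catalog: Catalog, yaml_descriptions: Descriptions) -> Catalog:
    """Return a new catalog where source columns prefer YAML text over DB comments."""
    enriched: Catalog = {}
    for unique_id, columns in catalog.items():
        # Assumption: source entries are identified by a "source." prefix.
        if not unique_id.startswith("source."):
            enriched[unique_id] = columns  # models and seeds pass through untouched
            continue
        docs = yaml_descriptions.get(unique_id, {})
        enriched[unique_id] = {
            # dataclasses.replace() builds a new Column; the original is not mutated.
            name: dataclasses.replace(col, comment=docs.get(name) or col.comment)
            for name, col in columns.items()
        }
    return enriched
```

Because each entry is rebuilt rather than modified, the catalog passed in stays unchanged, which matches the "update column metadata immutably" point in the change list.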