-
Notifications
You must be signed in to change notification settings - Fork 75
fix: Generate standard stream metadata for nested fields #2924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's Guide by SourceryThis pull request enhances metadata generation for nested fields in the Singer SDK. It implements recursive metadata generation to provide comprehensive metadata for complex schemas with nested properties. The changes include a new recursive helper function and updated test cases to validate the new functionality. Sequence diagram for metadata generation with nested fieldssequenceDiagram
participant Catalog
participant Schema
participant MetadataMapping
participant Metadata
Catalog->>Schema: get_standard_metadata(schema)
Schema->>Schema: schema.get("properties")
loop For each field in schema properties
Schema->>MetadataMapping: Create Metadata entry
Schema->>Schema: _add_subfield_metadata(field_schema["properties"])
Schema->>Schema: field_schema.get("properties")
alt field_schema has properties
Schema->>Schema: _add_subfield_metadata(field_schema["properties"])
Schema->>MetadataMapping: Create Metadata entry for subfield
else field_schema has no properties
Schema-->>Catalog: Return
end
end
Schema-->>Catalog: Return MetadataMapping
Updated class diagram for Metadata generationclassDiagram
class Metadata {
+InclusionType inclusion
+string schema_name
}
class Catalog {
+get_standard_metadata(schema: dict, key_properties: list, valid_replication_keys: list, selected_by_default: bool) MetadataMapping
-_add_subfield_metadata(properties: dict, breadcrumb: Breadcrumb) None
}
class MetadataMapping {
+dict mapping
}
Catalog -- MetadataMapping : creates
Catalog -- Metadata : creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2924 +/- ##
==========================================
+ Coverage 91.50% 91.52% +0.01%
==========================================
Files 63 63
Lines 5300 5308 +8
Branches 677 679 +2
==========================================
+ Hits 4850 4858 +8
Misses 318 318
Partials 132 132 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CodSpeed Performance ReportMerging #2924 will not alter performanceComparing Summary
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the generation of standard stream metadata for nested fields by introducing breadcrumbs to track nested properties. It updates both the catalog metadata generation logic in singer_sdk and the associated tests to validate the new breadcrumb behavior.
- Updated parameterized tests to include expected breadcrumb sets.
- Modified get_standard_metadata in catalog.py to use tuple-based breadcrumbs for nested field metadata.
- Adjusted test assertions to compare the generated metadata keys with the provided breadcrumb set.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
tests/singerlib/test_catalog.py | Added breadcrumbs parameters in tests and updated assertions. |
singer_sdk/singerlib/catalog.py | Refactored metadata generation to include nested field breadcrumbs. |
Comments suppressed due to low confidence (2)
singer_sdk/singerlib/catalog.py:216
- [nitpick] Consider renaming 'breadcrumb' to a more descriptive name like 'path' or 'current_path' to clearly indicate its role in representing the metadata hierarchy.
breadcrumb = ("properties", field_name)
tests/singerlib/test_catalog.py:300
- [nitpick] Consider adding a more specific type annotation for the 'breadcrumbs' parameter (e.g., Set[Tuple[str, ...]]) to improve clarity and enforce type safety in tests.
def test_standard_metadata(..., breadcrumbs: set):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a comment to the
_add_subfield_metadata
function explaining its purpose.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a comment to the
_add_subfield_metadata
function explaining its purpose.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a helper function to reduce the duplication in the test case parameters.
- The recursive function
_add_subfield_metadata
could be a standalone function instead of a nested function.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Even then, it'd be nice not rely on Meltano to enrich their metadata. |
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a more specific type hint for the
properties
parameter in_add_subfield_metadata
. - The test case with nested properties is a great addition.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a helper function to generate the expected breadcrumbs in the test to improve readability.
- The
_add_subfield_metadata
function could be a static method on theMetadataMapping
class.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
FWIW, this nested metadata is already added by Meltano: https://github.com/meltano/meltano/blob/7cf6f2a0833127258dc885f4c095a028eab6ac06/src/meltano/core/plugin/singer/catalog.py#L415-L442
Related
my_stream.prop.*
should imply selecting the top-level propertymy_stream.prop
meltano#8807Links
Summary by Sourcery
Generates standard stream metadata for nested fields, ensuring that all subfields within a schema have appropriate metadata entries.
Enhancements:
Summary by Sourcery
Enhance metadata generation for Singer streams to support nested field metadata, ensuring comprehensive metadata coverage for complex schemas with nested properties.
Enhancements:
Tests: