
fix: Generate standard stream metadata for nested fields #2924

Merged
edgarrmondragon merged 8 commits into main from edgarrmondragon/fix/nested-metadata
Apr 11, 2025

Conversation

edgarrmondragon
Collaborator

@edgarrmondragon edgarrmondragon commented Mar 26, 2025

FWIW, this nested metadata is already added by Meltano: https://github.com/meltano/meltano/blob/7cf6f2a0833127258dc885f4c095a028eab6ac06/src/meltano/core/plugin/singer/catalog.py#L415-L442

Summary by Sourcery

Generates standard stream metadata for nested fields, ensuring that all subfields within a schema have appropriate metadata entries.

Enhancements:

  • Adds metadata entries for all subfields within a schema, ensuring that nested fields are properly represented in the metadata.
  • Improves the accuracy and completeness of metadata generated for complex schemas with nested properties.
  • Adds a recursive helper function to add breadcrumbs and metadata for subfields.

Summary by Sourcery

Enhance metadata generation for Singer streams to support nested field metadata, ensuring comprehensive metadata coverage for complex schemas with nested properties.

Enhancements:

  • Implement a recursive metadata generation approach to add metadata entries for nested fields in complex schemas
  • Extend metadata generation to support nested object and array schemas with comprehensive breadcrumb tracking

Tests:

  • Update test cases to validate metadata generation for nested schemas, including objects with multiple levels of nesting and array types

Contributor

sourcery-ai bot commented Mar 26, 2025

Reviewer's Guide by Sourcery

This pull request enhances metadata generation for nested fields in the Singer SDK. It implements recursive metadata generation to provide comprehensive metadata for complex schemas with nested properties. The changes include a new recursive helper function and updated test cases to validate the new functionality.
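The recursive approach described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code merged in this PR: the function name `sketch_standard_metadata`, the plain-dict metadata entries, and the choice to mark every subfield as "available" are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any

Breadcrumb = tuple[str, ...]


def sketch_standard_metadata(
    schema: dict[str, Any],
    key_properties: list[str],
) -> dict[Breadcrumb, dict[str, str]]:
    """Build one metadata entry per field, descending into nested objects."""
    # Assumption: the empty breadcrumb holds the stream-level entry.
    mapping: dict[Breadcrumb, dict[str, str]] = {(): {"inclusion": "available"}}

    def add_subfields(properties: dict[str, Any], parent: Breadcrumb) -> None:
        for name, field_schema in properties.items():
            breadcrumb = (*parent, "properties", name)
            # Assumption: subfields are always marked "available".
            mapping[breadcrumb] = {"inclusion": "available"}
            nested = field_schema.get("properties")
            if nested:
                add_subfields(nested, breadcrumb)

    for name, field_schema in schema.get("properties", {}).items():
        breadcrumb = ("properties", name)
        inclusion = "automatic" if name in key_properties else "available"
        mapping[breadcrumb] = {"inclusion": inclusion}
        # Recurse only when the field declares its own properties.
        nested = field_schema.get("properties")
        if nested:
            add_subfields(nested, breadcrumb)

    return mapping


example_schema = {
    "properties": {
        "id": {"type": "integer"},
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "geo": {
                    "type": "object",
                    "properties": {"lat": {"type": "number"}},
                },
            },
        },
    },
}
example_metadata = sketch_standard_metadata(example_schema, ["id"])
```

Each level of nesting adds another "properties", name pair to the breadcrumb, so the deepest field in the example is keyed by ("properties", "address", "properties", "geo", "properties", "lat").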

Sequence diagram for metadata generation with nested fields

sequenceDiagram
  participant Catalog
  participant Schema
  participant MetadataMapping
  participant Metadata

  Catalog->>Schema: get_standard_metadata(schema)
  Schema->>Schema: schema.get("properties")
  loop For each field in schema properties
    Schema->>MetadataMapping: Create Metadata entry
    Schema->>Schema: field_schema.get("properties")
    alt field_schema has properties
      Schema->>Schema: _add_subfield_metadata(field_schema["properties"])
      Schema->>MetadataMapping: Create Metadata entry for subfield
    else field_schema has no properties
      Schema-->>Catalog: Return
    end
  end
  Schema-->>Catalog: Return MetadataMapping

Updated class diagram for Metadata generation

classDiagram
  class Metadata {
    +InclusionType inclusion
    +string schema_name
  }
  class Catalog {
    +get_standard_metadata(schema: dict, key_properties: list, valid_replication_keys: list, selected_by_default: bool) MetadataMapping
    -_add_subfield_metadata(properties: dict, breadcrumb: Breadcrumb) None
  }
  class MetadataMapping {
    +dict mapping
  }
  Catalog -- MetadataMapping : creates
  Catalog -- Metadata : creates

File-Level Changes

Change: Implemented recursive metadata generation for nested schema fields.
Details:
  • Added a recursive helper function _add_subfield_metadata to add breadcrumbs and metadata for subfields.
  • Modified get_standard_metadata to call _add_subfield_metadata for nested properties.
  • Added metadata entries for all subfields within a schema.
Files: singer_sdk/singerlib/catalog.py

Change: Added test cases to validate metadata generation for nested schemas with multiple levels of object properties.
Details:
  • Added new test cases with nested schemas to test_standard_metadata.
  • Added breadcrumbs as a parameter to test_standard_metadata to validate the generated metadata.
Files: tests/singerlib/test_catalog.py

@edgarrmondragon edgarrmondragon self-assigned this Mar 26, 2025
@edgarrmondragon edgarrmondragon added the Type/Tap Singer taps label Mar 26, 2025
@edgarrmondragon edgarrmondragon added this to the v0.45 milestone Mar 26, 2025

codecov bot commented Mar 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.52%. Comparing base (38fb347) to head (08128ab).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2924      +/-   ##
==========================================
+ Coverage   91.50%   91.52%   +0.01%     
==========================================
  Files          63       63              
  Lines        5300     5308       +8     
  Branches      677      679       +2     
==========================================
+ Hits         4850     4858       +8     
  Misses        318      318              
  Partials      132      132              


codspeed-hq bot commented Mar 26, 2025

CodSpeed Performance Report

Merging #2924 will not alter performance

Comparing edgarrmondragon/fix/nested-metadata (08128ab) with main (38fb347)

Summary

✅ 7 untouched benchmarks

@edgarrmondragon edgarrmondragon requested a review from Copilot March 26, 2025 00:38
@edgarrmondragon edgarrmondragon marked this pull request as ready for review March 26, 2025 00:38
@edgarrmondragon
Collaborator Author

@sourcery-ai review


@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes the generation of standard stream metadata for nested fields by introducing breadcrumbs to track nested properties. It updates both the catalog metadata generation logic in singer_sdk and the associated tests to validate the new breadcrumb behavior.

  • Updated parameterized tests to include expected breadcrumb sets.
  • Modified get_standard_metadata in catalog.py to use tuple-based breadcrumbs for nested field metadata.
  • Adjusted test assertions to compare the generated metadata keys with the provided breadcrumb set.
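One reason tuple-based breadcrumbs are convenient is that tuples are hashable, so they can serve as dictionary keys and be collected into sets for comparison, while Singer catalogs serialize breadcrumbs as JSON arrays. The sketch below illustrates that round trip with plain dictionaries; the helper names `to_singer_list` and `from_singer_list` and the sample entries are hypothetical and are not part of the SDK.

```python
from __future__ import annotations

import json
from typing import Any

Breadcrumb = tuple[str, ...]


def to_singer_list(
    mapping: dict[Breadcrumb, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Serialize tuple-keyed metadata into the list form used in catalogs."""
    return [
        {"breadcrumb": list(breadcrumb), "metadata": entry}
        for breadcrumb, entry in mapping.items()
    ]


def from_singer_list(
    items: list[dict[str, Any]],
) -> dict[Breadcrumb, dict[str, Any]]:
    """Rebuild the mapping; JSON arrays become hashable tuples again."""
    return {tuple(item["breadcrumb"]): item["metadata"] for item in items}


# Illustrative entries only; real metadata carries more keys.
mapping = {
    (): {"inclusion": "available"},
    ("properties", "address"): {"inclusion": "available"},
    ("properties", "address", "properties", "city"): {"inclusion": "available"},
}

# A JSON round trip loses the tuple type, so keys are converted back explicitly.
round_tripped = from_singer_list(json.loads(json.dumps(to_singer_list(mapping))))
```

Comparing `set(round_tripped)` with an expected set of breadcrumb tuples is the same kind of check the updated test assertions are described as performing.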

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File: tests/singerlib/test_catalog.py
Description: Added breadcrumbs parameters in tests and updated assertions.

File: singer_sdk/singerlib/catalog.py
Description: Refactored metadata generation to include nested field breadcrumbs.

Comments suppressed due to low confidence (2)

singer_sdk/singerlib/catalog.py:216

  • [nitpick] Consider renaming 'breadcrumb' to a more descriptive name like 'path' or 'current_path' to clearly indicate its role in representing the metadata hierarchy.
breadcrumb = ("properties", field_name)

tests/singerlib/test_catalog.py:300

  • [nitpick] Consider adding a more specific type annotation for the 'breadcrumbs' parameter (e.g., Set[Tuple[str, ...]]) to improve clarity and enforce type safety in tests.
def test_standard_metadata(..., breadcrumbs: set):
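As a rough illustration of the more specific annotation suggested here, the comparison could be factored into a small typed helper. This is only a sketch: `check_breadcrumbs` and the `Breadcrumb` alias are hypothetical names, and the built-in generics `set[...]` and `tuple[...]` stand in for the `Set[Tuple[str, ...]]` spelling used in the nitpick.

```python
from __future__ import annotations

Breadcrumb = tuple[str, ...]


def check_breadcrumbs(
    generated: dict[Breadcrumb, object],
    breadcrumbs: set[Breadcrumb],
) -> None:
    """Fail with a readable diff when generated keys and expectations differ."""
    actual = set(generated)
    missing = breadcrumbs - actual
    unexpected = actual - breadcrumbs
    assert not missing and not unexpected, (
        f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
    )
```

With the annotation in place, a type checker can flag a test case that passes lists instead of tuples, which would otherwise only surface as a confusing set mismatch at run time.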

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a comment to the _add_subfield_metadata function explaining its purpose.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a comment to the _add_subfield_metadata function explaining its purpose.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

edgarrmondragon and others added 2 commits March 25, 2025 18:45
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@edgarrmondragon
Collaborator Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a helper function to reduce the duplication in the test case parameters.
  • The recursive function _add_subfield_metadata could be a standalone function instead of a nested function.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

@edgarrmondragon edgarrmondragon moved this to Up Next in Office Hours Mar 26, 2025
@edgarrmondragon edgarrmondragon moved this from Up Next to To Discuss in Office Hours Mar 26, 2025
@edgarrmondragon
Collaborator Author

FWIW, this nested metadata is already added by Meltano: https://github.com/meltano/meltano/blob/7cf6f2a0833127258dc885f4c095a028eab6ac06/src/meltano/core/plugin/singer/catalog.py#L415-L442

Even then, it'd be nice not to rely on Meltano to enrich their metadata.

@edgarrmondragon
Collaborator Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a more specific type hint for the properties parameter in _add_subfield_metadata.
  • The test case with nested properties is a great addition.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

@edgarrmondragon
Collaborator Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a helper function to generate the expected breadcrumbs in the test to improve readability.
  • The _add_subfield_metadata function could be a static method on the MetadataMapping class.
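A helper along the lines of the first suggestion might look like the sketch below. It is purely illustrative: `breadcrumbs_for` is a hypothetical name, the dotted-path input format is an assumption chosen for brevity, and the example assumes the expected set includes the empty stream-level breadcrumb.

```python
from __future__ import annotations

Breadcrumb = tuple[str, ...]


def breadcrumbs_for(*dotted_paths: str) -> set[Breadcrumb]:
    """Expand dotted field paths into the breadcrumbs expected in metadata.

    Every ancestor of a path is included, so listing only leaf fields is enough.
    Field names that themselves contain dots are not supported by this sketch.
    """
    # Assumption: the stream-level entry uses the empty breadcrumb.
    result: set[Breadcrumb] = {()}
    for path in dotted_paths:
        crumb: Breadcrumb = ()
        for part in path.split("."):
            crumb = (*crumb, "properties", part)
            result.add(crumb)
    return result


expected = breadcrumbs_for("id", "address.city", "address.geo.lat")
```

Test parameters could then list `breadcrumbs_for("id", "address.geo.lat")` instead of spelling out every nested tuple by hand.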
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

@edgarrmondragon edgarrmondragon merged commit c094d90 into main Apr 11, 2025
36 checks passed
@edgarrmondragon edgarrmondragon deleted the edgarrmondragon/fix/nested-metadata branch April 11, 2025 19:23
@github-project-automation github-project-automation bot moved this from To Discuss to Up Next in Office Hours Apr 11, 2025
Labels
Type/Tap Singer taps
Projects
Status: Up Next
1 participant