fix: Ensure streams with hardcoded schema attributes (e.g. REST and GraphQL streams) can have their schema overridden by the catalog #2940

Collaborator

@edgarrmondragon edgarrmondragon commented Mar 31, 2025

Related

Summary by Sourcery

Modify stream schema handling to allow catalog-provided schema to override the default stream schema

Bug Fixes:

  • Ensure that record processing and replication key validation use the most up-to-date schema from the input catalog

Enhancements:

  • Introduce a private method _get_schema() to dynamically retrieve the stream's schema, prioritizing input catalog schema over the default schema

📚 Documentation preview 📚: https://meltano-sdk--2940.org.readthedocs.build/en/2940/

Summary by Sourcery

Improve stream schema handling to prioritize input catalog schema over default stream schema

Bug Fixes:

  • Ensure record processing and replication key validation use the most up-to-date schema from the input catalog

Enhancements:

  • Introduce an effective_schema property to dynamically retrieve the stream's schema, prioritizing input catalog schema
  • Add support for overriding the default stream schema with catalog-provided schema

Contributor

sourcery-ai bot commented Mar 31, 2025

Reviewer's Guide by Sourcery

This pull request modifies the stream schema handling to ensure that the schema provided in the catalog overrides the default stream schema. This change ensures that record processing and replication key validation use the most up-to-date schema from the input catalog.
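The fallback described above can be sketched in a few lines. This is a minimal illustration, not the SDK's code: `SketchStream` is a hypothetical stand-in class, and only the names `_input_schema`, `effective_schema`, and `schema` are taken from this pull request.

```python
from __future__ import annotations


class SketchStream:
    """Hypothetical stand-in for a stream class; not the SDK's Stream."""

    # Hardcoded schema, as a REST or GraphQL stream might declare it.
    schema: dict = {
        "type": "object",
        "properties": {"id": {"type": "integer"}},
    }

    def __init__(self) -> None:
        # Populated only when an input catalog supplies a schema.
        self._input_schema: dict | None = None

    @property
    def effective_schema(self) -> dict:
        """Return the catalog schema if one was provided, else the default."""
        if self._input_schema is not None:
            return self._input_schema
        return self.schema


stream = SketchStream()
default_schema = stream.effective_schema  # no catalog yet: hardcoded schema

# Simulate a catalog override (assigned directly here for brevity).
stream._input_schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
}
overridden_schema = stream.effective_schema  # catalog schema wins
```

Keeping the hardcoded `schema` attribute untouched and resolving the choice at read time means the default remains available when no catalog is supplied.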

Sequence diagram for applying the catalog and using the effective schema

sequenceDiagram
  participant Tap
  participant Stream
  participant Catalog

  Tap->>Stream: apply_catalog(catalog)
  activate Stream
  Stream->>Catalog: catalog_entry.schema.to_dict()
  Catalog-->>Stream: schema_dict
  Stream->>Stream: _input_schema = schema_dict
  deactivate Stream

  Tap->>Stream: _generate_record_messages()
  activate Stream
  Stream->>Stream: schema = effective_schema
  Stream->>Stream: conform_record_data_types(record, schema)
  deactivate Stream
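
The two phases in the diagram above can be approximated as follows. Everything here is an assumption made for illustration: the catalog is a plain dict keyed by stream name rather than a `singer.Catalog`, `generate_records` stands in for `_generate_record_messages`, and `conform_record` is a simplified placeholder that only drops undeclared properties rather than reproducing `conform_record_data_types`.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def conform_record(record: dict, schema: dict) -> dict:
    """Simplified placeholder for conformance: keep only declared properties."""
    declared = schema.get("properties", {})
    return {key: value for key, value in record.items() if key in declared}


class SketchStream:
    """Hypothetical stream; the catalog is a plain dict keyed by stream name."""

    name = "users"
    schema = {"type": "object", "properties": {"id": {"type": "integer"}}}

    def __init__(self) -> None:
        self._input_schema: dict | None = None

    @property
    def effective_schema(self) -> dict:
        return self.schema if self._input_schema is None else self._input_schema

    def apply_catalog(self, catalog: dict[str, dict]) -> None:
        # Phase 1: remember the schema from this stream's catalog entry.
        entry = catalog.get(self.name)
        if entry is not None:
            self._input_schema = entry["schema"]

    def generate_records(self, rows: Iterable[dict]) -> Iterator[dict]:
        # Phase 2: resolve the schema once, then conform every row against it.
        schema = self.effective_schema
        for row in rows:
            yield conform_record(row, schema)


rows = [{"id": 1, "email": "a@example.com"}]

stream = SketchStream()
before = list(stream.generate_records(rows))  # hardcoded schema in effect

stream.apply_catalog(
    {
        "users": {
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "email": {"type": "string"},
                },
            }
        }
    }
)
after = list(stream.generate_records(rows))  # catalog schema in effect
```

In this sketch the `email` field survives only after the catalog has been applied, which is the kind of difference a stale hardcoded schema would otherwise hide.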

Updated class diagram for the Stream class

classDiagram
  class Stream {
    - _tap: Tap
    - _tap_state: dict
    - _tap_input_catalog: singer.Catalog | None
    - _input_schema: dict | None
    - _stream_maps: list[StreamMap] | None
    - forced_replication_method: str | None
    - _replication_key: str | None
    + effective_schema: dict
    + apply_catalog(catalog: singer.Catalog) : None
    + _generate_record_messages()
  }
  note for Stream._input_schema "New attribute to store the input schema from the catalog"
  note for Stream.effective_schema "Returns the input schema if available, otherwise returns the stream's schema"
  note for Stream.apply_catalog "Applies the catalog to the stream, including setting the input schema"
  note for Stream._generate_record_messages "Uses the effective schema for record processing"

File-Level Changes

Change: Introduces an effective_schema property to dynamically retrieve the stream's schema, prioritizing the input catalog schema over the default schema.
Details:
  • Added a property effective_schema that returns the input schema if it exists, otherwise returns the default schema.
  • Modified _generate_record_messages to use effective_schema instead of schema for data type conformance.
  • Modified is_timestamp_replication_key to use effective_schema instead of schema to determine the replication key type.
Files: singer_sdk/streams/core.py

Change: Implements catalog application to override the stream schema.
Details:
  • Added _input_schema attribute to the Stream class.
  • Modified apply_catalog to set the _input_schema attribute with the catalog schema.
Files: singer_sdk/streams/core.py
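
To illustrate why the replication key check also needs the effective schema, here is a rough sketch. It assumes a hypothetical `SketchStream` class and a simplified `is_datetime_property` helper that only inspects the JSON Schema `format` keyword; the SDK's own type detection is not reproduced here.

```python
from __future__ import annotations


def is_datetime_property(schema: dict, key: str) -> bool:
    """Return True if `key` is declared with the JSON Schema `date-time` format."""
    prop = schema.get("properties", {}).get(key, {})
    return prop.get("format") == "date-time"


class SketchStream:
    """Hypothetical stream whose hardcoded schema types the key as an integer."""

    replication_key = "updated_at"
    schema = {
        "type": "object",
        "properties": {"updated_at": {"type": "integer"}},
    }

    def __init__(self, input_schema: dict | None = None) -> None:
        # Schema taken from the input catalog, if any.
        self._input_schema = input_schema

    @property
    def effective_schema(self) -> dict:
        return self.schema if self._input_schema is None else self._input_schema

    @property
    def is_timestamp_replication_key(self) -> bool:
        if not self.replication_key:
            return False
        # Consult the effective schema so a catalog override is honored.
        return is_datetime_property(self.effective_schema, self.replication_key)


catalog_schema = {
    "type": "object",
    "properties": {"updated_at": {"type": "string", "format": "date-time"}},
}

without_catalog = SketchStream().is_timestamp_replication_key
with_catalog = SketchStream(catalog_schema).is_timestamp_replication_key
```

If the check read `schema` directly, the catalog's `date-time` declaration would be ignored and the key would be treated as a non-timestamp value.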

Possibly linked issues

  • #0: The PR directly addresses the issue by implementing the schema overriding functionality described in the issue.

codecov bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.50%. Comparing base (ca4ab5a) to head (9088d58).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2940      +/-   ##
==========================================
+ Coverage   91.49%   91.50%   +0.01%     
==========================================
  Files          63       63              
  Lines        5292     5300       +8     
  Branches      677      677              
==========================================
+ Hits         4842     4850       +8     
  Misses        318      318              
  Partials      132      132              

codspeed-hq bot commented Mar 31, 2025

CodSpeed Performance Report

Merging #2940 will not alter performance

Comparing edgarrmondragon/fix/always-override-with-catalog-schema-private (9088d58) with main (ca4ab5a)

Summary

✅ 7 untouched benchmarks

@edgarrmondragon edgarrmondragon marked this pull request as ready for review March 31, 2025 16:03
@edgarrmondragon edgarrmondragon added this to the v0.46 milestone Mar 31, 2025
@edgarrmondragon edgarrmondragon self-assigned this Mar 31, 2025
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider renaming _input_schema to _catalog_schema for clarity.
  • It might be helpful to add a comment explaining why _get_schema is needed.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@edgarrmondragon
Collaborator Author

@sourcery-ai review

@edgarrmondragon edgarrmondragon added the Type/Tap Singer taps label Apr 11, 2025
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider renaming effective_schema to get_effective_schema to clarify that it's a computed property.
  • It might be clearer to set self._input_schema to catalog_entry.schema.to_dict() in the __init__ method rather than in apply_catalog.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@edgarrmondragon edgarrmondragon changed the title fix: Ensure stream schema is overridden by the catalog with a private method fix: Ensure streams with hardcoded schema attributes (e.g. REST and GraphQL streams) can have their schema overridden by the catalog Apr 11, 2025
@edgarrmondragon edgarrmondragon merged commit 38fb347 into main Apr 11, 2025
36 checks passed
@edgarrmondragon edgarrmondragon deleted the edgarrmondragon/fix/always-override-with-catalog-schema-private branch April 11, 2025 19:09