feat: add a new implementation of schema with ttl #5462

Open: wants to merge 3 commits into base branch master

shekhar-rudder
Member

@shekhar-rudder shekhar-rudder commented Jan 29, 2025

Description

This PR introduces schema invalidation based on a configurable TTL. When a schema entry expires, it is refreshed from the warehouse and updated in storage.

Changes:

  • Added an expires_at column to the wh_schemas table.
  • Implemented a new schema version (schema_v2), gated by a feature flag.
  • In schema_v2:
    1. Entries in the table are marked with an expiration timestamp (now + TTL) upon insert/update.
    2. Reads first check whether the stored schema has expired: if it is still valid, it is returned; otherwise it is re-fetched from the warehouse.
  • Updated database read/write methods to handle the new column.
  • Modified the NewSchema function to accept whManager, allowing schema_v2 to reference the repository.
  • Adjusted method signatures in the Handler interface to accept a context and return an error, as most schema_v2 methods now interact with the database.
  • Extracted removeDeprecatedColumns and tableSchemaDiff from schemaV1 for reuse in schemaV2.

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@shekhar-rudder shekhar-rudder force-pushed the war-177-schema-ttl branch 10 times, most recently from 6cc5479 to ef40a95 Compare January 31, 2025 08:54
@shekhar-rudder shekhar-rudder force-pushed the war-177-schema-ttl branch 3 times, most recently from 1110046 to f36f3e3 Compare January 31, 2025 16:37

codecov bot commented Jan 31, 2025

Codecov Report

Attention: Patch coverage is 73.77049% with 48 lines in your changes missing coverage. Please review.

Project coverage is 75.10%. Comparing base (3db39a1) to head (a8081e0).

Files with missing lines                 Patch %   Lines
warehouse/schema/schema_v2.go            74.80%    22 Missing and 10 partials ⚠️
warehouse/router/state_export_data.go    23.80%    11 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5462      +/-   ##
==========================================
+ Coverage   74.88%   75.10%   +0.21%     
==========================================
  Files         458      459       +1     
  Lines       63269    63424     +155     
==========================================
+ Hits        47381    47632     +251     
+ Misses      13235    13146      -89     
+ Partials     2653     2646       -7     


@shekhar-rudder shekhar-rudder force-pushed the war-177-schema-ttl branch 2 times, most recently from 5cb6d75 to a8081e0 Compare February 3, 2025 17:49
@RanjeetMishra RanjeetMishra requested review from vyeshwanth and removed request for RanjeetMishra February 5, 2025 06:30
warehouse/internal/repo/schema.go (outdated, resolved)
warehouse/router/state_export_data.go (outdated, resolved)
warehouse/schema/schema_v2.go (outdated, resolved)
warehouse/schema/schema_v2.go (outdated, resolved)
Comment on lines 19 to 34
type schemaV2 struct {
	stats struct {
		schemaSize stats.Histogram
	}
	warehouse                        model.Warehouse
	log                              logger.Logger
	ttlInMinutes                     time.Duration
	schemaRepo                       schemaRepo
	stagingFilesSchemaPaginationSize int
	stagingFileRepo                  stagingFileRepo
	enableIDResolution               bool
	fetchSchemaRepo                  fetchSchemaRepo
	now                              func() time.Time
}
Member

@achettyiitr achettyiitr Feb 5, 2025

Are we calling the database on every API call via sh.getSchema(ctx), even for things like IsWarehouseSchemaEmpty, GetTableSchemaInWarehouse, or GetColumnsCountInWarehouseSchema? That would make too many DB calls.

Can't we store the schema in schemaV2 once and reuse it?

Member Author

Added two new fields to the struct, cachedSchema and cacheExpiry, to avoid repeated DB calls.

3 participants