-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(MLOP-2604): Create delta writer #410
Conversation
@@ -24,82 +25,78 @@ def _get_full_table_name(table, database): | |||
@staticmethod | |||
def _convert_to_delta(client: SparkClient, table: str): | |||
logger.info(f"Converting {table} to Delta...") | |||
client.conn.sql(f"CONVERT TO DELTA {table}") | |||
client.conn.sql( | |||
f"""ALTER TABLE {table} SET TBLPROPERTIES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: can you clarify on the need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was testing when the table is Delta already, I need to change this, thanks for reminding me.
DeltaWriter()._convert_to_delta(client, full_table_name) | ||
|
||
# For schema evolution | ||
client.conn.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: would there be a case in which this wouldn't be the desired behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there could be cases where manual schema control is preferred, such as when strict data governance is required. However, leaving schema evolution enabled helps avoid unnecessary pipeline failures when new features or columns are added. It keeps things smooth and flexible, so users don’t have to manually update schemas every time.
|
.select("format") | ||
.collect()[0][0] | ||
) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what exceptions we should expect here?
🤖 I have created a release *beep* *boop* --- ## [1.6.0](1.5.0...1.6.0) (2025-02-28) ### Features * **MLOP-2604:** Create delta writer ([#410](#410)) ([98c0878](98c0878)) ### Bug Fixes * testing ([cc1ad97](cc1ad97)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ralph Rassweiler <[email protected]>
Why? 📖
Currently, Delta operations are not treated as a standard writer in Butterfree, making it difficult to handle Delta tables in the same way as other feature store writers. This PR introduces a Delta Feature Store Writer, enabling seamless integration of Delta operations into the existing pipeline.
What? 🔧
This PR introduces:
VACUUM
andOPTIMIZE
) for Delta tables.Type of Change
How was everything tested? 📏