Patch references to UC schemas to capture dependencies automatically #1989

Merged
merged 16 commits into from
Jan 16, 2025

Conversation

shreyas-goenka
Contributor

@shreyas-goenka shreyas-goenka commented Dec 9, 2024

Changes

Fixes #1977.

This PR modifies the bundle configuration to capture the dependency that a UC Volume or a DLT pipeline might have on a UC schema at deployment time. It does so by replacing the schema name with a reference of the form ${resources.schemas.foo.name}.

For example:
The following UC Volume definition depends on the UC schema with the name schema_name. This mutator converts this configuration

from:

resources:
  volumes:
    bar:
      catalog_name: catalog_name
      name: volume_name
      schema_name: schema_name

  schemas:
    foo:
      catalog_name: catalog_name
      name: schema_name

to:

resources:
  volumes:
    bar:
      catalog_name: catalog_name
      name: volume_name
      schema_name: ${resources.schemas.foo.name}

  schemas:
    foo:
      catalog_name: catalog_name
      name: schema_name
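
As a rough illustration of the rewrite shown above, here is a minimal Go sketch. It is not the code in this PR: the `Schema` and `Volume` types and the `patchVolumeSchemaRefs` function are hypothetical, simplified stand-ins, and matching on catalog name plus schema name is an assumption drawn from the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Schema and Volume are hypothetical, simplified stand-ins for the bundle
// resource types; the real types carry many more fields.
type Schema struct {
	CatalogName string
	Name        string
}

type Volume struct {
	CatalogName string
	Name        string
	SchemaName  string
}

// patchVolumeSchemaRefs replaces a literal schema name on a volume with a
// ${resources.schemas.<key>.name} reference when the bundle defines a schema
// with the same catalog and name. Volumes with no matching schema are left
// untouched.
func patchVolumeSchemaRefs(schemas map[string]*Schema, volumes map[string]*Volume) {
	// Sort the schema keys so the result is deterministic if more than one
	// schema matches.
	keys := make([]string, 0, len(schemas))
	for k := range schemas {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, v := range volumes {
		for _, k := range keys {
			s := schemas[k]
			if s.CatalogName == v.CatalogName && s.Name == v.SchemaName {
				v.SchemaName = fmt.Sprintf("${resources.schemas.%s.name}", k)
				break
			}
		}
	}
}
```

Applied to the configuration above, the sketch would set the `schema_name` of volume `bar` to `${resources.schemas.foo.name}`, which is what lets the deployment order be derived from the reference.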

Tests

Unit tests and manual testing.

@eng-dev-ecosystem-bot
Collaborator

Test Details: go/deco-tests/12239986056

@shreyas-goenka shreyas-goenka changed the title Warn user to use ${resources.schemas...} syntax Warn user to use ${resources.schemas...} syntax Dec 9, 2024
bundle/config/validate/schema_references.go Outdated Show resolved Hide resolved
diags := diag.Diagnostics{}
for k, p := range rb.Config().Resources.Pipelines {
	// Skip if the pipeline uses hive metastore.
	if p.Catalog == "" {
Contributor Author

The API allows creating a pipeline without a schema/target specified as long as you are not using UC. It's not a forward-looking use case anyway, so I did not look deeper into this.

Severity: diag.Warning,
Summary: `Use ${resources.schemas.s1.name} syntax to refer to the UC schema instead of directly using its name "schema1"`,
Detail: `Using ${resources.schemas.s1.name} will allow DABs to capture the deploy time dependency this DLT pipeline
has on the schema "schema1" and deploy changes to the schema before deploying the pipeline.`,
Contributor

It would be painful to update this test if the message changes, and it's not nice to assert on the message itself.

I have added "golden files" test utilities in #2025; they are a good fit for cases like this.

The full output is stored in a file and can be updated by running the tests with TESTS_OUTPUT=OVERWRITE.

Contributor

Note that that PR only targets integration tests for now, so it's not ready for your use case yet.

Contributor Author

I think it's fine in this case since these are unit tests colocated with the mutator that implements the functionality. It'll be a simple search and replace when updating.

Are you proposing to extend the golden file approach to mutators or unit tests? Possibly by serializing the diagnostics and persisting them in a file?

@shreyas-goenka shreyas-goenka requested a review from denik December 20, 2024 15:44
Contributor

@pietern pietern left a comment

I understand that we can ask users to capture this dependency through variable interpolation, but could we do this ourselves instead? I.e. we know if someone wants to use a schema for a volume, so we can inject a "depends on" relationship transparently.

bundle/config/validate/schema_references.go Outdated Show resolved Hide resolved

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 1989
  • Commit SHA: aa52b1dab3354c56886cae7740b2cb73fb5ade69

Checks will be approved automatically on success.

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Jan 16, 2025
Merged via the queue into main with commit f2bba63 Jan 16, 2025
9 checks passed
@shreyas-goenka shreyas-goenka deleted the detect/schema-dep branch January 16, 2025 13:34
andrewnester added a commit that referenced this pull request Jan 16, 2025
New feature announcement.

You can now manage Databricks Apps using DABs by defining an `app` resource in your bundle configuration.
For more information see Databricks documentation https://docs.databricks.com/en/dev-tools/bundles/resources.html#app

CLI:
 * Filter out system clusters in cluster picker ([#2131](#2131)).
 * Process all the fields in top level request object even if it contains request body ([#2155](#2155)).

Bundles:
 * Added support for Databricks Apps in DABs ([#1928](#1928)).
 * Allow artifact path to be located outside the sync root ([#2128](#2128)).
 * Retry app deployment if there is an active deployment in progress ([#2153](#2153)).
 * Resolve variables in a loop ([#2164](#2164)).
 * Improve resolution of complex variables within complex variables ([#2157](#2157)).
 * Added output message to warn about slower deployments with apps ([#2161](#2161)).
 * Patch references to UC schemas to capture dependencies automatically ([#1989](#1989)).
 * Format default-python template ([#2110](#2110)).
 * Encourage the use of root_path in production to ensure single deployment ([#1712](#1712)).
 * Log warnings to stderr for "bundle validate -o json" ([#2109](#2109)).

Internal:
 * Move merge fix-ups after variable resolution ([#2125](#2125)).
 * Enable linter 'unconvert' and fix the issues found ([#2136](#2136)).
 * Coverage for acceptance tests ([#2123](#2123)).
 * Add acceptance tests for builtin templates ([#2135](#2135)).
 * Add a unique schema for recreate pipeline test ([#2159](#2159)).
 * Migrate resolution tests to acceptance tests ([#2143](#2143)).
 * Update runner for the publish-winget job ([#2105](#2105)).
 * Add a test for complex variable resolution with 3 levels ([#2163](#2163)).

API Changes:
 * Changed `databricks account federation-policy update` command with new required argument order.
 * Changed `databricks account service-principal-federation-policy update` command with new required argument order.

OpenAPI commit 779817ed8d63031f5ea761fbd25ee84f38feec0d (2025-01-08)
Dependency updates:
 * Upgrade TF provider to 1.63.0 ([#2162](#2162)).
 * Bump golangci-lint version to v1.63.4 from v1.63.1 ([#2114](#2114)).
 * Bump astral-sh/setup-uv from 4 to 5 ([#2116](#2116)).
 * Bump golang.org/x/oauth2 from 0.24.0 to 0.25.0 ([#2080](#2080)).
 * Bump github.com/hashicorp/hc-install from 0.9.0 to 0.9.1 ([#2079](#2079)).
 * Bump golang.org/x/term from 0.27.0 to 0.28.0 ([#2078](#2078)).
 * Bump github.com/databricks/databricks-sdk-go from 0.54.0 to 0.55.0 ([#2126](#2126)).
github-merge-queue bot pushed a commit that referenced this pull request Jan 16, 2025
### New feature announcement

#### Databricks Apps support

You can now manage Databricks Apps using DABs by defining an `app`
resource in your bundle configuration.
For more information see Databricks documentation
https://docs.databricks.com/en/dev-tools/bundles/resources.html#app

#### Referencing complex variables in complex variables

You can now reference complex variables within other complex variables.
For more details see #2157

CLI:
* Filter out system clusters in cluster picker
([#2131](#2131)).
* Add command line flags for fields that are not in the API request body
([#2155](#2155)).

Bundles:
* Added support for Databricks Apps in DABs
([#1928](#1928)).
* Allow artifact path to be located outside the sync root
([#2128](#2128)).
* Retry app deployment if there is an active deployment in progress
([#2153](#2153)).
* Resolve variables in a loop
([#2164](#2164)).
* Improve resolution of complex variables within complex variables
([#2157](#2157)).
* Added output message to warn about slower deployments with apps
([#2161](#2161)).
* Patch references to UC schemas to capture dependencies automatically
([#1989](#1989)).
* Format default-python template
([#2110](#2110)).
* Encourage the use of root_path in production to ensure single
deployment ([#1712](#1712)).
* Log warnings to stderr for "bundle validate -o json"
([#2109](#2109)).

API Changes:
* Changed `databricks account federation-policy update` command with new
required argument order.
* Changed `databricks account service-principal-federation-policy
update` command with new required argument order.

OpenAPI commit 779817ed8d63031f5ea761fbd25ee84f38feec0d (2025-01-08)
Dependency updates:
* Upgrade TF provider to 1.63.0
([#2162](#2162)).
* Bump golangci-lint version to v1.63.4 from v1.63.1
([#2114](#2114)).
* Bump astral-sh/setup-uv from 4 to 5
([#2116](#2116)).
* Bump golang.org/x/oauth2 from 0.24.0 to 0.25.0
([#2080](#2080)).
* Bump github.com/hashicorp/hc-install from 0.9.0 to 0.9.1
([#2079](#2079)).
* Bump golang.org/x/term from 0.27.0 to 0.28.0
([#2078](#2078)).
* Bump github.com/databricks/databricks-sdk-go from 0.54.0 to 0.55.0
([#2126](#2126)).

---------

Co-authored-by: shreyas-goenka <[email protected]>

Successfully merging this pull request may close these issues.

setting up dependency is not working between schema and volume
4 participants