Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for UC volumes in DABs #1762

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Sep 9, 2024

Changes

This PR adds the UC volumes resource to DABs. This PR also adds a warning if a user tries to use a volume in the artifact_path that is managed by DABs and has not been deployed yet. Users would need to do two incremental deploys in this case because we always upload artifacts before creating all the resources.

A proper fix where we can allow using the volume within the same deploy would require us to head in a direction of removing our dependency on terraform to manage the CRUD lifecycle of DABs resources.

This PR also has custom behaviour for UC volumes, where presents.name_prefix or the default prefix is only applied on a volume when the catalog or schema is not already scoped to the user.

Tests

Unit, integration test and manually.

Manual outputs:

  1. UC volume does not exist:
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
  1. UC Volume does not exist, but is defined in DAB
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found

Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
  at resources.volumes.bar
  in databricks.yml:24:7

@shreyas-goenka shreyas-goenka marked this pull request as ready for review September 16, 2024 02:10

// TODO: Are there fields in the edit API or terraform that are not in this struct?
// If so call it out in the PR.
*catalog.CreateVolumeRequestContent
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably OWNER is missing from the schema here. Terraform allows managing the owner, but it's not available in the create API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do owner assignment elsewhere, can you clarify the significance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real significance for DABs. I just wanted to call this out since it's a divergence from Terraform. In fact, this is nice because it clearly guarantees that the owner of the UC volume is the deploying user. We do not allow modifying the owner of any of our UC resources.


// We don't need to display any prompts in this case.
if len(dltActions) == 0 && len(schemaActions) == 0 {
if len(schemaActions) == 0 && len(dltActions) == 0 && len(volumeActions) == 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces yet again interactive flow to DABs. At this point, I think it's worth it to invest some time in adding proper end to end tests for this. I'd like to pick it up in the next ~month.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pick this next quarter as in scope for terminal UX improvements.

@shreyas-goenka
Copy link
Contributor Author

Nightlies passed once. Triggered another round after some minor updates.

assert.False(t, ok,
`Since you are adding support for UC catalogs to DABs please ensure that
you add logic to skip applying presets.name_prefix for UC schemas, UC volumes and
any other resources that fall under a catalog in the UC hierarchy (like registered models).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to retroactively change the prefix behaviour for registered models as well to keep it consistent? No one has complained yet so it's probably fine to keep as is.


// We don't need to display any prompts in this case.
if len(dltActions) == 0 && len(schemaActions) == 0 {
if len(schemaActions) == 0 && len(dltActions) == 0 && len(volumeActions) == 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pick this next quarter as in scope for terminal UX improvements.

@@ -193,6 +195,30 @@ func (m *applyPresets) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnos
// the Databricks UI and via the SQL API.
}

// Apply the prefix to volumes
for _, v := range r.Volumes {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually verified that this indeed works end to end.


// TODO: Are there fields in the edit API or terraform that are not in this struct?
// If so call it out in the PR.
*catalog.CreateVolumeRequestContent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do owner assignment elsewhere, can you clarify the significance?

bundle/libraries/filer.go Outdated Show resolved Hide resolved
bundle/phases/deploy.go Outdated Show resolved Hide resolved
},
}

diags := bundle.Apply(ctx, b, bundle.Seq(libraries.ExpandGlobReferences(), libraries.Upload()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which mutator triggers the error?

internal/bundle/artifacts_test.go Show resolved Hide resolved
internal/bundle/artifacts_test.go Outdated Show resolved Hide resolved
bundle/config/mutator/apply_presets.go Outdated Show resolved Hide resolved
log.Debugf(ctx, "Skipping prefix for volume %s because schema %s is defined in the bundle and the schema name will be interpolated at runtime", v.Name, v.SchemaName)
continue

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels magical and has the risk of the same issue we have with schemas today.

To avoid we could make prefixing opt-in through a preset, and recommend setting it in development mode unless there is signal that prefixing is already done. Then it will be a conscious decision at all times. We can discuss more but this should be in line with what we do for schemas (and later also the registered models).

Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 1762
  • Commit SHA: 1218178e64844a091fb06293b6fc14552273942b

Checks will be approved automatically on success.

@eng-dev-ecosystem-bot
Copy link
Collaborator

Test Details: go/deco-tests/11616451005

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants