
Updating liquid clustering on incremental runs breaks concurrency for incremental tables #826

Open
mmansikka opened this issue Oct 14, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@mmansikka

Describe the bug

Since the addition of liquid clustering updates on incremental runs, concurrency is now broken: ALTER TABLE statements can run at the same time as another process is writing to the same table.

Steps To Reproduce

Run several dbt processes at the same time, for example one per source system:
dbt run -s common_table --vars '{"source_systems": ["SOURCE_1"]}'
dbt run -s common_table --vars '{"source_systems": ["SOURCE_2"]}'
dbt run -s common_table --vars '{"source_systems": ["SOURCE_3"]}'
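
The failure mode is that the materialization's metadata statements collide with a concurrent write. Roughly, in SQL terms (table and column names here are illustrative, not taken from the issue):

    -- Process 1: incremental merge in flight (sketch; illustrative names)
    MERGE INTO common_table AS t
    USING staging_source_1 AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;

    -- Process 2: at the same moment, the materialization issues metadata DDL
    ALTER TABLE common_table CLUSTER BY (load_date);
    COMMENT ON TABLE common_table IS 'common table';

    -- The ALTER/COMMENT conflicts with the concurrent MERGE, so one of the
    -- two transactions fails with a concurrent-modification error.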

Expected behavior

By default, liquid clustering updates and column updates should not run during an incremental run. This behavior could be controlled with a config parameter: when it is empty (the default), column descriptions and liquid clustering columns would not be updated.
As a quick and dirty fix we added if not is_incremental() checks to the incremental materialization here:

    {% if tblproperties is not none and not is_incremental() %} {# override: add incremental check, to not break concurrency #}
        {% do apply_tblproperties(target_relation, tblproperties.tblproperties) %}
      {%- endif -%}
    {%- endif -%}
    {% if not is_incremental() %} {# override: add incremental check, to not break concurrency #}
        {% do persist_docs(target_relation, model, for_relation=True) %}
    {%- endif -%}
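
On the model side, the config parameter proposed above might look like this (the parameter name skip_alters_on_incremental is hypothetical, used purely to illustrate the proposal; liquid_clustered_by is the existing dbt-databricks config):

    {# 'skip_alters_on_incremental' is a hypothetical parameter name,
       shown here only to illustrate the proposal in this issue #}
    {{ config(
        materialized='incremental',
        liquid_clustered_by='load_date',
        skip_alters_on_incremental=true
    ) }}

    select * from {{ ref('source_events') }}

When the flag is set, the materialization would skip the ALTER TABLE ... CLUSTER BY and COMMENT statements on incremental runs, leaving only the write itself.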

Screenshots and log output


System information

The output of dbt --version:

Core:
  - installed: 1.8.6
  - latest:    1.8.7 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.8.6 - Update available!
  - spark:      1.8.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation


@mmansikka mmansikka added the bug Something isn't working label Oct 14, 2024
@benc-db
Collaborator

benc-db commented Oct 15, 2024

Is this concurrency within a single run, or are you talking about running multiple instances of dbt targeting the same table? I don't think the latter has ever been intentionally supported.

@mmansikka
Author

mmansikka commented Oct 16, 2024

When running multiple instances of dbt targeting the same table. Databricks has done quite a lot of work to support concurrent writes, and it would be a shame if this were not supported by default, or if there were no way to disable these concurrency-breaking operations. I have discovered that the following break concurrency:

  • Optimize, if DATABRICKS_SKIP_OPTIMIZE is not set to true. See issue
  • Updating liquid clustering (see above). This issue is also related
  • Docs generation (see above). The table comment also seems to be updated on every run, even when there are no changes.
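
For the Optimize case, the environment variable mentioned above is already an escape hatch; setting it in the scheduler environment before invoking dbt looks like this (the dbt commands are the ones from the repro steps):

```shell
# Escape hatch for the Optimize case: skip the OPTIMIZE statement
# that dbt-databricks would otherwise issue after incremental runs.
export DATABRICKS_SKIP_OPTIMIZE=true

# Then launch the concurrent runs as in the repro steps above, e.g.:
#   dbt run -s common_table --vars '{"source_systems": ["SOURCE_1"]}'
#   dbt run -s common_table --vars '{"source_systems": ["SOURCE_2"]}'
```

The other two items in the list have no equivalent switch today, which is what this issue asks for.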

If similar variables were allowed for docs generation and liquid clustering, we would be able to support optimized scheduled runs with multiple instances of dbt targeting the same table. See this issue

At the config level this would be even better than vars, because larger projects need granular settings and would otherwise require multiple dbt runs. Also see the discussion in the issue above.

@mmansikka mmansikka reopened this Oct 16, 2024