Randomly fails on writing parquet table to an external location #771

Open
sugendran opened this issue Aug 16, 2024 · 5 comments
Labels: bug

Comments


sugendran commented Aug 16, 2024

Describe the bug

We've started getting these errors in our dbt pipeline with no changes to the pipeline itself. It had been running for a while without any problem, and we're not sure how to debug this.

01:48:16    Runtime Error in model dim_suppliers (models/marts/core/dim_suppliers.sql)
  CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory s3://ordermentum-data/publish/production/core/dim_suppliers. To allow overwriting the existing non-empty directory, set 'spark.sql.legacy.allowNonEmptyLocationInCTAS' to true.

The actual table it fails on changes with each run.

Our config is:

+materialized: table
+file_format: parquet
+location_root: "{{ env_var('publishLocation', 's3://ordermentum-data/publish/dev') ~ '/core' }}"
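
(For context: as I understand it, dbt-databricks composes each table's storage location as location_root plus the table name, so under the production publishLocation this model resolves to the path shown in the error above. A sketch of that composition, assuming publishLocation points at the production prefix:)

  -- assumed: publishLocation = s3://ordermentum-data/publish/production
  -- location_root then resolves to s3://ordermentum-data/publish/production/core
  -- dim_suppliers is therefore written with
  -- LOCATION 's3://ordermentum-data/publish/production/core/dim_suppliers'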

System information

The output of dbt --version (from the query run in the SQL warehouse):

"app": "dbt", "dbt_version": "1.8.5", "dbt_databricks_version": "1.8.5", "databricks_sql_connector_version": "3.1.2",
sugendran added the bug label Aug 16, 2024
benc-db (Collaborator) commented Aug 16, 2024

Recommend filing a ticket with Databricks to understand your options. I suspect there has been a Databricks Runtime change. If you are using an all-purpose (AP) cluster, you can set this flag on the cluster, but if you are using a SQL Warehouse, you might need to take a different approach.
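
(For reference, the flag named in the error message can be set per session in Spark SQL or added to an all-purpose cluster's Spark config; whether a SQL Warehouse honors this legacy config is exactly the open question above. A minimal sketch:)

  -- per-session, e.g. in a notebook attached to an all-purpose cluster:
  SET spark.sql.legacy.allowNonEmptyLocationInCTAS = true;
  -- or as a cluster-level Spark config entry:
  --   spark.sql.legacy.allowNonEmptyLocationInCTAS true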

sugendran (Author) commented:

Is there a way I can wrap the drop table and create table in a transaction?

benc-db (Collaborator) commented Aug 20, 2024

Unfortunately, there is no transaction support in Databricks at this time. I think the core issue, though, is that Databricks treats external locations as unmanaged: if you have files in an external location, Databricks does not take the action of deleting them, and the failure occurs because writing to a location that already contains files has unpredictable results. If the files that exist there were written only by this process, that is another reason to file a Databricks ticket, to understand what has changed in the runtime to disrupt the execution of your workflow.

sugendran (Author) commented:

@benc-db any reason this is just for delta? https://github.com/databricks/dbt-databricks/blob/main/dbt/include/databricks/macros/materializations/table.sql#L20

If it also applied the same logic to parquet files, then we could use CREATE OR REPLACE to create the table.
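
(Roughly what that suggestion would look like in SQL, as a sketch only; the schema name is illustrative, and whether Databricks accepts CREATE OR REPLACE for a non-Delta parquet table at an external location is the open question addressed below.)

  CREATE OR REPLACE TABLE core.dim_suppliers
  USING PARQUET
  LOCATION 's3://ordermentum-data/publish/production/core/dim_suppliers'
  AS SELECT ...;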

benc-db (Collaborator) commented Sep 12, 2024

@sugendran that line exists because a Delta table cannot 'create or replace' a non-Delta table, so the old table needs to be dropped first. With parquet in an external location, I'm not sure whether this will fix the issue. Have you tried it? My instinct is that since the storage is external (i.e. not managed), dropping the table will still leave the existing files, yielding the same issue as above.
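
(To make that failure mode concrete, a sketch of the current non-Delta path, drop then CTAS, and why an external location can still trip it; schema and table names are illustrative.)

  -- roughly what the materialization does for an existing non-Delta relation:
  DROP TABLE IF EXISTS core.dim_suppliers;
  -- DROP removes the metastore entry, but because the location is external
  -- (unmanaged), the parquet files under the path are left in place...
  CREATE TABLE core.dim_suppliers
  USING PARQUET
  LOCATION 's3://ordermentum-data/publish/production/core/dim_suppliers'
  AS SELECT ...;
  -- ...so this CTAS can still see a non-empty directory and raise the error above.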
