Randomly fails on writing parquet table to an external location #771
Comments
Recommend filing a ticket with Databricks to understand your options. I suspect there has been a Databricks Runtime change; if you are using an all-purpose cluster, you can set this feature on the cluster, but if you are using a SQL Warehouse, you might need to take a different approach.
Is there a way I can wrap the drop table and create table in a transaction?
Unfortunately, there is no transaction support in Databricks at this time. I think the core issue, though, is that Databricks treats external locations as unmanaged: if you have files in an external location, Databricks does not delete them, and the failure arises because writing to a location that already contains files is unpredictable. If the files that exist there are just those written by this process, that is another reason to file a Databricks ticket, to understand what has changed in the runtime to disrupt your workflow.
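To make the non-atomicity concrete, here is a sketch of the two patterns being contrasted; the table names and external path are hypothetical, not taken from the issue:

```sql
-- Delta supports an atomic swap, so a failure never leaves a half-built table:
CREATE OR REPLACE TABLE analytics.events
USING DELTA
AS SELECT * FROM staging.events;

-- Parquet in an external location has no atomic equivalent; the swap is two
-- separate statements, and DROP removes only metadata. The parquet files
-- already sitting at the external path are left behind:
DROP TABLE IF EXISTS analytics.events;

CREATE TABLE analytics.events
USING PARQUET
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/dbt/events'  -- hypothetical path
AS SELECT * FROM staging.events;
```

If the second statement runs against a path that still holds the old files, the result is undefined, which matches the intermittent failures described above.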
@benc-db any reason this is just for delta? https://github.com/databricks/dbt-databricks/blob/main/dbt/include/databricks/macros/materializations/table.sql#L20 If it also did the same logic for parquet files then we could use the …
@sugendran that line exists because a delta table cannot 'create or replace' a non-delta table, so the old table needs to be dropped first. With parquet in an external location, I'm not sure whether this will fix the issue. Have you tried it? My instinct is that since the storage is external (i.e. not managed), dropping the table will still leave the existing files, yielding the same issue as above.
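For reference, the linked macro applies roughly the following kind of conditional. This is a paraphrased sketch of the logic being discussed, not the actual source, and the exact check in table.sql may differ:

```sql
{% if old_relation is not none and not old_relation.is_delta %}
  {# CREATE OR REPLACE on a delta table cannot overwrite a non-delta table,
     so the old relation is dropped first. Note: for an external parquet
     table this removes only the metadata; the files at the external
     location remain on storage. #}
  {% do adapter.drop_relation(old_relation) %}
{% endif %}
```

This is why extending the drop to parquet relations would not, by itself, clear the external path: the drop is a catalog operation, and the leftover files are what make the subsequent write unpredictable.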
Describe the bug
We've started getting these errors in our dbt pipeline with no changes to the pipeline itself. It had been running for a while without any problems, and we're not sure how to debug this.
The actual table it fails on changes with each run.
Our config is:
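(The config block itself was not captured above. For context, a dbt-databricks model that writes parquet to an external location is typically configured along these lines; the path and model names here are hypothetical, not the reporter's actual values.)

```sql
{{ config(
    materialized = 'table',
    file_format  = 'parquet',
    -- location_root makes the table external; dropping the table will not
    -- delete files already sitting under this path (hypothetical path):
    location_root = 'abfss://data@myaccount.dfs.core.windows.net/dbt'
) }}

select * from {{ ref('stg_events') }}
```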
System information
The output of `dbt --version`:

The error is from the query run in the SQL warehouse.