
Not ADD ON CLUSTER when create __dbt_backup #205

Open
ikeniborn opened this issue Nov 8, 2023 · 5 comments · May be fixed by #206
Labels
bug Something isn't working

Comments

@ikeniborn

Describe the bug

Steps to reproduce

clickhouse:
  target: default
  outputs:
    default:
      driver: native
      type: clickhouse
      schema: default
      user: "{{ env_var('DBT_ENV_SECRET_USER') }}"
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      # optional fields
      port: 9000
      host: "{{ env_var('DBT_ENV_SECRET_HOST') }}"
      verify: False
      secure: False
      connect_timeout: 60
      # compression: 'gzip'
      threads: 8
      send_receive_timeout: 100000
      check_exchange: False
      cluster: "{cluster}"
      cluster_mode: False
      # Native (clickhouse-driver) connection settings
      sync_request_timeout: 5
      compress_block_size: 1048576
      use_lw_deletes: True
      custom_settings:
        enable_optimize_predicate_expression: 1
        max_block_size: 65536
        max_insert_block_size: 2097152
        max_memory_usage: 130000000000
        max_bytes_before_external_group_by: 100000000000
        max_bytes_before_external_sort: 50000000000
        max_threads: 128
        max_insert_threads: 64
        max_query_size: 524288
        async_insert: 1
        async_insert_threads: 64

Expected behaviour

{{ config(
enabled = true,
schema = 'dimension',
tags = ["dimension"],
materialized = "table",
engine = "MergeTree()",
order_by = ("twitter_pinned_tweet_id"),
) }}

select
pinned_tweet_id as twitter_pinned_tweet_id
,pinned_tweet_text as twitter_pinned_tweet_text
-- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
,pinned_tweet_created_at as twitter_pinned_tweet_created_at
,date_diff('day',pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
,now() as updated_dttm
from
{{ref("raw_twitter_pinned_tweets")}}
order by
updated_dttm desc
limit 1 by
pinned_tweet_id

Code examples, such as models or profile settings

dbt and/or ClickHouse server logs

12:50:07.068025 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

drop table if exists dimension.dim_twitter_pinned_tweet__dbt_backup 


ON CLUSTER "{cluster}" ...

12:50:07.254117 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.19 seconds
12:50:07.338073 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

    create table dimension.dim_twitter_pinned_tweet__dbt_backup

engine = MergeTree()
order by (twitter_pinned_tweet_id)

    empty
as (

select
pinned_tweet_id as twitter_pinned_tweet_id
,pinned_tweet_text as twitter_pinned_tweet_text
-- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
,pinned_tweet_created_at as twitter_pinned_tweet_created_at
,date_diff('day',pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
,now() as updated_dttm
from
raw.raw_twitter_pinned_tweets
order by
updated_dttm desc
limit 1 by
pinned_tweet_id
)
...
12:50:07.429639 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.09 seconds
12:50:07.460572 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

select name, type from system.columns where table = 'dim_twitter_pinned_tweet__dbt_backup'

  and database = 'dimension'

order by position

...
12:50:07.529162 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.07 seconds
12:50:07.549063 [debug] [Thread-1 ]: Writing runtime sql for node "model.clickhouse.dim_twitter_pinned_tweet"
12:50:07.550210 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

    insert into dimension.dim_twitter_pinned_tweet__dbt_backup ("twitter_pinned_tweet_id", "twitter_pinned_tweet_text", "twitter_pinned_tweet_created_at", "twitter_pinned_tweet_day_old", "updated_dttm")

select
pinned_tweet_id as twitter_pinned_tweet_id
,pinned_tweet_text as twitter_pinned_tweet_text
-- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
,pinned_tweet_created_at as twitter_pinned_tweet_created_at
,date_diff('day',pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
,now() as updated_dttm
from
raw.raw_twitter_pinned_tweets
order by
updated_dttm desc
limit 1 by
pinned_tweet_id
...
12:50:07.686856 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.14 seconds
12:50:07.706430 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */
EXCHANGE TABLES dimension.dim_twitter_pinned_tweet__dbt_backup AND dimension.dim_twitter_pinned_tweet

ON CLUSTER "{cluster}" 

...
12:50:07.902549 [debug] [Thread-1 ]: dbt_clickhouse adapter: Error running SQL: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "*****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */
EXCHANGE TABLES dimension.dim_twitter_pinned_tweet__dbt_backup AND dimension.dim_twitter_pinned_tweet

ON CLUSTER "{cluster}" 

12:50:07.903325 [debug] [Thread-1 ]: Timing info for model.clickhouse.dim_twitter_pinned_tweet (execute): 2023-11-08 12:50:06.967194 => 2023-11-08 12:50:07.903201
12:50:07.907147 [debug] [Thread-1 ]: Database Error in model dim_twitter_pinned_tweet (models/dimension/twitter/dim_twitter_pinned_tweet.sql)
Code: 60.
DB::Exception: There was an error on [10.10.1.217:9000]: Code: 60. DB::Exception: Table dimension.dim_twitter_pinned_tweet__dbt_backup doesn't exist. (UNKNOWN_TABLE) (version 23.10.1.1976 (official build)). Stack trace:

  1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000cdd11b7 in /usr/bin/clickhouse
  2. DB::DDLQueryStatusSource::generate() @ 0x0000000012bd0e75 in /usr/bin/clickhouse
  3. DB::ISource::tryGenerate() @ 0x0000000013afadf5 in /usr/bin/clickhouse
  4. DB::ISource::work() @ 0x0000000013afa927 in /usr/bin/clickhouse
  5. DB::ExecutionThreadContext::executeTask() @ 0x0000000013b123ba in /usr/bin/clickhouse
  6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic*) @ 0x0000000013b09130 in /usr/bin/clickhouse
  7. DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000013b083c0 in /usr/bin/clickhouse
  8. void std::__function::__policy_invoker<void ()>::__call_impl<std::function::*****_alloc_func<ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000013b15d8f in /usr/bin/clickhouse
  9. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::*****_deletestd::__thread_struct>, void ThreadPoolImplstd::thread::scheduleImpl(std::function<void ()>, Priority, std::optional, bool)::'lambda0'()>>(void*) @ 0x000000000cebc6a7 in /usr/bin/clickhouse
  10. ? @ 0x00007f32a049f802 in ?
  11. ? @ 0x00007f32a043f450 in ?
    compiled Code at target/run/clickhouse/models/dimension/twitter/dim_twitter_pinned_tweet.sql

Configuration

Environment

  • dbt version: 1.4.9
  • dbt-clickhouse version: 1.4.9
  • clickhouse-driver version (if using native)
  • clickhouse-connect version (if using http):
  • Python version:
  • Operating system:

ClickHouse server

  • ClickHouse Server version: 23.10.1.1976
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
  • Sample data for these tables, use clickhouse-obfuscator if necessary
@ikeniborn ikeniborn added the bug Something isn't working label Nov 8, 2023
@genzgd
Contributor

genzgd commented Nov 8, 2023

@gfunc @zli06160 -- any possibility one of you could take a look at this? Thanks!

@gfunc
Contributor

gfunc commented Nov 9, 2023

I will take a look.
I think it is a problem with the exchange macro. It should not emit the on cluster clause for the table materialization.

Hi @ikeniborn, are you expecting the table to be created on cluster or not? If you are trying to use a ClickHouse cluster, you should use the distributed_table materialization.

@gfunc
Contributor

gfunc commented Nov 9, 2023

I can reproduce the problem now. It seems to be a compatibility issue.

The model dimension.dim_twitter_pinned_tweet already exists on the cluster, but with the latest dbt-clickhouse (1.4.9) the __dbt_backup table is created without an on cluster clause.

If possible, I suggest a full-refresh with the distributed_table materialization; meanwhile, I will prepare a fix.
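For readers hitting the same error, the mismatch visible in the debug log above can be distilled into two statements (a simplified sketch reconstructed from the log, not the adapter source; the select body is elided):

```
-- Step 1: dbt-clickhouse 1.4.9 creates the backup table WITHOUT "on cluster",
-- so it exists only on the replica that served the connection
create table dimension.dim_twitter_pinned_tweet__dbt_backup
engine = MergeTree()
order by (twitter_pinned_tweet_id)
empty as (select ...)

-- Step 2: the swap still runs cluster-wide, so replicas that never received
-- the backup table fail with UNKNOWN_TABLE (Code: 60)
exchange tables dimension.dim_twitter_pinned_tweet__dbt_backup
    and dimension.dim_twitter_pinned_tweet
on cluster "{cluster}"
```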

@gfunc
Contributor

gfunc commented Nov 10, 2023

In #206 my solution to this problem is to provide a detailed message reflecting the error and to make full-refresh work in this situation.

I am not sure this is the best approach, so any comments or suggestions are welcome.

@ikeniborn
Author

I will take a look. I think it is a problem with the exchange macro. It should not emit the on cluster clause for the table materialization.

Hi @ikeniborn, are you expecting the table to be created on cluster or not? If you are trying to use a ClickHouse cluster, you should use the distributed_table materialization.

Hi @gfunc, I reinstalled dbt-clickhouse 1.4.8, where everything works without errors. I have 1 shard and 2 replicas on the cluster. I don't want to use distributed tables yet; I only want to create the table on cluster. When the cluster has more than 1 shard, I will refactor all models to distributed_table.
I will wait for a fix for this error. Thank you.
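Until #206 lands, pinning the adapter back to 1.4.8 as described above works around the problem; for example, in a pip requirements file (a sketch, assuming a pip-managed install alongside dbt-core 1.4.9 from the environment section):

```
dbt-core==1.4.9
dbt-clickhouse==1.4.8
```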
